The following is some Awk code I wrote a while ago for use in a Bash script to extract valid, unique (no duplicates) IP addresses from a file. It avoids version numbers, whether in the text or as part of a URL, and makes sure the IP numbers are in range. I appreciate that this is overkill for the original poster and that it is not tailored to his needs, but someone doing a search may come across this answer and find the fairly comprehensive nature of the code useful. The Awk code is thankfully well commented, as it uses some slightly obscure aspects of Awk that the casual Awk user would probably not be familiar with.

The script first defines its regexes. ipLikeSequence matches an IP address like sequence (even if too long to be an IP). This is deliberately a loose match; the END section checks that the IP numbers are in range. digitSequenceTooLongNotIP matches a number sequence longer than 3 digits. versioningNotIP matches an IP address like sequence which is a version number: a case-insensitive version prefix followed by ipLikeSequence, equivalent to a pattern beginning with "(version|ver|v)" if "tolower($0)" was used. beginsWithFwdSlashNotIP and endsWithFwdSlashNotIP match IP address like sequences next to forward slashes, that is, ipLikeSequence directly preceded or followed by a forward slash.

For each input line, the script sets line to the current line (more efficient than using $0 below). It then replaces the sequences on line which would interfere with extracting genuine IP addresses. It uses a replacement char and not the empty string, to avoid accidentally creating a valid IP address from the digits on either side of the removed sequence. It uses "/" as the replacement char for the 2 "FwdSlash" regexes so that multiple number dot slash sequences all get removed, as using "x" could result in inadvertently leaving such a sequence in place. The replacement calls have the form gsub(digitSequenceTooLongNotIP, "x", line) and gsub(beginsWithFwdSlashNotIP, "/", line).

Finally, the script loops through the current line matching IP address like sequences and storing them in the index of the array ipUniqueMatches. Because each match is used as the array index, duplicates are avoided and the values can easily be read back.
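The steps described above can be sketched as a small shell function wrapping Awk. This is a minimal sketch and not the original script: the function name `extract_ips`, every regex body, the separator class in `versioningNotIP`, and the final `sort` are assumptions made for illustration. Only the variable names and the overall flow come from the comments above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each valid, unique IPv4 address read from stdin.
extract_ips() {
    awk '
    BEGIN {
        # Loose match: two or more dot-separated digit runs (assumed pattern).
        # The END section does the strict checking.
        ipLikeSequence = "[0-9]+([.][0-9]+)+"
        # A run of 4 or more digits cannot be an IPv4 octet.
        digitSequenceTooLongNotIP = "[0-9][0-9][0-9][0-9]+"
        # Case-insensitive v, ver or version, optional spaces, then the
        # IP-like sequence (assumed pattern).
        versioningNotIP = "[Vv]([Ee][Rr]([Ss][Ii][Oo][Nn])?)?[ ]*" ipLikeSequence
        beginsWithFwdSlashNotIP = "[/]" ipLikeSequence
        endsWithFwdSlashNotIP = ipLikeSequence "[/]"
    }
    {
        line = $0
        # Replace interfering sequences with a placeholder char, never "".
        gsub(digitSequenceTooLongNotIP, "x", line)
        gsub(versioningNotIP, "x", line)
        gsub(beginsWithFwdSlashNotIP, "/", line)
        gsub(endsWithFwdSlashNotIP, "/", line)
        # Collect every remaining candidate as an array index, so that
        # duplicates collapse into a single entry.
        while (match(line, ipLikeSequence)) {
            ipUniqueMatches[substr(line, RSTART, RLENGTH)] = 1
            line = substr(line, RSTART + RLENGTH)
        }
    }
    END {
        for (ip in ipUniqueMatches) {
            n = split(ip, octet, ".")
            valid = (n == 4)
            for (i = 1; i <= n; i++)
                if (octet[i] + 0 > 255) valid = 0
            if (valid) print ip
        }
    }' | LC_ALL=C sort
}
```

It would be called as `extract_ips < page.html`, where `page.html` is a placeholder file name. Note that a sketch like this is deliberately conservative: a genuine address that sits directly next to a forward slash, or directly after a word ending in "v" or "ver", is discarded along with the version numbers.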
Of course, when matching IP (v4) addresses from a file (say HTML), it's quite easy to inadvertently match a version string or a URL which contains versioning as part of its file path. Or, if you're a nitpicker (and have a grep with -P), you can test the following: while read -r testline
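To illustrate the idea of feeding test lines through a `while read -r testline` loop, here is a minimal sketch. It is not the loop from the original answer, and it swaps the `grep -P` check for plain POSIX shell tests so that it runs without a PCRE-enabled grep. The function names `is_valid_ipv4` and `check_lines` are hypothetical.

```shell
#!/bin/sh
# Hypothetical checker: succeed only if the whole argument is a dotted quad
# with every octet in the range 0-255.
is_valid_ipv4() {
    # Reject empty input, any char other than digits and dots,
    # leading or trailing dots, and empty octets.
    case $1 in
        ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;
    esac
    # Split the argument on dots into the positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
}

# Read candidate lines from stdin and report a verdict for each one.
check_lines() {
    while read -r testline; do
        if is_valid_ipv4 "$testline"; then
            printf '%s valid\n' "$testline"
        else
            printf '%s invalid\n' "$testline"
        fi
    done
}
```

A here-document or a file of candidate strings can be piped into `check_lines` to see which ones a stricter pattern ought to accept.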