I am trying to write a regex that matches URLs ending with a 2-, 3-, or 4-letter file extension (e.g. .py, .txt, .xlsx, and the many other known file extensions). I used this regex in a Splunk search:
| regex url=".*[a-zA-Z]{2,4}$"
but this also matches URLs like www.liverpoolfc.com, which do not end with a file extension.
I also tried this regex:
| regex url="//.+?/.+?.$"
This is meant to look for http: or https:, then two "/", followed by the domain, one "/", and then any stream of characters ending with a 2- to 4-letter word. It is not giving the correct results, though: it omits a few URLs that have multiple "/" in the full URL path. Any better suggestions?
Below is a sample set of URLs that I used as a reference:
http://www.liverpoolfc.com
http://www.blackberry.com
http://www.lflogistics.com/sites/default/files/news/lflstc.pdf
https://www.abc.com/tiny/7uwi2
https://download.abc.com/download/ep/FE-90CRC000-28.zip
http://www3.abce.hk/listedco/listconews/SEHK/2019/0521/LTN20190521894.pdf
https://www.abc.com/review/www.xyz-center.com
https://xyz.abc.com/abc-voyager.php
http://wealthbriefing.com/forms/view.php?id=1456762&element_34=saint.xyz@gmail.com
Like this:
... |regex url="^https?:\/\/.*[\\\/].+\.[a-zA-Z]{2,4}$"
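One way to sanity-check this pattern outside Splunk is to run it over the sample URLs from the question. The sketch below uses Python's re module rather than Splunk's PCRE engine, on the assumption that the two behave the same for this particular pattern. The names SAMPLE_URLS, PATTERN and matching_urls are made up for illustration.

```python
import re

# Sample URLs copied from the question.
SAMPLE_URLS = [
    "http://www.liverpoolfc.com",
    "http://www.blackberry.com",
    "http://www.lflogistics.com/sites/default/files/news/lflstc.pdf",
    "https://www.abc.com/tiny/7uwi2",
    "https://download.abc.com/download/ep/FE-90CRC000-28.zip",
    "http://www3.abce.hk/listedco/listconews/SEHK/2019/0521/LTN20190521894.pdf",
    "https://www.abc.com/review/www.xyz-center.com",
    "https://xyz.abc.com/abc-voyager.php",
    "http://wealthbriefing.com/forms/view.php?id=1456762&element_34=saint.xyz@gmail.com",
]

# The regex from the answer above, unchanged. Assumption: Python's re
# treats it the same way Splunk's PCRE does.
PATTERN = re.compile(r"^https?:\/\/.*[\\\/].+\.[a-zA-Z]{2,4}$")


def matching_urls(pattern, urls):
    """Return the URLs the pattern matches (unanchored search, as the regex command does)."""
    return [url for url in urls if pattern.search(url)]


if __name__ == "__main__":
    for url in matching_urls(PATTERN, SAMPLE_URLS):
        print(url)
```

On this sample set the two bare domains and https://www.abc.com/tiny/7uwi2 are filtered out. https://www.abc.com/review/www.xyz-center.com and the view.php URL are still kept, because both end in ".com" after a "/". A regex that only looks at the last few characters cannot tell those apart from a real file extension.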
Hi @jkumarr2,
I would use something like this:
... your search ...
| regex url="(https?:\/\/)?([A-Za-z0-9\-]+)?\.([A-Za-z0-9\-]+)\.([A-Za-z0-9\-]+)(\/?.*\/(.+\.[A-Za-z]{2,3})$)"
or maybe:
... your search ...
| regex url=".*\/\/[^\/]+\/?.*\/.*\.[A-Za-z]{2,3}"
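To compare these two patterns, a rough check against the question's sample URLs could look like the sketch below. It uses Python's re module as a stand-in for Splunk's PCRE engine (an assumption that should hold for these patterns). The names FIRST, SECOND and matching_urls, and the example.com URL, are hypothetical.

```python
import re

# Sample URLs copied from the question.
SAMPLE_URLS = [
    "http://www.liverpoolfc.com",
    "http://www.blackberry.com",
    "http://www.lflogistics.com/sites/default/files/news/lflstc.pdf",
    "https://www.abc.com/tiny/7uwi2",
    "https://download.abc.com/download/ep/FE-90CRC000-28.zip",
    "http://www3.abce.hk/listedco/listconews/SEHK/2019/0521/LTN20190521894.pdf",
    "https://www.abc.com/review/www.xyz-center.com",
    "https://xyz.abc.com/abc-voyager.php",
    "http://wealthbriefing.com/forms/view.php?id=1456762&element_34=saint.xyz@gmail.com",
]

# The two regexes from this answer, unchanged.
FIRST = re.compile(
    r"(https?:\/\/)?([A-Za-z0-9\-]+)?\.([A-Za-z0-9\-]+)\.([A-Za-z0-9\-]+)"
    r"(\/?.*\/(.+\.[A-Za-z]{2,3})$)"
)
SECOND = re.compile(r".*\/\/[^\/]+\/?.*\/.*\.[A-Za-z]{2,3}")


def matching_urls(pattern, urls):
    """Return the URLs the pattern matches (unanchored search, as the regex command does)."""
    return [url for url in urls if pattern.search(url)]


if __name__ == "__main__":
    print("first: ", matching_urls(FIRST, SAMPLE_URLS))
    print("second:", matching_urls(SECOND, SAMPLE_URLS))
    # Hypothetical URL with an extension in the middle of the path.
    extra = "http://www.example.com/files/report.pdf/view"
    print(bool(FIRST.search(extra)), bool(SECOND.search(extra)))
```

Both patterns drop the bare domains and the /tiny/7uwi2 URL. The first one expects a host with at least three dot-separated labels, so it skips the wealthbriefing.com URL. The second has no trailing "$", so it also accepts an extension that appears in the middle of the path. Because of {2,3}, the first pattern will not match 4-letter extensions such as .xlsx.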
Hi jkumarr2,
Try this one:
(?P<URL>[^ ]*\.\w*)$
You can test it at https://regex101.com/r/2syl1Z/1
Bye.
Giuseppe