Splunk Search

How to create a regex to match URL ending with file extension to detect file downloads?

jkumarr2
New Member

I am trying to write a regex that will detect/match URLs ending with 2-, 3-, and 4-letter file extensions (e.g. .py, .txt, .xlsx, and the numerous other known file extensions). I used this regex in a Splunk search:

| regex url=".*[a-zA-Z]{2,4}$"

but this also matches URLs like www.liverpoolfc.com, which do not end with a file extension.

I also tried this regex:

| regex url="//.+?/.+?.$" 

This looks for http: or https:, then two "/" followed by the domain, then one "/" followed by any run of characters, ending with a 2- to 4-letter word. It does not give the correct results, though: it omits some URLs that have multiple "/" in the full URL path. Any better suggestions?

Below is a sample set of URLs that I used as a reference:

http://www.liverpoolfc.com
http://www.blackberry.com
http://www.lflogistics.com/sites/default/files/news/lflstc.pdf
https://www.abc.com/tiny/7uwi2
https://download.abc.com/download/ep/FE-90CRC000-28.zip
http://www3.abce.hk/listedco/listconews/SEHK/2019/0521/LTN20190521894.pdf
https://www.abc.com/review/www.xyz-center.com
https://xyz.abc.com/abc-voyager.php
http://wealthbriefing.com/forms/view.php?id=1456762&element_34=saint.xyz@gmail.com
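
To see why the first attempt over-matches, here is a minimal sketch in Python. It assumes the intended quantifier was {2,4} and uses Python's re module as a stand-in for Splunk's PCRE engine, so treat it as an illustration rather than exact Splunk behavior. The names FIRST_ATTEMPT and urls are made up for the example.

```python
import re

# First attempt from the question, assuming the quantifier was meant to be {2,4}.
# Python's re module stands in for Splunk's PCRE engine here.
FIRST_ATTEMPT = re.compile(r".*[a-zA-Z]{2,4}$")

# Three of the sample URLs from the question.
urls = [
    "http://www.liverpoolfc.com",
    "http://www.lflogistics.com/sites/default/files/news/lflstc.pdf",
    "https://www.abc.com/tiny/7uwi2",
]

for url in urls:
    # search() is unanchored, which is how the SPL regex command is assumed to match.
    print(url, bool(FIRST_ATTEMPT.search(url)))
```

The pattern only asks for letters at the end of the string, so the "com" of a bare domain satisfies it just as well as "pdf" does. Nothing in it requires a path or a dot before the extension.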

woodcock
Esteemed Legend

Like this:

... |regex url="^https?:\/\/.*[\\\/].+\.[a-zA-Z]{2,4}$"
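
A quick way to check this pattern against the sample set is the sketch below. It is Python rather than SPL, the pattern is copied verbatim into a raw string, and Splunk's own handling of the quoted string is not modeled, so results in a real search may differ. PATTERN, SAMPLE_URLS and matched are names invented for the example.

```python
import re

# Pattern from the answer above, copied verbatim into a raw string.
# Python's re module stands in for Splunk's PCRE engine.
PATTERN = re.compile(r"^https?:\/\/.*[\\\/].+\.[a-zA-Z]{2,4}$")

# Sample URLs from the question.
SAMPLE_URLS = [
    "http://www.liverpoolfc.com",
    "http://www.blackberry.com",
    "http://www.lflogistics.com/sites/default/files/news/lflstc.pdf",
    "https://www.abc.com/tiny/7uwi2",
    "https://download.abc.com/download/ep/FE-90CRC000-28.zip",
    "http://www3.abce.hk/listedco/listconews/SEHK/2019/0521/LTN20190521894.pdf",
    "https://www.abc.com/review/www.xyz-center.com",
    "https://xyz.abc.com/abc-voyager.php",
    "http://wealthbriefing.com/forms/view.php?id=1456762&element_34=saint.xyz@gmail.com",
]

# Keep only the URLs the pattern accepts.
matched = [url for url in SAMPLE_URLS if PATTERN.search(url)]

for url in matched:
    print(url)
```

Under these assumptions the bare domains and the /tiny/7uwi2 link are rejected, because the pattern needs a slash after the host and a dot followed by 2 to 4 letters at the very end. The two URLs that end in ".com" after a path (the /review/ link and the view.php link with a query string) are still accepted, so they would need separate handling if they should not count as downloads.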

jnudell_2
Builder

Hi @jkumarr2 ,

I would use something like this:

... your search ...
| regex url="(https?:\/\/)?([A-Za-z0-9\-]+)?\.([A-Za-z0-9\-]+)\.([A-Za-z0-9\-]+)(\/?.*\/(.+\.[A-Za-z]{2,3})$)"

or maybe:

... your search ...
| regex url=".*\/\/[^\/]+\/?.*\/.*\.[A-Za-z]{2,3}"
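
One thing worth checking with these two patterns is the {2,3} quantifier and the missing $ on the second one. The sketch below uses Python's re module as a stand-in for Splunk's PCRE engine; the .xlsx URL is a hypothetical example that is not in the original sample set, and ANCHORED and UNANCHORED are names invented here.

```python
import re

# Both patterns from the answer above, copied verbatim into raw strings.
ANCHORED = re.compile(
    r"(https?:\/\/)?([A-Za-z0-9\-]+)?\.([A-Za-z0-9\-]+)\.([A-Za-z0-9\-]+)"
    r"(\/?.*\/(.+\.[A-Za-z]{2,3})$)"
)
UNANCHORED = re.compile(r".*\/\/[^\/]+\/?.*\/.*\.[A-Za-z]{2,3}")

pdf_url = "http://www.lflogistics.com/sites/default/files/news/lflstc.pdf"
# Hypothetical URL with a 4-letter extension (not from the question's samples).
xlsx_url = "https://www.abc.com/files/report.xlsx"

for name, pattern in (("anchored", ANCHORED), ("unanchored", UNANCHORED)):
    for url in (pdf_url, xlsx_url):
        print(name, url, bool(pattern.search(url)))
```

In this sketch the first pattern accepts the .pdf URL but rejects .xlsx, since {2,3} followed by $ leaves no room for a fourth letter. The second pattern accepts both, but only because it has no $ and stops after ".xls". Changing {2,3} to {2,4} and ending the second pattern with $ would be one way to cover the 4-letter case from the question.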

gcusello
SplunkTrust

Hi jkumarr2,
try this one

(?P<URL>[^ ]*\.\w*)$

You can test it at https://regex101.com/r/2syl1Z/1

Bye.
Giuseppe
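
For anyone trying the pattern above outside regex101, here is a small Python sketch of what the named group captures. Python's re module stands in for Splunk's PCRE engine, and EXTRACT and extract_url are hypothetical names used only for this illustration.

```python
import re

# Pattern from the answer above; (?P<URL>...) is a named capture group.
EXTRACT = re.compile(r"(?P<URL>[^ ]*\.\w*)$")


def extract_url(text):
    """Return the text captured by the URL group, or None if there is no match."""
    match = EXTRACT.search(text)
    return match.group("URL") if match else None


# Three of the sample URLs from the question.
for url in (
    "http://www.lflogistics.com/sites/default/files/news/lflstc.pdf",
    "http://www.liverpoolfc.com",
    "https://www.abc.com/tiny/7uwi2",
):
    print(url, "->", extract_url(url))
```

In this sketch the pattern captures the whole URL whenever the string ends with a dot followed by word characters, so the bare domain is captured along with the .pdf link, while the /tiny/7uwi2 link is not. It therefore behaves more like an extraction than a filter for file downloads, and an extra condition on the path would likely be needed.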
