Splunk Search

How to create a regex to match URL ending with file extension to detect file downloads?

jkumarr2
New Member

I am trying to write a regex that will detect/match URLs ending with 2, 3, or 4 letter file extensions (e.g. .py, .txt, .xlsx, and the numerous other known file extensions). I used this regex in a Splunk search:

|regex field url=".*[a-zA-Z]{2-4}$"

but this will match URLs like www.liverpoolfc.com, which do not end with a file extension.
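Two details in that first attempt can be checked outside Splunk. The sketch below uses Python's re module, which is close to but not identical to the PCRE engine behind Splunk's regex command, so treat it as indicative only. It assumes the search that was actually run used {2,4}, because {2-4} is not a quantifier and is read as literal text. It also shows that adding a literal dot is not enough on its own, since ".com" at the end of a bare hostname looks just like an extension. The variable names are illustrative.

```python
import re

# "{2-4}" is not a valid quantifier, so it is treated as the literal text "{2-4}".
literal_braces = re.compile(r".*[a-zA-Z]{2-4}$")
# "{2,4}" is the range quantifier: two to four letters before the end.
range_quantifier = re.compile(r".*[a-zA-Z]{2,4}$")
# Requiring a literal dot before the letters (still no check for a path).
dotted = re.compile(r"\.[a-zA-Z]{2,4}$")

bare_host = "http://www.liverpoolfc.com"
pdf_url = "http://www.lflogistics.com/sites/default/files/news/lflstc.pdf"

for url in (bare_host, pdf_url):
    print(
        url,
        bool(literal_braces.search(url)),
        bool(range_quantifier.search(url)),
        bool(dotted.search(url)),
    )
```

Since both the bare hostname and the PDF link pass the {2,4} and dotted checks, the pattern also has to require a path segment after the host.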

I also tried this regex:

| regex url="//.+?/.+?.$" 

This is meant to look for http: or https:, then two "/", followed by the domain and one "/", followed by any stream of characters, and ending with a 2 to 4 letter word. It is not giving the correct results, though: it omits a few URLs that have multiple "/" in the full URL path. Any better suggestions?

Below is a sample set of URLs that I used as a reference:

http://www.liverpoolfc.com
http://www.blackberry.com
http://www.lflogistics.com/sites/default/files/news/lflstc.pdf
https://www.abc.com/tiny/7uwi2
https://download.abc.com/download/ep/FE-90CRC000-28.zip
http://www3.abce.hk/listedco/listconews/SEHK/2019/0521/LTN20190521894.pdf
https://www.abc.com/review/www.xyz-center.com
https://xyz.abc.com/abc-voyager.php
http://wealthbriefing.com/forms/view.php?id=1456762&element_34=saint.xyz@gmail.com

woodcock
Esteemed Legend

Like this:

... |regex url="^https?:\/\/.*[\\\/].+\.[a-zA-Z]{2,4}$"
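To see how this pattern behaves on the sample URLs from the question, here is a rough Python check. Python's re module is not the same engine as Splunk's PCRE, but this pattern uses only syntax common to both. ANSWER_PATTERN is the regex above, copied unchanged. STRICT_PATTERN is a hypothetical variant that is not from this thread: it assumes a download URL has no query string or fragment and that the extension sits in the last path segment.

```python
import re

SAMPLE_URLS = [
    "http://www.liverpoolfc.com",
    "http://www.blackberry.com",
    "http://www.lflogistics.com/sites/default/files/news/lflstc.pdf",
    "https://www.abc.com/tiny/7uwi2",
    "https://download.abc.com/download/ep/FE-90CRC000-28.zip",
    "http://www3.abce.hk/listedco/listconews/SEHK/2019/0521/LTN20190521894.pdf",
    "https://www.abc.com/review/www.xyz-center.com",
    "https://xyz.abc.com/abc-voyager.php",
    "http://wealthbriefing.com/forms/view.php?id=1456762&element_34=saint.xyz@gmail.com",
]

# Regex from the answer above, copied unchanged.
ANSWER_PATTERN = re.compile(r"^https?:\/\/.*[\\\/].+\.[a-zA-Z]{2,4}$")

# Hypothetical stricter variant (an assumption, not from the thread):
# host, "/", optional directories, then a final segment ending in .ext,
# with "?" and "#" excluded so query strings cannot supply the extension.
STRICT_PATTERN = re.compile(
    r"^https?://[^/?#]+/(?:[^?#]*/)?[^/?#]+\.[a-zA-Z]{2,4}$"
)


def matching(pattern, urls):
    """Return the URLs that the pattern matches, in input order."""
    return [u for u in urls if pattern.search(u)]


for name, pattern in (("answer", ANSWER_PATTERN), ("strict", STRICT_PATTERN)):
    print(name)
    for url in matching(pattern, SAMPLE_URLS):
        print("  ", url)
```

In this check the answer's pattern skips the bare hostnames and the /tiny/7uwi2 link, but it still accepts the view.php URL because the query string happens to end in ".com". The stricter variant rejects that one. Neither pattern can distinguish /review/www.xyz-center.com from a real file, since its last segment really does end in a dot followed by three letters.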

jnudell_2
Builder

Hi @jkumarr2 ,

I would use something like this:

... your search ...
| regex url="(https?:\/\/)?([A-Za-z0-9\-]+)?\.([A-Za-z0-9\-]+)\.([A-Za-z0-9\-]+)(\/?.*\/(.+\.[A-Za-z]{2,3})$)"

or maybe:

... your search ...
| regex url=".*\/\/[^\/]+\/?.*\/.*\.[A-Za-z]{2,3}"
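Both suggestions can be tried against the question's sample URLs with a small Python sketch. Python's re module is used here as a stand-in for Splunk's PCRE engine, which is fine for this syntax but not guaranteed in general. PATTERN_A and PATTERN_B are the two regexes above, copied unchanged. The three extra URLs are made-up examples, not from the thread, chosen to probe edge cases.

```python
import re

SAMPLE_URLS = [
    "http://www.liverpoolfc.com",
    "http://www.blackberry.com",
    "http://www.lflogistics.com/sites/default/files/news/lflstc.pdf",
    "https://www.abc.com/tiny/7uwi2",
    "https://download.abc.com/download/ep/FE-90CRC000-28.zip",
    "http://www3.abce.hk/listedco/listconews/SEHK/2019/0521/LTN20190521894.pdf",
    "https://www.abc.com/review/www.xyz-center.com",
    "https://xyz.abc.com/abc-voyager.php",
    "http://wealthbriefing.com/forms/view.php?id=1456762&element_34=saint.xyz@gmail.com",
]

# First suggestion: three-label host, anchored at the end, 2-3 letter extension.
PATTERN_A = re.compile(
    r"(https?:\/\/)?([A-Za-z0-9\-]+)?\.([A-Za-z0-9\-]+)\.([A-Za-z0-9\-]+)"
    r"(\/?.*\/(.+\.[A-Za-z]{2,3})$)"
)
# Second suggestion: no "$", so the extension may appear anywhere after a "/".
PATTERN_B = re.compile(r".*\/\/[^\/]+\/?.*\/.*\.[A-Za-z]{2,3}")

# Hypothetical probe URLs (illustrative only).
TWO_LABEL_HOST = "http://example.com/files/report.pdf"
FOUR_LETTER_EXT = "https://www.abc.com/files/report.xlsx"
DOT_IN_DIRECTORY = "https://www.abc.com/archive.old/7uwi2"

for url in SAMPLE_URLS + [TWO_LABEL_HOST, FOUR_LETTER_EXT, DOT_IN_DIRECTORY]:
    print(bool(PATTERN_A.search(url)), bool(PATTERN_B.search(url)), url)
```

In this check the first pattern matches the .pdf, .zip and .php samples plus /review/www.xyz-center.com. It misses the two-label host and the .xlsx probe, because it requires the sequence .label.label in the host and limits the extension to three letters. The second pattern has no end anchor, so it also accepts the view.php URL and the probe with a dot inside a directory name. Appending $ and widening the quantifier to {2,4} look like the smallest adjustments to consider.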

gcusello
SplunkTrust

Hi jkumarr2,
try this one

(?P<URL>[^ ]*\.\w*)$

You can test it at https://regex101.com/r/2syl1Z/1
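For a quick local check alongside the regex101 link, the same pattern can be run over the question's sample URLs in Python. This is only a sketch: Python's re module stands in for Splunk's PCRE engine, and the helper names are illustrative. The named group suggests the pattern was written with extraction in mind, which is an assumption on my part.

```python
import re

SAMPLE_URLS = [
    "http://www.liverpoolfc.com",
    "http://www.blackberry.com",
    "http://www.lflogistics.com/sites/default/files/news/lflstc.pdf",
    "https://www.abc.com/tiny/7uwi2",
    "https://download.abc.com/download/ep/FE-90CRC000-28.zip",
    "http://www3.abce.hk/listedco/listconews/SEHK/2019/0521/LTN20190521894.pdf",
    "https://www.abc.com/review/www.xyz-center.com",
    "https://xyz.abc.com/abc-voyager.php",
    "http://wealthbriefing.com/forms/view.php?id=1456762&element_34=saint.xyz@gmail.com",
]

# Pattern from the post above, copied unchanged.
URL_PATTERN = re.compile(r"(?P<URL>[^ ]*\.\w*)$")

matched = [u for u in SAMPLE_URLS if URL_PATTERN.search(u)]
unmatched = [u for u in SAMPLE_URLS if not URL_PATTERN.search(u)]

print("matched:")
for url in matched:
    print("  ", url)
print("unmatched:")
for url in unmatched:
    print("  ", url)

# The named group captures the whole URL when the pattern matches.
first_pdf = URL_PATTERN.search(SAMPLE_URLS[2]).group("URL")
print(first_pdf)
```

In this check only the /tiny/7uwi2 link is left out. The pattern asks for a dot followed by word characters up to the end of the string, and nothing about a path, so the bare hostnames ending in ".com" match as well. Used as a filter it would need the path requirement discussed in the other answers.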

Bye.
Giuseppe
