Splunk Search

How to extract file names from an unformatted URL string?

splunker9999
Path Finder

Hi, we need to extract the file name from a URL, but the URLs in our log files come in different formats, and some of them contain one or more spaces, as shown below.

Can someone please help us with the extraction?
The formats are as follows:

1. It has a space after /log (here the file name should be data.csv)
    /home/data/var/log/ data.csv

2. It doesn't have a space after /log
    /home/data/var/log/data.csv

3. It has a space after /log, but the file extension is different (here the file name is data2015067987.dat)
    /home/data/var/log/ data2015067987.dat
    /home/data/var/log/  201608587data.csv

4. It has multiple spaces after /log (here the file name is data data2 data3)
 /home/data/var/log/ data  data2 data3 

Thanks


woodcock
Esteemed Legend

Like this:

| rex field=URL mode=sed "s%/\s+%/%"
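
To check what that sed expression does outside Splunk, here is a minimal Python sketch. It assumes Python's `re` module behaves like the regex engine behind `rex` for this simple pattern, and `collapse_space_after_slash` is a hypothetical helper name. Without a `g` flag the sed expression should replace only the first match, which `count=1` mirrors.

```python
import re

def collapse_space_after_slash(url: str) -> str:
    # Mirrors s%/\s+%/% : the first "/" followed by whitespace
    # is replaced with a bare "/". Later matches are left alone.
    return re.sub(r"/\s+", "/", url, count=1)

if __name__ == "__main__":
    for sample in (
        "/home/data/var/log/ data.csv",
        "/home/data/var/log/data.csv",
        "/home/data/var/log/  201608587data.csv",
    ):
        print(repr(collapse_space_after_slash(sample)))
```

Note that this only normalizes the URL. The file name still has to be extracted from the cleaned value in a separate step.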

gokadroid
Motivator

If still required, this regex might help to cover all the cases:

your base search
| rex field=_raw "\/((?<prefix>[^\s\/]+)\/)*(?<fileName>.*)"
| table prefix, fileName

See Extraction here
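
The behaviour of this pattern can be explored offline with a small Python sketch. It assumes Python's `re` is close enough to the engine behind `rex` for this pattern; Python spells named groups `(?P<name>...)` where `rex` uses `(?<name>...)`, and `split_path` is a hypothetical helper name. Two things stand out: a repeated capture group keeps only its last iteration, so `prefix` holds just the final directory, and `.*` captures everything to the end of the line, including leading whitespace and any trailing values.

```python
import re

# Same pattern as the rex above, with Python-style named groups.
PATTERN = re.compile(r"/((?P<prefix>[^\s/]+)/)*(?P<fileName>.*)")

def split_path(raw: str):
    """Return (prefix, fileName) for the first match, or None."""
    m = PATTERN.search(raw)
    if m is None:
        return None
    return m.group("prefix"), m.group("fileName")

if __name__ == "__main__":
    print(split_path("/home/data/var/log/ data.csv"))
    print(split_path("/home/data/var/log/  201608587data.csv b s o r user ssh 0 *"))
```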


splunker9999
Path Finder

This works partially, but it extracts everything after the last segment. Can we re-extract from fileName?
It should return only the leading value in the example below. I want to run a regex again on fileName that excludes the last 8 space-separated values, counting back from the end, and returns "201608587data.csv" as the file name:

201608587data.csv b s o r user ssh 0 *


gokadroid
Motivator

I thought the query was supposed to catch that, given that you specifically wanted to capture multiple spaces, as described in the original question:

It has multiple spaces after/log (here file name is data data2 data3)
  /home/data/var/log/ data  data2 data3 

So is it fair to say that if the first string after the last segment contains a dot, signifying a file extension, the capture should stop there, and otherwise it should continue, as in the requirement above to capture data data2 data3?
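
The rule proposed here can be written as a short Python sketch, assuming the trailing values never contain a "/" and that a dot in the first token always marks a file extension. `filename_by_dot_rule` is a hypothetical name, and this is only one reading of the rule, not a confirmed solution.

```python
def filename_by_dot_rule(url: str) -> str:
    # Everything after the last "/" with surrounding whitespace removed.
    tail = url.rsplit("/", 1)[-1].strip()
    if not tail:
        return ""
    # First whitespace-separated token of the tail.
    first = tail.split(None, 1)[0]
    # A dot in the first token is treated as a file extension: stop there.
    # Otherwise keep the whole tail (the "data  data2 data3" case).
    return first if "." in first else tail

if __name__ == "__main__":
    for sample in (
        "/home/data/var/log/ data.csv",
        "/home/data/var/log/ data  data2 data3 ",
        "/home/data/var/log/  201608587data.csv b s o r user ssh 0 *",
    ):
        print(repr(filename_by_dot_rule(sample)))
```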


splunker9999
Path Finder

Yes, that's true; a few URIs have multiple spaces after the last segment.

Your search for the file name breaks out everything from the last segment to the end, which is good. But can we use another regex on fileName, something like the one below?

rex field=fileName (regex that checks from the end and excludes the last 8 spaces)

Thanks
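
One way to express "work back from the end and drop the last 8 values" is a lazy capture followed by exactly eight whitespace-separated tokens. The Python sketch below assumes the trailing part always has exactly 8 values, as described in this thread. `strip_trailing_fields` is a hypothetical helper, and Python writes the named group as `(?P<filename>...)` where `rex` would use `(?<filename>...)`.

```python
import re

# Lazy capture, then exactly 8 "<whitespace><token>" pairs up to end of line.
PATTERN = re.compile(r"^\s*(?P<filename>.*?)(?:\s+\S+){8}\s*$")

def strip_trailing_fields(value: str):
    """Return the value without its last 8 tokens, or None if there are fewer."""
    m = PATTERN.match(value)
    return m.group("filename") if m else None

if __name__ == "__main__":
    print(strip_trailing_fields("201608587data.csv b s o r user ssh 0 *"))
    print(strip_trailing_fields("data  data2 data3 b s o r user ssh 0 *"))
```

Values with fewer than 8 trailing tokens do not match at all, so a real search would need a fallback for URIs that carry no trailing values.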


somesoni2
Revered Legend

Give this a try

Updated

your base search | rex field=URL "^.+\/\s*(?<filename>[\w\s\.-:_]+)$"

Run anywhere sample

| gentimes start=-1
| eval URL="/home/data/var/log/ data.csv#/home/data/var/log/data.csv#/home/data/var/log/ data2015067987.dat#/home/data/var/log/  201608587data.csv#/home/data/var/log/ data  data2 data3"
| table URL
| makemv URL delim="#"
| mvexpand URL
| rex field=URL "^.+\/\s*(?<filename>[\w\s\.-:_]+)$"

somesoni2
Revered Legend

Try the updated answer.


splunker9999
Path Finder

This one worked for many URIs but not for a few; the filename field returns no value for the URIs below.

For example, no filename value is returned for these:

/home/data/var/data/input/splunk/data_ip_2012-07-21-14-15-06.dat
/home/data/var/data/input/data_user_2012-07-21-141506.done
/home/data/var/data/inpu/data_user_2012-07-21-141506.dat

Thanks
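
A likely reason these URIs fail: inside the character class `[\w\s\.-:_]`, the sequence `.-:` is read as a range from `.` to `:`, which covers `/`, the digits, and `:` but not a literal hyphen. All three failing file names contain hyphens. The Python sketch below compares the original class with a variant that moves the hyphen to the end of the class so it is taken literally. It assumes Python's `re` treats this class the same way as the engine behind `rex`; `ORIGINAL`, `HYPHEN_LAST`, and `extract` are illustrative names.

```python
import re

# Character class as posted: ".-:" forms a range, so "-" is not included.
ORIGINAL = re.compile(r"^.+/\s*(?P<filename>[\w\s\.-:_]+)$")
# Hyphen moved to the end of the class, where it is a literal character.
HYPHEN_LAST = re.compile(r"^.+/\s*(?P<filename>[\w\s.:_-]+)$")

def extract(pattern, url: str):
    """Return the captured file name, or None when the pattern does not match."""
    m = pattern.match(url)
    return m.group("filename") if m else None

if __name__ == "__main__":
    url = "/home/data/var/data/input/splunk/data_ip_2012-07-21-14-15-06.dat"
    print(extract(ORIGINAL, url))
    print(extract(HYPHEN_LAST, url))
```

The variant still captures everything after the last "/", so trailing values after the file name would need to be handled separately.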


somesoni2
Revered Legend

Give this a try.

your base search | rex field=URL "^.+\/\s*(?<filename>.*)$"

splunker9999
Path Finder

This didn't work; it just removed the leading "/" from the URL and returned everything else as the file name.
