How to extract file names from an unformatted URL string?

splunker9999
Path Finder

Hi, we need to extract the file name from a URL, but the URLs in our log files come in different formats, and a few of them contain multiple spaces, as shown below.

Can someone please help us with the extraction?
The formats are:

1. A space after /log/ (here the file name should be data.csv)
    /home/data/var/log/ data.csv

2. No space after /log/
    /home/data/var/log/data.csv

3. A space after /log/, but with a different file extension (here the file name is data2015067987.dat)
    /home/data/var/log/ data2015067987.dat
    /home/data/var/log/  201608587data.csv

4. Multiple spaces after /log/ (here the file name is data data2 data3)
    /home/data/var/log/ data  data2 data3 

Thanks


woodcock
Esteemed Legend

Like this:

| rex field=URL mode=sed "s%/\s+%/%"
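For readers who want to check what this substitution does outside Splunk, here is a minimal Python sketch of the same idea. It assumes the sed expression acts as a first-match-only regex replacement (there is no g flag); the function name and sample list are made up for illustration.

```python
import re

def collapse_space_after_slash(url: str) -> str:
    # Rough equivalent of sed "s%/\s+%/%": replace the first "/" that is
    # followed by whitespace with a bare "/". count=1 mirrors sed without g.
    return re.sub(r"/\s+", "/", url, count=1)

# Sample values taken from the question.
samples = [
    "/home/data/var/log/ data.csv",
    "/home/data/var/log/data.csv",
    "/home/data/var/log/  201608587data.csv",
    "/home/data/var/log/ data  data2 data3 ",
]

for s in samples:
    print(repr(collapse_space_after_slash(s)))
```

Note that this only normalizes the path. The spaces inside "data  data2 data3" are left alone, and a separate extraction is still needed to isolate the file name.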

gokadroid
Motivator

If it is still required, this regex might help to cover all the cases:

your base search
| rex field=_raw "\/((?<prefix>[^\s\/]+)\/)*(?<fileName>.*)"
| table prefix, fileName

See Extraction here
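To see what the two capture groups hold for the sample paths, the pattern can be tried in Python. This is only a sketch: Python's re module needs `(?P<name>...)` where rex accepts `(?<name>...)`, and the helper name `extract` is hypothetical.

```python
import re

# Same pattern as the rex above, with Python-style named groups.
PATTERN = re.compile(r"/((?P<prefix>[^\s/]+)/)*(?P<fileName>.*)")

def extract(raw: str):
    # search() starts at the first "/" found in the text, much like an
    # unanchored rex on _raw.
    m = PATTERN.search(raw)
    if m is None:
        return None, None
    # A repeated group keeps only its last repetition, so prefix is the
    # final directory name; fileName is everything after the last "dir/".
    return m.group("prefix"), m.group("fileName")

for raw in ["/home/data/var/log/ data.csv", "/home/data/var/log/data.csv"]:
    print(extract(raw))
```

Under these assumptions, fileName keeps any leading spaces and everything else up to the end of the text, so trailing columns in the event end up in the capture as well.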


splunker9999
Path Finder

This works partially, but it extracts everything after the last path segment. Can we re-extract from fileName?
It should return only the file name. I am looking to run a regex again on fileName so that, working backwards from the end, it excludes the last 8 space-separated values and returns "201608587data.csv" as the file name. For example, fileName currently contains:

201608587data.csv b s o r user ssh 0 *


gokadroid
Motivator

I thought the query was supposed to catch that, given that you specifically wanted to capture multiple spaces, as described in the original question:

It has multiple spaces after/log (here file name is data data2 data3)
  /home/data/var/log/ data  data2 data3 

So is it fair to say that if the first string of the last section has a <dot> in it, signifying a file extension, the capture should stop there, and otherwise it should continue, as in the requirement above, and capture data data2 data3?
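The rule proposed here can be sketched as a single alternation: take the first whitespace-delimited token after the last slash if it contains a dot, otherwise take everything up to the last non-space character. The Python below is an illustration under that reading of the rule, not a pattern posted in the thread; the names are hypothetical and the named group uses Python's `(?P<name>...)` form.

```python
import re

# Alternative 1: a token containing a dot (treated as "has an extension").
# Alternative 2: fallback, everything up to the last non-space character.
DOT_RULE = re.compile(r"^.*/\s*(?P<filename>\S+\.\S+|.*\S)")

def file_name_dot_rule(path: str):
    m = DOT_RULE.match(path)
    return m.group("filename") if m else None

for path in [
    "/home/data/var/log/ data.csv",
    "/home/data/var/log/ data  data2 data3 ",
    "/home/data/var/log/  201608587data.csv b s o r user ssh 0 *",
]:
    print(repr(file_name_dot_rule(path)))
```

One limitation of this reading: a name with no extension that is followed by extra columns would still be captured together with those columns.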


splunker9999
Path Finder

Yes, that's true; a few URIs have multiple spaces after the last segment.

Your search captures the file name from the last segment to the end, which is good. But can we run another regex on fileName, something like the one below?

rex field=fileName "<regex that works from the end and excludes the last 8 space-separated values>"
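A second pass of this kind could look like the sketch below. It assumes the unwanted tail is always exactly eight whitespace-separated values, as in the sample "201608587data.csv b s o r user ssh 0 *"; the constant and function names are hypothetical, and in rex the named group would be written `(?<name>...)` rather than `(?P<name>...)`.

```python
import re

TRAILING_FIELDS = 8  # assumption: always exactly eight trailing values

# Lazy capture, then exactly eight "whitespace + token" groups anchored
# at the end of the string.
PATTERN = re.compile(
    r"^\s*(?P<name>.*?)(?:\s+\S+){%d}\s*$" % TRAILING_FIELDS
)

def strip_trailing_fields(file_name: str):
    m = PATTERN.match(file_name)
    # None means the value had fewer than eight trailing fields.
    return m.group("name") if m else None

print(strip_trailing_fields("201608587data.csv b s o r user ssh 0 *"))
print(strip_trailing_fields(" data  data2 data3 b s o r user ssh 0 *"))
```

Because the count is anchored at the end of the string, names that themselves contain spaces survive intact, but any event with a different number of trailing values will not match.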

Thanks


somesoni2
SplunkTrust

Give this a try

Updated

your base search | rex field=URL "^.+\/\s*(?<filename>[\w\s\.-:_]+)$"

Run anywhere sample

| gentimes start=-1 | eval URL="/home/data/var/log/ data.csv#/home/data/var/log/data.csv#/home/data/var/log/ data2015067987.dat#/home/data/var/log/  201608587data.csv#/home/data/var/log/ data  data2 data3" | table URL | makemv URL delim="#" | mvexpand URL | rex field=URL "^.+\/\s*(?<filename>[\w\s\.-:_]+)$"

somesoni2
SplunkTrust

Try the updated answer.


splunker9999
Path Finder

This one worked for many URIs but not for a few; the filename field returns no value for the URIs below.

For example, no filename value is returned for these:

/home/data/var/data/input/splunk/data_ip_2012-07-21-14-15-06.dat
/home/data/var/data/input/data_user_2012-07-21-141506.done
/home/data/var/data/inpu/data_user_2012-07-21-141506.dat
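One plausible explanation for these misses is the character class `[\w\s\.-:_]` in the earlier pattern: the sequence `\.-:` reads as a range from "." to ":" (which covers ".", "/", the digits and ":"), so a literal hyphen is not in the class, and all three paths above contain hyphens in the file name. The Python sketch below reproduces this, assuming rex's regex engine parses the class the same way Python's re does. The ADJUSTED pattern is a hypothetical fix that moves the hyphen to the end of the class; it is not something posted in the thread.

```python
import re

# Pattern from the earlier answer, with Python-style named groups.
ORIGINAL = re.compile(r"^.+\/\s*(?P<filename>[\w\s\.-:_]+)$")
# Hypothetical fix: a trailing "-" in a class is a literal hyphen.
ADJUSTED = re.compile(r"^.+\/\s*(?P<filename>[\w\s.:-]+)$")

def extract(pattern, url: str):
    m = pattern.match(url)
    return m.group("filename") if m else None

paths = [
    "/home/data/var/data/input/splunk/data_ip_2012-07-21-14-15-06.dat",
    "/home/data/var/data/input/data_user_2012-07-21-141506.done",
    "/home/data/var/data/inpu/data_user_2012-07-21-141506.dat",
]

for p in paths:
    print(extract(ORIGINAL, p), "|", extract(ADJUSTED, p))
```

The adjusted class also no longer contains "/", so the capture cannot span more than the last path segment.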

Thanks


somesoni2
SplunkTrust

Give this a try.

your base search | rex field=URL "^.+\/\s*(?<filename>.*)$"
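How this looser pattern behaves depends heavily on what the URL field actually contains. The Python sketch below tries it on values from this thread plus one made-up value (labelled in the code) that has a slash in its trailing text. It assumes Python's re and rex agree on greedy matching; the helper name is hypothetical.

```python
import re

# Same pattern as above, with a Python-style named group.
PATTERN = re.compile(r"^.+\/\s*(?P<filename>.*)$")

def extract(url: str):
    m = PATTERN.match(url)
    return m.group("filename") if m else None

values = [
    # From the thread: hyphenated name, no trailing columns.
    "/home/data/var/data/input/data_user_2012-07-21-141506.done",
    # From the thread: name followed by eight extra values.
    "/home/data/var/log/  201608587data.csv b s o r user ssh 0 *",
    # Made-up value: a later "/" moves the split point.
    "/home/data/var/log/ data.csv a/b",
]

for v in values:
    print(repr(extract(v)))
```

Since `.+` is greedy, the split always happens at the last "/" in the field, and `.*` then takes whatever follows, including any trailing columns.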

splunker9999
Path Finder

This didn't work; it just removed the leading "/" from the URL and returned everything else as the file name.
