Hi, we need to extract the file name from a URL, but the URLs in our log files come in different formats, and a few contain multiple spaces, as shown below.
Can someone please help us with the extraction?
Here are the formats:
1. There is a space after /log/ (here the file name should be data.csv):
/home/data/var/log/ data.csv
2. There is no space after /log/:
/home/data/var/log/data.csv
3. There is a space after /log/, but the file extension is different (here the file name is data2015067987.dat):
/home/data/var/log/ data2015067987.dat
/home/data/var/log/ 201608587data.csv
4. There are multiple spaces after /log/ (here the file name is data data2 data3):
/home/data/var/log/ data data2 data3
Thanks
Like this:
| rex field=URL mode=sed "s%/\s+%/%"
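If you want to check that substitution outside Splunk, here is a rough Python equivalent. It assumes Python's `re` treats `/\s+` the same way Splunk's PCRE does, which it should for a pattern this simple. `normalize_url` and `last_segment` are hypothetical helper names. `count=1` mirrors sed's default of replacing only the first match when no `g` flag is given.

```python
import re


def normalize_url(url):
    """Remove the whitespace that directly follows a slash (first match only)."""
    # count=1 mimics sed without the g flag: only the first "/<spaces>" is rewritten
    return re.sub(r"/\s+", "/", url, count=1)


def last_segment(url):
    """Hypothetical helper: return everything after the last slash."""
    return normalize_url(url).rsplit("/", 1)[-1]


for url in ["/home/data/var/log/ data.csv",
            "/home/data/var/log/data.csv",
            "/home/data/var/log/ data data2 data3"]:
    print(repr(last_segment(url)))
```

Only the spaces right after the slash are removed, so "data data2 data3" keeps its internal spaces.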
If you still need it, this regex should cover all of the cases:
your base search
| rex field=_raw "\/((?<prefix>[^\s\/]+)\/)*(?<fileName>.*)"
| table prefix, fileName
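The fileName group in this pattern ends in `.*`, so it takes everything after the last slash. Here is a sketch of the same pattern in Python's `re` (named groups are spelled `(?P<name>...)` there). It assumes `_raw` holds the URL followed by extra space-separated fields, like the sample line later in this thread. It shows that prefix keeps only the last repeated segment and that fileName keeps the leading space and every trailing field.

```python
import re

# Python spelling of the rex pattern; "\/" is not needed outside Splunk quoting
PATTERN = re.compile(r"/((?P<prefix>[^\s/]+)/)*(?P<fileName>.*)")

raw = "/home/data/var/log/ 201608587data.csv b s o r user ssh 0 *"
m = PATTERN.search(raw)
print(repr(m.group("prefix")))    # 'log': a repeated group keeps only its last iteration
print(repr(m.group("fileName")))  # everything after the last slash, trailing fields included
```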
This works partially, but it extracts everything after the last segment. Can we re-extract from fileName?
It should return only the highlighted value. I'd like to run a regex again on fileName that drops the last 8 space-separated values, counting back from the end, so that "201608587data.csv" is returned as the file name:
201608587data.csv b s o r user ssh 0 *
I thought the query was supposed to catch that, given that you specifically wanted to capture multiple spaces, as described in your original question:
It has multiple spaces after /log (here file name is data data2 data3)
/home/data/var/log/ data data2 data3
So is it fair to say that if the first part of the last segment contains a dot (signifying a file extension), the capture should stop there, and otherwise it should continue, as in the highlighted requirement of capturing data data2 data3?
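Here is a minimal Python sketch of that rule. It assumes the rule is applied to the text after the last slash, such as the fileName value from the earlier regex. The alternation first tries a dot-containing first token, which cannot cross a space because it is built from `\S`, and otherwise falls back to the rest of the value without trailing whitespace. `apply_dot_rule` is a hypothetical name. The same alternation could presumably go into a rex command, but that is untested here.

```python
import re

# Alternative 1: first token containing a dot (a file name with an extension).
# Alternative 2: the whole remainder, e.g. "data data2 data3".
TAIL_RE = re.compile(r"^\s*(?P<name>\S*\.\S*|.*\S)")


def apply_dot_rule(tail):
    """Hypothetical helper: pick the file name from the text after the last slash."""
    m = TAIL_RE.match(tail)
    return m.group("name") if m else ""


for tail in [" 201608587data.csv b s o r user ssh 0 *", " data data2 data3", "data.csv"]:
    print(repr(apply_dot_rule(tail)))
```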
Yes, that's true; a few URIs have multiple spaces after the last segment.
Your search for the file name captures from the last segment to the end, which is good. But can we run another regex on fileName, something like this:
| rex field=fileName "<regex that works back from the end and drops the last 8 space-separated values>"
Thanks
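One way to express "drop the last 8 space-separated values" as a regex is a lazy capture followed by exactly eight `\s+\S+` groups anchored at the end. Here is a sketch in Python, assuming fileName always ends with exactly eight such fields, as in the sample line quoted earlier. `strip_last_eight_tokens` is a hypothetical name. In rex the group would be spelled `(?<fileName>...)`, but this has not been tried in Splunk.

```python
import re

# Lazily capture the start of the value, then require exactly eight
# whitespace-separated tokens before the end of the string.
STRIP_LAST_8 = re.compile(r"^\s*(?P<fileName>.+?)(?:\s+\S+){8}\s*$")


def strip_last_eight_tokens(value):
    """Hypothetical helper: drop the last eight space-separated values."""
    m = STRIP_LAST_8.match(value)
    # Values with no more than eight tokens do not match and are returned unchanged
    return m.group("fileName") if m else value


print(strip_last_eight_tokens(" 201608587data.csv b s o r user ssh 0 *"))
```

Because the capture is lazy, it stops as soon as exactly eight tokens remain, so a multi-word name like "data data2 data3" followed by eight fields is kept whole.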
Give this a try
Updated
your base search | rex field=URL "^.+\/\s*(?<filename>[\w\s\.-:_]+)$"
Run anywhere sample
| gentimes start=-1
| eval URL="/home/data/var/log/ data.csv#/home/data/var/log/data.csv#/home/data/var/log/ data2015067987.dat#/home/data/var/log/ 201608587data.csv#/home/data/var/log/ data data2 data3"
| table URL
| makemv URL delim="#"
| mvexpand URL
| rex field=URL "^.+\/\s*(?<filename>[\w\s\.-:_]+)$"
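To check the updated pattern without Splunk, here is the run-anywhere sample reproduced in Python. Splitting the "#"-joined string only approximates what makemv and mvexpand do, and Python's `re` stands in for PCRE here. `extract_filename` is a hypothetical helper name. The greedy `^.+/` backtracks to the last slash, and `\s*` absorbs any spaces after it.

```python
import re

# Python spelling of the updated rex pattern (named group uses ?P<...>)
FILENAME_RE = re.compile(r"^.+/\s*(?P<filename>[\w\s\.-:_]+)$")

# Same test data as the run-anywhere search; split on "#" to get one URL per row
SAMPLE = ("/home/data/var/log/ data.csv#/home/data/var/log/data.csv#"
          "/home/data/var/log/ data2015067987.dat#/home/data/var/log/ 201608587data.csv#"
          "/home/data/var/log/ data data2 data3")


def extract_filename(url):
    """Hypothetical helper: return the filename capture, or None if no match."""
    m = FILENAME_RE.search(url)
    return m.group("filename") if m else None


for url in SAMPLE.split("#"):
    print(f"{url!r} -> {extract_filename(url)!r}")
```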
Try the updated answer.
This one worked for many URIs but not for a few. For example, the filename field returns no value for the following:
/home/data/var/data/input/splunk/data_ip_2012-07-21-14-15-06.dat
/home/data/var/data/input/data_user_2012-07-21-141506.done
/home/data/var/data/inpu/data_user_2012-07-21-141506.dat
Thanks
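One likely cause is the character class. Inside `[\w\s\.-:_]`, the sequence `\.-:` is a range from "." (0x2E) to ":" (0x3A). That range covers "/" and the digits, but not the hyphen (0x2D). Every failing name above contains a hyphen, so the anchored match fails. Here is a sketch of that check in Python, assuming PCRE parses the class the same way, which it should for this syntax. Moving the hyphen to the end of the class makes it a literal character. `FIXED_RE` is a hypothetical name for that variant.

```python
import re

OLD_CLASS = re.compile(r"[\w\s\.-:_]")
# "\.-:" is a range from "." to ":": it includes "/" and 0-9, but not "-"
print(bool(OLD_CLASS.fullmatch("-")))  # False
print(bool(OLD_CLASS.fullmatch("/")))  # True

# A trailing "-" inside a character class is treated as a literal hyphen
FIXED_RE = re.compile(r"^.+/\s*(?P<filename>[\w\s.:-]+)$")

for url in ["/home/data/var/data/input/splunk/data_ip_2012-07-21-14-15-06.dat",
            "/home/data/var/data/input/data_user_2012-07-21-141506.done",
            "/home/data/var/log/ data data2 data3"]:
    m = FIXED_RE.search(url)
    print(m.group("filename") if m else None)
```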
Give this a try.
your base search | rex field=URL "^.+\/\s*(?<filename>.*)$"
This didn't work; it just removed the leading "/" from the URL and returned everything as the file name.