Field extraction from source field

ppatkar — Tue, 29 Sep 2020 23:56:28 GMT

My Splunk source has the format below:

source=/default/folder/20190403/file_PARADOX_7747_txt

I am trying to pick only the file name from the source for some analysis, but I am unable to get rid of the unwanted process id appended at the end. That is, I only need PARADOX from the above.

Below is the closest I have got so far. However, I am unable to separate the process id from the file name:

rex field=source "(?<logdir>[\w\W/]+)/file_(?<filename>[^.]+)_txt"

logdir : /default/folder/20190403/
filename : PARADOX_7747

Ideally, I would like the output below:

logdir : /default/folder/
date : 20190403
processid : 7747
filename : PARADOX
extension : txt

Any help is appreciated. Thank you.

Re: Field extraction from source field

FrankVl — Wed, 03 Apr 2019 14:29:12 GMT

You were pretty close. I guess this should work (unless the filename can also contain _, or other variations in the format cause this to break in some cases):

| rex field=source "(?<logdir>[\w\W/]+)/file_(?<filename>[^_]+)_(?<processid>[^_]+)_txt"

Re: Field extraction from source field

vnravikumar — Wed, 03 Apr 2019 15:03:45 GMT

Give this a try:

| makeresults 
| eval source = "/default/folder/20190403/file_PARADOX_7747_txt" 
| eval filename = mvindex(split(source,"_"),1)

To avoid problems when a directory name contains an underscore, isolate the file part of the path first:

| makeresults 
| eval source = "/default/folder/20190403/file_PARADOX_7747_txt" 
| rex field=source "\/(?P<filename>file.+)" 
| eval filename = mvindex(split(filename,"_"),1)

[New]:

Try this

| makeresults 
 | eval source = "/default/folder/20190402/file_AMR_CA_1234_txt" 
 | rex field=source "file\_(?P<name>.+)_\d+"

Re: Field extraction from source field

ragedsparrow — Wed, 03 Apr 2019 15:10:07 GMT

If you only want the filename, I think the answers from @FrankVl or @vnravikumar would be a good approach. If you want it all parsed out:

 | rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\_]+)\_(?<processid>[^\_]+)\_(?<extension>.+)"

Here is what I used to test it:

| makeresults 
 | eval source = "/default/folder/20190403/file_PARADOX_7747_txt" 
 | rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\_]+)\_(?<processid>[^\_]+)\_(?<extension>.+)"

Re: Field extraction from source field

ppatkar — Tue, 29 Sep 2020 23:56:30 GMT

Thanks @FrankVl, @vnravikumar & @ragedsparrow for all your help.

Unfortunately, the file name in my source can contain multiple words, but it is always followed by the process id, as below:

source=/default/folder/20190403/file_PARADOX_7747_txt
source=/default/folder/20190402/file_AMR_CA_1234_txt
source=/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt

If there is a way to grab the file name between "file_" and the numeric process id ([0-9]), it would help.

Re: Field extraction from source field

ragedsparrow — Wed, 03 Apr 2019 16:14:32 GMT

I think this would work:

| rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\d]+)\_(?<processid>\d+)\_(?<extension>.+)"

I tested it here:

| makeresults 
  | eval source="/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt"
  | rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\d]+)\_(?<processid>\d+)\_(?<extension>.+)"
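
One caveat with the pattern above: [^\d]+ stops at the first digit, so it would not match a file name that itself contains digits. The thread does not say whether that can happen, so the following is only a sketch under assumptions that are mine, not the original poster's: the date directory is always exactly 8 digits, the process id is the last all-numeric token before the extension, and the extension contains no underscore. It also drops the fixed two-directory depth for logdir.

```spl
| makeresults
| eval source="/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt"
``` Assumption: date is exactly 8 digits; processid is the last numeric token before the extension ```
| rex field=source "^(?<logdir>.+/)(?<date>\d{8})/file_(?<filename>.+)_(?<processid>\d+)_(?<extension>[^_/]+)"
| table source logdir date filename processid extension
```

For the sample value this should give logdir=/default/folder/, date=20190402, filename=EMEA_IRE_DUB, processid=8964 and extension=txt. The greedy .+ in filename is what lets multi-word names through: it takes as much as it can and backtracks only far enough to leave a numeric token and an extension at the end. The triple-backtick comment needs a reasonably recent Splunk version; remove that line if your version rejects it.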

Re: Field extraction from source field

vnravikumar — Wed, 03 Apr 2019 16:28:40 GMT

Try this

| makeresults 
 | eval source = "/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt" 
 | rex field=source "file\_(?P<name>.+)_\d+"
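
To check the same rex against all three sample sources from the question in one search, something like the following should work. It is a test harness only: the semicolon-separated string and the mvexpand step are just a convenient way to fabricate three events and are not part of the extraction itself.

```spl
| makeresults
``` Build one test event per sample source from the question ```
| eval source=split("/default/folder/20190403/file_PARADOX_7747_txt;/default/folder/20190402/file_AMR_CA_1234_txt;/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt", ";")
| mvexpand source
| rex field=source "file\_(?P<name>.+)_\d+"
| table source name
```

The expected values of name are PARADOX, AMR_CA and EMEA_IRE_DUB. Because .+ is greedy, name runs up to the last underscore that is followed by digits, which is why multi-word names are captured whole.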

Re: Field extraction from source field

ppatkar — Wed, 03 Apr 2019 17:33:24 GMT

Works like a charm! Thank you.