Solved: Re: Field extraction from source field

ppatkar · ‎04-03-2019

I have my Splunk source in the format below:

source=/default/folder/20190403/file_PARADOX_7747_txt

I am trying to pick only the file name out of the source for some analysis, but I am unable to get rid of the unwanted process ID appended at the end; i.e., I only need PARADOX from the above.

Below is the closest I have got so far; however, I am unable to separate the process ID from the file name:

rex field=source "(?<logdir>[\w\W/]+)/file_(?<filename>[^.]+)_txt"

logdir : /default/folder/20190403/
filename : PARADOX_7747

Ideally, I would like the output below:

logdir : /default/folder/
date : 20190403
processid : 7747
filename : PARADOX
extension : txt

Any help is appreciated. Thank you.

ragedsparrow · ‎04-03-2019

If you only want the filename, I think the approaches from @FrankVI or @vnravikumar would work well. If you want it all parsed out:

 | rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\_]+)\_(?<processid>[^\_]+)\_(?<extension>.+)"

Here is what I used to test it:

| makeresults 
 | eval source = "/default/folder/20190403/file_PARADOX_7747_txt" 
 | rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\_]+)\_(?<processid>[^\_]+)\_(?<extension>.+)"

ppatkar · ‎04-03-2019

Thanks @FrankVI, @vnravikumar & @ragedsparrow for all your help.

Unfortunately, my source pattern can contain multiple words in the file name, but the file name is always followed by the process ID, as below:

source=/default/folder/20190403/file_PARADOX_7747_txt
source=/default/folder/20190402/file_AMR_CA_1234_txt
source=/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt

If there is a way to grab the file name between "file_" and the first numeric digit ([0-9]), it would help.

ragedsparrow · ‎04-03-2019

I think this would work:

| rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\d]+)\_(?<processid>\d+)\_(?<extension>.+)"

I tested it here:

| makeresults 
  | eval source="/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt"
  | rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\d]+)\_(?<processid>\d+)\_(?<extension>.+)"
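
To check the same pattern against all three sample sources in one search, something like the sketch below should do. It assumes the sample paths contain no commas, because split() is used with a comma delimiter to build a multivalue field, which mvexpand then turns into one result per path; the table command at the end is only there for display.

```
| makeresults
| eval source=split("/default/folder/20190403/file_PARADOX_7747_txt,/default/folder/20190402/file_AMR_CA_1234_txt,/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt", ",")
| mvexpand source
| rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\d]+)\_(?<processid>\d+)\_(?<extension>.+)"
| table source logdir date filename processid extension
```

Keep in mind that [^\d]+ stops at the first digit, so this relies on the file name itself never containing digits.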

ppatkar · ‎04-03-2019

Works like a charm! Thank you.

vnravikumar · ‎04-03-2019

Hi

Try this

| makeresults 
 | eval source = "/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt" 
 | rex field=source "file\_(?P<name>.+)_\d+"

vnravikumar · ‎04-03-2019

Hi

Give this a try:

| makeresults 
| eval source = "/default/folder/20190403/file_PARADOX_7747_txt" 
| eval filename = mvindex(split(source,"_"),1)

OR

To avoid problems when a directory name contains an underscore:

| makeresults 
| eval source = "/default/folder/20190403/file_PARADOX_7747_txt" 
| rex field=source "\/(?P<filename>file.+)" 
| eval filename = mvindex(split(filename,"_"),1)

[New]:

Try this

| makeresults 
 | eval source = "/default/folder/20190402/file_AMR_CA_1234_txt" 
 | rex field=source "file\_(?P<name>.+)_\d+"

FrankVl · ‎04-03-2019

You were pretty close. I guess this should work (unless the filename can also contain _, or other variations in the format cause this to break in some cases).

| rex field=source "(?<logdir>[\w\W/]+)/file_(?<filename>[^_]+)_(?<processid>[^_]+)_txt"
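
If the file name can itself contain underscores (or digits), one possible variant is to anchor on the end of the path instead, so that the last run of digits between underscores is treated as the process ID. This is only a sketch under a few assumptions: the date directory is always exactly 8 digits, the extension contains no underscore, and the process ID is purely numeric.

```
| makeresults
| eval source="/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt"
| rex field=source "(?<logdir>.+/)(?<date>\d{8})/file_(?<filename>.+)_(?<processid>\d+)_(?<extension>[^_]+)$"
| table source logdir date filename processid extension
```

Here the greedy .+ in filename backtracks until the remainder of the string looks like _digits_extension, so everything between "file_" and the process ID ends up in filename.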
