Splunk Search

Field extraction from source field

ppatkar
Path Finder

My Splunk source field is in the format below:

source=/default/folder/20190403/file_PARADOX_7747_txt

I am trying to pick only the file name from the source for some analysis, but I cannot get rid of the unwanted process ID appended at the end. That is, I only need PARADOX from the example above.

Below is the closest I have got so far; however, I am unable to separate the process ID from the file name:

rex field=source "(?<logdir>[\w\W/]+)/file_(?<filename>[^.]+)_txt"
  • logdir : /default/folder/20190403
  • filename : PARADOX_7747
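
To see why the process ID stays attached, the attempt can be reproduced outside Splunk. This is a minimal Python sketch, assuming Python's re module treats this pattern the same way as Splunk's PCRE-based rex (Python only accepts the (?P<name>...) spelling for named groups). [^.]+ stops only at a dot, and the file name has no dot, so it runs past the underscores and only backs off far enough for the literal _txt to match.

```python
import re

# The attempt from the question, with (?<name>...) rewritten as (?P<name>...)
# because Python's re module only accepts the P form for named groups.
attempt = re.compile(r"(?P<logdir>[\w\W/]+)/file_(?P<filename>[^.]+)_txt")

m = attempt.search("/default/folder/20190403/file_PARADOX_7747_txt")

# [^.]+ also matches underscores and digits, so the process ID ends up in filename.
# The "/" before "file_" is outside the logdir group, so logdir has no trailing slash.
print(m.group("logdir"))    # /default/folder/20190403
print(m.group("filename"))  # PARADOX_7747
```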

Ideally, I would like the following output:

  • logdir : /default/folder/
  • date : 20190403
  • processid : 7747
  • filename : PARADOX
  • extension : txt

Any help is appreciated. Thank you.

ragedsparrow
SplunkTrust

If you only want the filename, I think @FrankVl's or @vnravikumar's approach would work well. If you want everything parsed out:

 | rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\_]+)\_(?<processid>[^\_]+)\_(?<extension>.+)"

Here is what I used to test it:

| makeresults 
 | eval source = "/default/folder/20190403/file_PARADOX_7747_txt" 
 | rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\_]+)\_(?<processid>[^\_]+)\_(?<extension>.+)"
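
A quick way to sanity-check this pattern against the sample paths, sketched in Python under the assumption that its re module matches the same way as Splunk's PCRE here (named groups use the (?P<name>...) spelling, and the redundant backslashes before / and _ are dropped). It parses the single-word example correctly, but because [^_]+ stops at the first underscore, a multi-word name such as AMR_CA from the follow-up below gets split across filename and processid.

```python
import re

# ragedsparrow's full-parse pattern in Python syntax.
full = re.compile(
    r"(?P<logdir>/[\W\w]+/[\W\w]+/)(?P<date>[^/]+)/file_"
    r"(?P<filename>[^_]+)_(?P<processid>[^_]+)_(?P<extension>.+)"
)

single = full.search("/default/folder/20190403/file_PARADOX_7747_txt").groupdict()
print(single["logdir"], single["date"], single["filename"],
      single["processid"], single["extension"])
# /default/folder/ 20190403 PARADOX 7747 txt

# With a multi-word name, [^_]+ stops at the first underscore, so the fields shift.
multi = full.search("/default/folder/20190402/file_AMR_CA_1234_txt").groupdict()
print(multi["filename"], multi["processid"], multi["extension"])  # AMR CA 1234_txt
```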

ppatkar
Path Finder

Thanks @FrankVl, @vnravikumar & @ragedsparrow for all your help.

Unfortunately, the file name in my source can contain multiple words, but it is always followed by the process ID, as below:

source=/default/folder/20190403/file_PARADOX_7747_txt
source=/default/folder/20190402/file_AMR_CA_1234_txt
source=/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt

If there is a way to grab the file name between "file_" and a numeric digit ([0-9]), it would help.

ragedsparrow
SplunkTrust

I think this would work:

| rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\d]+)\_(?<processid>\d+)\_(?<extension>.+)"

I tested it here:

| makeresults 
  | eval source="/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt"
  | rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\d]+)\_(?<processid>\d+)\_(?<extension>.+)"
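
The same kind of check, sketched in Python against all three sample paths (again assuming Python's re behaves like Splunk's PCRE for this pattern). Because [^\d]+ is greedy, it first runs up to the first digit and then backs off one character so the literal _ before the process ID can match, which leaves the underscores inside the name intact.

```python
import re

# The [^\d]+ version in Python syntax: the file name is everything before the digits.
pattern = re.compile(
    r"(?P<logdir>/[\W\w]+/[\W\w]+/)(?P<date>[^/]+)/file_"
    r"(?P<filename>[^\d]+)_(?P<processid>\d+)_(?P<extension>.+)"
)

samples = [
    "/default/folder/20190403/file_PARADOX_7747_txt",
    "/default/folder/20190402/file_AMR_CA_1234_txt",
    "/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt",
]

parsed = []
for s in samples:
    m = pattern.search(s)
    parsed.append((m.group("date"), m.group("filename"),
                   m.group("processid"), m.group("extension")))
    print(parsed[-1])
```

One assumption built into [^\d]+ is that the file name itself never contains a digit; a hypothetical source such as /default/folder/20190402/file_AMR2_CA_1234_txt would not match at all.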

ppatkar
Path Finder

Works like a charm! Thank you.

vnravikumar
Champion

Hi

Try this:

| makeresults 
 | eval source = "/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt" 
 | rex field=source "file\_(?P<name>.+)_\d+"
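
Sketched in Python (assuming its re module matches like Splunk's PCRE here), this pattern returns the right name for all three samples: .+ is greedy, so the engine backtracks from the end of the string to the last _ that is followed by digits, and every earlier underscore stays inside the name.

```python
import re

# vnravikumar's pattern: everything after "file_" up to the last "_<digits>".
name_re = re.compile(r"file_(?P<name>.+)_\d+")

samples = [
    "/default/folder/20190403/file_PARADOX_7747_txt",
    "/default/folder/20190402/file_AMR_CA_1234_txt",
    "/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt",
]

names = [name_re.search(s).group("name") for s in samples]
print(names)  # ['PARADOX', 'AMR_CA', 'EMEA_IRE_DUB']
```

Unlike [^\d]+, this also copes with a digit inside the name: for a hypothetical file_AMR2_CA_1234_txt it returns AMR2_CA.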

vnravikumar
Champion

Hi

Give this a try:

| makeresults 
| eval source = "/default/folder/20190403/file_PARADOX_7747_txt" 
| eval filename = mvindex(split(source,"_"),1)
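
In Python terms, split(source,"_") followed by mvindex(...,1) is roughly str.split("_")[1], since both index from 0. This sketch shows that it returns PARADOX for the sample, and what goes wrong when a directory name contains an underscore; the my_folder path is a hypothetical example.

```python
# Rough Python equivalent of: eval filename = mvindex(split(source,"_"),1)
source = "/default/folder/20190403/file_PARADOX_7747_txt"
print(source.split("_")[1])  # PARADOX

# Hypothetical path with an underscore in a directory name:
# element 1 is now a piece of the path, not the file name.
bad_source = "/default/my_folder/20190403/file_PARADOX_7747_txt"
print(bad_source.split("_")[1])  # folder/20190403/file
```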

OR

To avoid problems when a directory name contains an underscore:

| makeresults 
| eval source = "/default/folder/20190403/file_PARADOX_7747_txt" 
| rex field=source "\/(?P<filename>file.+)" 
| eval filename = mvindex(split(filename,"_"),1)
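
A rough Python equivalent of this rex + split variant, assuming Python's re picks the same leftmost match as Splunk's rex here; name_from_source and the my_folder path are hypothetical, for illustration only. Capturing from "file" onwards means underscores in directory names no longer matter, but a multi-word name still yields only its first word, which is what the rex further down handles.

```python
import re

def name_from_source(source):
    # Capture from "file" onwards (hypothetical helper mirroring the rex above),
    # so underscores in directory names are ignored.
    m = re.search(r"/(?P<filename>file.+)", source)
    # Then take element 1 after splitting on "_", like mvindex(split(...,"_"),1).
    return m.group("filename").split("_")[1]

print(name_from_source("/default/folder/20190403/file_PARADOX_7747_txt"))     # PARADOX
print(name_from_source("/default/my_folder/20190403/file_PARADOX_7747_txt"))  # PARADOX
print(name_from_source("/default/folder/20190402/file_AMR_CA_1234_txt"))      # AMR
```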

Update for multi-word file names:

Try this:

| makeresults 
 | eval source = "/default/folder/20190402/file_AMR_CA_1234_txt" 
 | rex field=source "file\_(?P<name>.+)_\d+"

FrankVl
Ultra Champion

You were pretty close. I think this should work, unless the filename can also contain an underscore or some other variation in the format causes it to break:

| rex field=source "(?<logdir>[\w\W/]+)/file_(?<filename>[^_]+)_(?<processid>[^_]+)_txt"
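
Sketched in Python (assuming its re module behaves like Splunk's PCRE for this pattern), the regex extracts the name and process ID for the single-word sample. Note that logdir still includes the date folder and has no trailing slash. The caveat about underscores shows up as no match at all for a multi-word name.

```python
import re

# FrankVl's pattern in Python syntax (named groups spelled (?P<name>...)).
frank = re.compile(
    r"(?P<logdir>[\w\W/]+)/file_(?P<filename>[^_]+)_(?P<processid>[^_]+)_txt"
)

ok = frank.search("/default/folder/20190403/file_PARADOX_7747_txt")
print(ok.group("logdir"), ok.group("filename"), ok.group("processid"))
# /default/folder/20190403 PARADOX 7747

# A file name containing an underscore: after "AMR_CA" the pattern expects "_txt"
# but finds "_1234_txt", so the whole search fails.
multi = frank.search("/default/folder/20190402/file_AMR_CA_1234_txt")
print(multi)  # None
```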