Splunk Search

How to create a regex field extraction for the last occurrence of a specific text?

dhirendra761
Contributor

Hi,

We have attached a log file. The whole log file is contained in a single event in Splunk.
Now I need to extract data (filename, date, time) from only the last lines of text.
For example:
Try upload file :
Upload File D:\Program Files\X529\Matrix IT Software PK\PTS\Files\JobLettrers\BAAppointmentLetters_2016_4_9_13_0.csv Complete, status : 226 Transfer complete.
Closing log at 2:00:56 PM on 4/29/2016

To extract this, I tried the SPL below:

    index="main" source="Sample_log.txt" | rex field=log "Try upload file : (?<file>.*)\nUpload File (?<msg>.*)([\S\s\n]*)\nClosing log at (?<time>.*) on (?<date>.*)" | table file msg  time date

But this regex is not working: it captures many lines of text from the log field and only considers the first occurrence.

Please suggest. Thanks.

Dhirendra

0 Karma
1 Solution

FrankVl
Ultra Champion

That's because you're using way too generic matching in your regex. See: https://regex101.com/r/o0Bm3F/1

Especially ([\S\s\n]*), which matches non-whitespace, whitespace, and newline (which is already included in whitespace), so it basically matches anything. You will need to make your regex more specific so that it only matches the last lines.
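To illustrate the point about ([\S\s\n]*), here is a minimal sketch using Python's `re` module instead of Splunk's `rex`, so the named groups are written `(?P<name>...)`. The two-upload log text is hypothetical, invented to mimic the sample in the question, and it assumes a trailing space after "Try upload file :". The catch-all group swallows the whole second upload, so `msg` comes from the first one:

```python
import re

# Hypothetical two-upload log, invented for illustration (not the attached file).
log = (
    "Try upload file : \n"
    "Upload File D:\\Program Files\\first.csv Complete, status : 226 Transfer complete.\n"
    "Try upload file : \n"
    "Upload File D:\\Program Files\\second.csv Complete, status : 226 Transfer complete.\n"
    "Closing log at 2:00:56 PM on 4/29/2016"
)

# Pattern from the question, with (?<name>...) rewritten as (?P<name>...) for Python.
original = re.compile(
    r"Try upload file : (?P<file>.*)\nUpload File (?P<msg>.*)([\S\s\n]*)"
    r"\nClosing log at (?P<time>.*) on (?P<date>.*)"
)

m = original.search(log)
print(repr(m.group("file")))  # empty: nothing sits between "file : " and the newline
print(m.group("msg"))         # the FIRST upload line, not the last one
print(repr(m.group(3)))       # the catch-all group contains the entire second upload
```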

Also, your capture groups don't seem to be in the right place (the filename comes after the "Upload File" text, not before). I'm not entirely sure what you want to capture in the msg field.

Try this:

 index="main" source="Sample_log.txt" | rex field=log "Try upload file :\s+Upload File (?<file>.*?)\s+(?<msg>\w+,[^.]+\.)\s+Closing log at (?<time>\d+:\d+:\d+\s+\w+) on (?<date>\d+\/\d+\/\d+)" | table file msg  time date

See also: https://regex101.com/r/Yw0rpg/1
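As a rough check of why the tightened pattern lands on the last upload only, here is a small sketch in Python's `re` module (not Splunk itself, so named groups use `(?P<name>...)`). The two-upload log text is hypothetical, made up to resemble the sample in the question. Because every piece between "Upload File" and "Closing log at" is specific and `.` does not cross newlines, the first upload cannot satisfy the pattern and the engine moves on to the second:

```python
import re

# Hypothetical two-upload log, invented for illustration (not the attached file).
log = (
    "Try upload file : \n"
    "Upload File D:\\Program Files\\first.csv Complete, status : 226 Transfer complete.\n"
    "Try upload file : \n"
    "Upload File D:\\Program Files\\second.csv Complete, status : 226 Transfer complete.\n"
    "Closing log at 2:00:56 PM on 4/29/2016"
)

# Tightened pattern from the answer, rewritten with Python's (?P<name>...) syntax.
tightened = re.compile(
    r"Try upload file :\s+Upload File (?P<file>.*?)\s+(?P<msg>\w+,[^.]+\.)"
    r"\s+Closing log at (?P<time>\d+:\d+:\d+\s+\w+) on (?P<date>\d+/\d+/\d+)"
)

m = tightened.search(log)
fields = m.groupdict()
for name, value in fields.items():
    print(f"{name} = {value}")
```

The lazy `.*?` for the filename still copes with spaces in the path, since it keeps extending until a word followed by a comma ("Complete,") comes next.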

harsmarvania57
SplunkTrust

Hi,

Please try the regex below.

<yourBaseSearch>
| rex field=<yourfield> "Try[^\:]+\:\s(?<file>[^\v]+)?\vUpload\sFile\s(?<msg>[^\.]+\.[^\s]+\s[^\v]+)\v{2}Closing[^\d]+(?<time>[^on]+)on\s(?<date>[^\$]+)"

Regex101: https://regex101.com/r/vqfSMz/1

EDIT: Updated the regex and removed the () from (?<time>[^(on)]+); credit goes to @FrankVl.

dhirendra761
Contributor

Thanks for your answer @harsmarvania57

0 Karma

FrankVl
Ultra Champion

Did you test that? Because it doesn't work: https://regex101.com/r/vlXUdG/1
The capture groups are not in the right spot, and there is no newline after the filename.

Also, [^(on)] is a bit of a strange notation. The () are pointless there. (And I could make similar comments on some of your other regex syntax.)

0 Karma

harsmarvania57
SplunkTrust

I didn't test that in Splunk, only on regex101. Yes, [^(on)] is a strange one, but there is no difference whether you remove the () or keep them in the regex; it will still use the same steps to capture the result. You are most welcome to comment on my other regex as well. 🙂

0 Karma

FrankVl
Ultra Champion

So in regex101 you noticed it is not capturing the filename and putting both filename and status info into the msg field?

[^(on)] : there is very much a difference between including the () or not. Not for this sample data, but including the () means match any character not equal to (, o, n or ). Without the () it just means match any character not equal to o or n.
[^\.] : no backslash needed when you use . inside a character class definition.
[^\v] [^\s] [^\d]: You could simply use \V \S \D instead, or write actual specific regexes to match what is expected.
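These character-class points can be checked with a quick sketch in Python's `re` module. The test strings are made up for illustration. `\V` is left out because Python's `re` does not support that PCRE escape:

```python
import re

# [^(on)] excludes four single characters: "(", "o", "n" and ")".
# It does not mean "anything except the word on".
text = "a(b)on c"
with_parens = re.findall(r"[^(on)]+", text)
without_parens = re.findall(r"[^on]+", text)
print(with_parens)     # ['a', 'b', ' c']
print(without_parens)  # ['a(b)', ' c']

# Inside a character class the dot is already literal, so the backslash is optional.
sample = "status : 226 Transfer complete. 4/29"
same_dot = re.findall(r"[^\.]+", sample) == re.findall(r"[^.]+", sample)

# Negated classes and their shorthand escapes match the same text.
same_space = re.findall(r"[^\s]+", sample) == re.findall(r"\S+", sample)
same_digit = re.findall(r"[^\d]+", sample) == re.findall(r"\D+", sample)
print(same_dot, same_space, same_digit)  # True True True
```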

harsmarvania57
SplunkTrust

Yes, I agree that [^(on)] will also exclude ( and ), but in this example they are not present.

Regarding [^\.]: if we include the backslash, will there be any drawback?

[^\v] [^\s] [^\d]: can you please explain the benefit of using \V \S \D, since both do the same work?

0 Karma

FrankVl
Ultra Champion

No specific drawback or benefit. Just easier to read, but that is perhaps also personal preference 🙂

0 Karma

harsmarvania57
SplunkTrust

Thanks for all the info provided, always learning something new. 🙂

dhirendra761
Contributor

Thanks for your answer @FrankVl

0 Karma