Field Extraction (Regex) When Column Is Sometimes Absent

RMartinezDTV
Path Finder

Hi, I'm working on a Regex for field extractions of an alert log. The log has 1 line per alert in the following format:

[11/26/2013 9:13:41 AM]     Server1 LogTest: /var/log   Ok      Text Log test
[11/26/2013 9:13:36 AM]     Server1 LogTest: /var/log   Bad <......data.......> Text Log test

The difficulty comes when handling some Ok statuses: a 'Bad' status returns data (the relevant log lines), but an 'Ok' status returns a blank data section (actually just two consecutive tabs).

It seems like every regex I come up with ends up capturing part of "Text Log test" as the data field when no data is present.

Can I get some pointers on the proper regex? My current one is below, and I think I've exhausted the guess-and-check method. 🙂

]\t+\s+(?P<server>.+?)\s+(?P<category>.+?)\s(?P<object>.+?)\t(?P<status>.+?)\t(?P<data>.+?)\t(?P<test>.+?)
1 Solution

RMartinezDTV
Path Finder

Probably tacky to accept my own answer, but here's the final result for reference:

]\t+\s+(?P<server>.+?)\s+(?P<category>.+?):\s(?P<object>.+?)\t(?P<status>.+?)\t(?P<data>.*)\t(?P<test>.*)\t

This correctly matches events in which the data field is blank. Adjust the punctuation (\t, \s, :, and ]) as needed for your data.
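For anyone who wants to check the behaviour outside Splunk, here is a minimal Python sketch comparing the regex from the question with the one in this answer. The two sample lines are an assumption: the page shows spaces where the raw log has tabs, so the tab positions below (including the trailing tab) are a guess at the real layout, and the helper name `extract` is made up for illustration.

```python
import re

# Assumed tab layout (hypothetical): the real log's tabs were flattened to spaces on the page.
OK_LINE = "[11/26/2013 9:13:41 AM]\t Server1 LogTest: /var/log\tOk\t\tText Log test\t"
BAD_LINE = "[11/26/2013 9:13:36 AM]\t Server1 LogTest: /var/log\tBad\t<......data.......>\tText Log test\t"

# Regex from the question: data is .+? and so needs at least one character.
QUESTION_RE = re.compile(
    r"]\t+\s+(?P<server>.+?)\s+(?P<category>.+?)\s(?P<object>.+?)"
    r"\t(?P<status>.+?)\t(?P<data>.+?)\t(?P<test>.+?)"
)

# Regex from the answer: data and test are .* and the pattern ends on a tab.
ANSWER_RE = re.compile(
    r"]\t+\s+(?P<server>.+?)\s+(?P<category>.+?):\s(?P<object>.+?)"
    r"\t(?P<status>.+?)\t(?P<data>.*)\t(?P<test>.*)\t"
)

def extract(pattern, line):
    """Return the named groups as a dict, or None when the line does not match."""
    match = pattern.search(line)
    return match.groupdict() if match else None

ok_fields = extract(ANSWER_RE, OK_LINE)
bad_fields = extract(ANSWER_RE, BAD_LINE)
question_on_ok = extract(QUESTION_RE, OK_LINE)

print(ok_fields)
print(bad_fields)
print(question_on_ok)
```

On these assumed lines the answer's regex gives an empty data field for the Ok event, while the question's regex does not match the Ok event at all, because `.+?` cannot match zero characters between the two tabs.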

RMartinezDTV
Path Finder

Ayn, I actually read your notes about using search-time extractions here: http://answers.splunk.com/answers/67170/index-time-field-extraction. I also just learned from the docs what the difference is!

kallu
Communicator

Would it work better if you changed the end of the regex from

(?P<status>.+?)\t(?P<data>.+?)\t(?P<test>.+?)

to

(?P<status>.+)\t(?P<data>.*)\t(?P<test>.+)

? Then data should match an empty string when an "Ok" line has just two tabs. It sounds too easy and I haven't tested it in Splunk, so maybe I'm missing something.
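A quick way to try the suggested ending outside Splunk is a small Python sketch like the one below. The tail strings are assumptions (only the last three columns of each event, with tabs placed where the question describes them), and `TAIL_RE` is just a name picked for illustration.

```python
import re

# The suggested ending: status and test need at least one character, data may be empty.
TAIL_RE = re.compile(r"(?P<status>.+)\t(?P<data>.*)\t(?P<test>.+)")

# Assumed tails of the two sample events (hypothetical tab layout).
ok_tail = "Ok\t\tText Log test"
bad_tail = "Bad\t<......data.......>\tText Log test"

ok_match = TAIL_RE.search(ok_tail).groupdict()
bad_match = TAIL_RE.search(bad_tail).groupdict()

# Same Ok tail, but with a tab after the last column.
ok_trailing = TAIL_RE.search(ok_tail + "\t").groupdict()

print(ok_match)
print(bad_match)
print(ok_trailing)
```

On these assumed strings the empty data column is matched as an empty string. If the line has a tab after the last column, the greedy `.+` for test keeps that tab in the captured value, which a closing `\t` in the pattern would exclude.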

RMartinezDTV
Path Finder

Thanks! This was almost perfect. See my answer below.

lukejadamec
Super Champion

My mistake, I meant a search-time field extraction.

Ayn
Legend

DELIMS doesn't work as an index-time extraction, and index-time extractions should be avoided unless you really know what you're doing and why.

lukejadamec
Super Champion

Have you tried setting this up as a search-time extraction, using the log's delimiter and a preset list of field names?
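To illustrate the idea only (this is plain Python, not Splunk configuration, and it does not claim to reproduce how Splunk handles consecutive delimiters), a delimiter-based split with a preset list of names might look like the sketch below. The field names, the `split_fields` helper, and the tab layout of the sample line are all assumptions.

```python
# Hypothetical field names, one per tab-separated column.
FIELD_NAMES = ["timestamp", "source", "status", "data", "test"]

def split_fields(line, names=FIELD_NAMES, delimiter="\t"):
    """Split one event on the delimiter and pair each column with a preset name."""
    values = line.rstrip("\r\n").split(delimiter)
    # str.split keeps empty columns, so a blank data section stays in its slot.
    return dict(zip(names, (value.strip() for value in values)))

# Assumed tab layout (hypothetical): the page shows spaces where the raw log has tabs.
ok_line = "[11/26/2013 9:13:41 AM]\t Server1 LogTest: /var/log\tOk\t\tText Log test\t"
fields = split_fields(ok_line)
print(fields)
```

Because the empty column is kept when splitting, the Ok event gets an empty data value and "Text Log test" still lands in test. The server, category, and object share one tab-separated column in this assumed layout, so they would still need a second step to separate.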
