Hi, I'm working on a regex for field extraction from an alert log. The log has one line per alert, in the following format:
[11/26/2013 9:13:41 AM] Server1 LogTest: /var/log Ok Text Log test
[11/26/2013 9:13:36 AM] Server1 LogTest: /var/log Bad <......data.......> Text Log test
The difficulty comes when handling some OK statuses; you'll notice here that a 'Bad' status returns data (the relevant log lines), but an 'Ok' status returns a blank (actually 2 tabs) data section.
It seems like every regex I come up with will accidentally capture some part of Text Log test and use that as part or all of the data section when data isn't present.
Can I get some pointers on the proper regex? My current regex is below, and I think I've exhausted the guess-and-check method. 🙂
]\t+\s+(?P<server>.+?)\s+(?P<category>.+?)\s(?P<object>.+?)\t(?P<status>.+?)\t(?P<data>.+?)\t(?P<test>.+?)
Probably tacky to accept my own answer, but here's the final result for reference:
]\t+\s+(?P<server>.+?)\s+(?P<category>.+?):\s(?P<object>.+?)\t(?P<status>.+?)\t(?P<data>.*)\t(?P<test>.*)\t
This correctly matches even when a field has blank data. Adjust punctuation (\t, \s, :, and ]) as needed for your data.
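For anyone who wants to try the pattern outside Splunk, here is a minimal Python sketch. The regex is copied unchanged from the answer above. The sample lines are assumptions: the forum post lost the original tabs, so the tab placement here (a tab plus a space after the bracket, tabs between object, status, data and test, and a trailing tab) is a guess that fits the pattern, and "sample data" is a hypothetical stand-in for the elided data section. The helper name `extract` is also made up for this example.

```python
import re

# Pattern copied from the answer above; (?P<name>...) is valid Python syntax.
PATTERN = re.compile(
    r"]\t+\s+(?P<server>.+?)\s+(?P<category>.+?):\s(?P<object>.+?)"
    r"\t(?P<status>.+?)\t(?P<data>.*)\t(?P<test>.*)\t"
)

# Hypothetical lines: the tab layout is assumed, not taken from the real log.
OK_LINE = "[11/26/2013 9:13:41 AM]\t Server1 LogTest: /var/log\tOk\t\tText Log test\t"
BAD_LINE = "[11/26/2013 9:13:36 AM]\t Server1 LogTest: /var/log\tBad\tsample data\tText Log test\t"


def extract(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for line in (OK_LINE, BAD_LINE):
        print(extract(line))
```

Under these assumptions the Ok line yields an empty data field while test still holds the full "Text Log test" text. If your real lines have no trailing tab, the final \t in the pattern will need adjusting.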
Ayn, I actually read your notes here: http://answers.splunk.com/answers/67170/index-time-field-extraction about using search-time extractions....and I just learned what the difference is from the docs!
Would it work better if you change the end
(?P<status>.+?)\t(?P<data>.+?)\t(?P<test>.+?)
to
(?P<status>.+)\t(?P<data>.*)\t(?P<test>.+)
Then data should match an empty string when there are just 2 tabs, as in the "Ok" case. It sounds too easy and I didn't test it with Splunk, so maybe I'm missing something?
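A quick way to see the effect of this change is to run both endings in Python against just the tail of a line. This is only a sketch, not a Splunk test: the tail strings are hypothetical (two tabs for the blank data section, "sample data" standing in for real data), and the names `OLD_TAIL` and `NEW_TAIL` are made up for the example.

```python
import re

# Original ending: every field needs at least one character (.+?).
OLD_TAIL = re.compile(r"(?P<status>.+?)\t(?P<data>.+?)\t(?P<test>.+?)")

# Suggested ending: data may be empty (.*), status and test are greedy.
NEW_TAIL = re.compile(r"(?P<status>.+)\t(?P<data>.*)\t(?P<test>.+)")

# Hypothetical line tails; the tab layout is an assumption.
OK_TAIL = "Ok\t\tText Log test"
BAD_TAIL = "Bad\tsample data\tText Log test"

if __name__ == "__main__":
    # With only two adjacent tabs, .+? has nothing to put in data.
    print(OLD_TAIL.search(OK_TAIL))
    # With .* the data group is allowed to be empty.
    print(NEW_TAIL.search(OK_TAIL).groupdict())
    print(NEW_TAIL.search(BAD_TAIL).groupdict())
```

On this tail the original ending finds no match at all, because data must contain at least one character between the two tabs. Real lines with extra tabs or trailing text may behave differently, which could be where the stray captures come from.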
Thanks! This was almost perfect. See my answer below.
My mistake, I meant a search-time field extraction.
DELIMS doesn't work as an index-time extraction, and index-time extractions should be avoided unless you really know what you're doing and why.
Have you tried setting this up as a search-time extraction, using the log delimiter and a preset series of fields?