As we build an entire infrastructure around field extractions, we keep running into exceptions: events that don't conform to the standard event layout, such as headers, footers, etc. I wonder how we can process these exception events. In a programming language we would catch them in an exception clause and handle them gracefully, but Splunk field extraction doesn't seem to have such a feature.
Right, that sort of feature doesn't exist, IMO because not matching is perfectly fine for a regular expression - it matches where it matches and does its thing when it does. So not matching simply means "yep, no match here. Moving right along."
Also! Another point might be that it has zero idea whether it's the only regex being applied. Maybe there's a CSV extraction first, 14 different extractions to parse data out of individual fields, and two more that, when appropriate, pull smaller pieces out of individual subfields. So... what should be a failure and what shouldn't be?
Anyway, that's all moot - your question does have an answer.
One technique we use is to check how often a field that should be extracted in ALL events is actually present. Here we look back 24 hours or 1000 events, whichever limit is hit first, and then count the events where the extraction didn't happen.
index=myindex sourcetype=my_sourcetype earliest=-24h | head 1000
| eval areExtractionsFailing=if(isnull(MyFieldThatShouldBeInAllEvents), 1, 0)
| stats sum(areExtractionsFailing) as failed count as total
After you have that, you can do some division to get a percentage or ratio, or just use the raw numbers - whatever it is you want. After some tuning, you could set up alerts to let you know when these things happen. Obviously, tune before spamming everyone with emails. 🙂
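For example (just a sketch - the field name and the 5% threshold below are placeholders you'd tune to your own data), you could extend that search to compute a failure percentage and only return a result when it crosses the threshold, which is a convenient shape for an alert:

index=myindex sourcetype=my_sourcetype earliest=-24h | head 1000
| eval areExtractionsFailing=if(isnull(MyFieldThatShouldBeInAllEvents), 1, 0)
| stats sum(areExtractionsFailing) as failed count as total
| eval pctFailed=round(failed/total*100, 2)
| where pctFailed > 5

Saved as an alert that triggers when results are returned, that only notifies you when the failure rate is high enough to care about.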
Happy Splunking!
-Rich
Much appreciated @rich7177.