Solved: How can/should we handle the exceptions of field e...

ddrillic · 11-27-2018

As we build an entire infrastructure around field extractions, we keep running into exceptions: events that don't conform to the standard event layout, such as headers, footers, etc. I wonder how we can process these exception events. In a programming language, we would catch them in an exception clause and handle them gracefully, but Splunk field extractions don't seem to have such a feature.

Richfez · 12-02-2018

Right, those sorts of features don't exist, IMO because not matching is perfectly fine for a regular expression - it only expects to match where it matches and do the thing it does when it does so. So not matching simply means "yep, no match here. Moving right along."
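To illustrate that point, here is a minimal sketch you can run from the search bar. The sample event text and the `user` field are made up for the example, not taken from anyone's real data. The `rex` doesn't match the header-style line, so no error is raised and the field simply never gets created, which `isnull()` can then detect.

```spl
| makeresults
| eval _raw="--- report header ---"
| rex "user=(?<user>\w+)"
| eval matched=if(isnull(user), "no", "yes")
| table _raw user matched
```

With that sample line, `user` should stay empty and `matched` should come back as "no". That null check is about as close to a "catch" as you get at search time.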

Also! Another point might be that it has zero idea if it's the only regex being applied. Maybe there's a CSV extraction first, 14 different extracts to parse data out of individual fields, and two more that when appropriate pull smaller pieces out of individual subfields. So... what should be a failure and what shouldn't be?

Anyway, that's all moot - your question does have an answer.

One technique we use is to count how common an extracted field that should be in ALL data actually is. Here we look back 24 hours or 1000 events, whichever is hit first, and then count events where extractions didn't happen.

index=myindex sourcetype=my_sourcetype earliest=-24h | head 1000
| eval areExtractionsFailing=if(isnull(MyFieldThatShouldBeInAllEvents), 1, 0)
| stats sum(areExtractionsFailing) as failed, count as total

After you have that you can do some division to get a percentage or ratio, or just use the raw numbers - whatever it is you want. After some tuning, you could set up alerts to let you know when these things happen. Obviously, tune before spamming everyone with emails. 🙂
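As a rough sketch of the division step, something like the following could work. The `pct_failed` field name and the 5% threshold are just assumptions for illustration, so pick whatever fits your data after tuning.

```spl
index=myindex sourcetype=my_sourcetype earliest=-24h | head 1000
| eval areExtractionsFailing=if(isnull(MyFieldThatShouldBeInAllEvents), 1, 0)
| stats sum(areExtractionsFailing) as failed, count as total
| eval pct_failed=round(100 * failed / total, 2)
| where pct_failed > 5
```

Saved as an alert that triggers when the number of results is greater than zero, this would only fire when more than 5% of the sampled events are missing the field.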

Happy Splunking!
-Rich

ddrillic · 12-02-2018

Much appreciated @rich7177.

How can/should we handle the exceptions of field extractions?
