I'm logging Rails requests and have taught Splunk about our logging format.
When there's a new release of our app, I usually find some events marked as an error, warning, etc. that are false positives. I would like to ignore those lines for the purpose of alerting, but keep them in the index. If I just do NOT <pattern>, I end up dropping the entire event, which might have other lines I care about.
Splunk stores all log levels used in an event in a multivalue field, and that's what I use to determine whether an event contains any lines I care about. But I don't know how to ignore one specific line in the event for alerting. The search I use:
sourcetype=rails [search sourcetype=rails log_level=ERROR OR log_level=FATAL | dedup request_uuid | fields request_uuid]
| transaction request_uuid
In the past, before I taught Splunk about log_level, I used to do a text search for error, fatal, etc. and return everything that matched. To ignore a line I would use | rex mode=sed "s/ERROR( --: a pattern)/INFO\1/g", which nicely makes the text search ignore those lines. Example:
sourcetype=rails
| rex mode=sed "s/ERROR( --: a pattern)/INFO\1/g"
| search error OR fatal
Sadly that doesn't work now, as log_level has already been set by the time the rex call gets made.
I stopped doing the search this way because we were getting too many false positives from filenames, reviews, and other things happening on the site while it was being used.
Any suggestions for a built-in way to handle this?
To me it seems like I'll have to go down the route of making a custom search command, but I would rather not if there's a sensible way of dealing with this within Splunk. 🙂
Update: woodcock asked for an example to better understand what I mean, so here's one that I hope illustrates it. 🙂
I, [2015-10-06T17:00:56.249835 #29120] INFO -- : [class=Tasks::BusinessReporting::ReportTask] [user_type=Authorization::AnonymousUser] Started with args ["15"]
W, [2015-10-06T17:00:56.345676 #29120] WARN -- : [class=Tasks::BusinessReporting::ReportTask] [user_type=Authorization::AnonymousUser] a valid message
W, [2015-10-06T17:00:56.349008 #29120] WARN -- : [class=Tasks::BusinessReporting::ReportTask] [user_type=Authorization::AnonymousUser] an invalid message
W, [2015-10-06T17:00:56.349568 #29120] INFO -- : Finished running report task at 2015-10-06 17:00:56 +0800
The event above contains two warnings, one of which I currently don't want to be alerted about. So I would like to find a way to have "a valid message" ignored for the sake of alerting, since just doing NOT "a valid message" will also drop the "an invalid message" line that I do want to be alerted about.
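To make the problem concrete outside of Splunk, here is a minimal Python sketch of the difference between excluding a whole event and ignoring a single line. It is only an illustration, not Splunk behaviour reproduced exactly: the function names are hypothetical, and the event is a trimmed copy of the example above with the [class=...] and [user_type=...] tags left out.

```python
import re

# Trimmed copy of the example event: the [class=...] and [user_type=...] tags
# are dropped for brevity; everything else is as posted.
EVENT_LINES = [
    'I, [2015-10-06T17:00:56.249835 #29120] INFO -- : Started with args ["15"]',
    "W, [2015-10-06T17:00:56.345676 #29120] WARN -- : a valid message",
    "W, [2015-10-06T17:00:56.349008 #29120] WARN -- : an invalid message",
    "W, [2015-10-06T17:00:56.349568 #29120] INFO -- : Finished running report task at 2015-10-06 17:00:56 +0800",
]

# Picks the level word that follows the "[timestamp #pid]" block.
LEVEL_RE = re.compile(r"^\w,\s\[.*?\]\s+(\S+)")


def log_levels(lines):
    """Return the level of every line that looks like a Rails logger line."""
    return [m.group(1) for m in map(LEVEL_RE.match, lines) if m]


def event_level_not(lines, text):
    """Mimic NOT "text" on a multi-line event: one hit drops every line."""
    return [] if any(text in line for line in lines) else list(lines)


def line_level_ignore(lines, text):
    """What is actually wanted: skip only the lines containing the text."""
    return [line for line in lines if text not in line]


if __name__ == "__main__":
    print(log_levels(event_level_not(EVENT_LINES, "a valid message")))
    print(log_levels(line_level_ignore(EVENT_LINES, "a valid message")))
```

The first call loses the whole event, including the warning that should alert; the second keeps the remaining WARN line, which is the behaviour I'm after.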
After much thinking and testing I realized that rex could re-extract log_level after I had modified the raw event, so the method I used originally could basically be used again.
To help with this I ended up creating a macro like this:
[reevaluate_log_level]
definition = | rex field=_raw "(?m)^\w,\s\[.*?\]\s+(?<log_level>[^\s]+)"
iseval = 0
Which worked in conjunction with this macro:
[where_log_level_more_than_info]
definition = log_level=ERROR OR log_level=WARN OR log_level=CRIT OR log_level=FATAL
iseval = 0
Which I used like this:
sourcetype=rails [search sourcetype=rails `where_log_level_more_than_info` | dedup request_uuid | fields request_uuid]
| rex mode=sed "s/ERROR( --: a pattern)/INFO\1/g"
`reevaluate_log_level`
| search `where_log_level_more_than_info`
| transaction request_uuid
To sum up: after the sed rewrite I re-extract the log_level multi-value field, so it notices that I've rewritten the level to INFO on the fly. With this I'm able to rewrite the first-pass results on the fly, so I can ignore known issues for this particular alert.
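The rewrite-then-re-extract idea can be tried outside Splunk with a small Python sketch. This is an approximation under stated assumptions, not the macros themselves: the function names are hypothetical, the event is a trimmed copy of the example from the question, the sed-style rewrite targets WARN (the level of the line to be ignored in that example) instead of the ERROR placeholder, and Python's named-group syntax is (?P<name>...) rather than (?<name>...).

```python
import re

# Trimmed copy of the example event (the [class=...] and [user_type=...] tags
# are left out), joined into one multi-line string like a raw event.
RAW_EVENT = "\n".join([
    'I, [2015-10-06T17:00:56.249835 #29120] INFO -- : Started with args ["15"]',
    "W, [2015-10-06T17:00:56.345676 #29120] WARN -- : a valid message",
    "W, [2015-10-06T17:00:56.349008 #29120] WARN -- : an invalid message",
    "W, [2015-10-06T17:00:56.349568 #29120] INFO -- : Finished running report task at 2015-10-06 17:00:56 +0800",
])

# Same pattern as the reevaluate_log_level macro, in Python syntax.
LEVEL_RE = re.compile(r"(?m)^\w,\s\[.*?\]\s+(?P<log_level>[^\s]+)")

# Same levels as the where_log_level_more_than_info macro.
ALERT_LEVELS = {"ERROR", "WARN", "CRIT", "FATAL"}


def extract_log_levels(raw):
    """Collect the level of every line, like a multi-value log_level field."""
    return LEVEL_RE.findall(raw)


def downgrade_known_issue(raw, message):
    """sed-style rewrite: WARN becomes INFO on lines carrying the message."""
    pattern = r"WARN( -- : .*" + re.escape(message) + r")"
    return re.sub(pattern, r"INFO\1", raw)


def should_alert(raw):
    """True if any re-extracted level is above INFO."""
    return any(level in ALERT_LEVELS for level in extract_log_levels(raw))


if __name__ == "__main__":
    rewritten = downgrade_known_issue(RAW_EVENT, "a valid message")
    print(extract_log_levels(RAW_EVENT))
    print(extract_log_levels(rewritten))
    print(should_alert(rewritten))
```

Here findall collects every match in the event. In Splunk, rex returns only the first match unless max_match is set, so whether the macro needs max_match=0 depends on how the events are broken, which I haven't verified for other setups.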
Although you clearly spent time laying out your situation, I am still not able to see exactly what you mean. Perhaps it would help if you added an example that includes events and search results.
I've added an example, thanks for the suggestion 🙂
I've managed to get it working and I'll do a writeup of that as well. Thanks!
Great; make sure you click "Accept" on your answer.