Splunk Search

How to write a search to ignore a matching line, but not other lines in event to filter out false positives for an alert?

gaqzi
Explorer

I'm logging Rails requests and have taught Splunk about our logging format.
When there's a new release of our app, I usually find some events that are marked as an error, warning, etc. but are false positives. I would like to ignore those lines for the purpose of alerting, but keep them in the index. But if I just add

NOT <pattern>

I end up dropping an entire event which might have other lines I care about.

Splunk stores all log levels used in an event in a multivalue field, and that's what I use to determine whether an event contains any lines I care about. But I don't know how to ignore one specific line in the event for alerting. The search I use:

sourcetype=rails [search sourcetype=rails log_level=ERROR OR log_level=FATAL | dedup request_uuid | fields request_uuid] 
| transaction request_uuid

In the past, before I taught Splunk about log_level, I did a text search for error, fatal, etc. and returned every matching event.
To ignore known false positives I would do | rex mode=sed "s/ERROR( --: a pattern)/INFO\1/g", which makes the text search skip those lines. Example:

sourcetype=rails
| rex mode=sed "s/ERROR( --: a pattern)/INFO\1/g"
| search error OR fatal

Sadly that no longer works, because log_level has already been extracted by the time the rex call runs.

I stopped relying on the plain text search because we were getting too many false positives from filenames, reviews, and other things happening on the site while it was being used.

Any suggestions for a built-in way to handle this?
To me it seems like I'll have to go down the route of making a custom search command, but I would rather not if there's a sensible way of dealing with this within Splunk. 🙂

Update: woodcock asked for an example to better understand what I mean, so here is one that I hope illustrates it. 🙂

I, [2015-10-06T17:00:56.249835 #29120] INFO -- : [class=Tasks::BusinessReporting::ReportTask] [user_type=Authorization::AnonymousUser] Started with args ["15"]
W, [2015-10-06T17:00:56.345676 #29120] WARN -- : [class=Tasks::BusinessReporting::ReportTask] [user_type=Authorization::AnonymousUser] a valid message
W, [2015-10-06T17:00:56.349008 #29120] WARN -- : [class=Tasks::BusinessReporting::ReportTask] [user_type=Authorization::AnonymousUser] an invalid message

I, [2015-10-06T17:00:56.349568 #29120] INFO -- : Finished running report task at 2015-10-06 17:00:56 +0800

The event above contains two warnings, one of which I currently don't want to be alerted about. I would like "a valid message" to be ignored for alerting purposes, because just doing NOT "a valid message" drops the whole event, including the "an invalid message" line that I do want to be alerted about.
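To make the problem concrete outside Splunk, here is a minimal Python sketch. It is not SPL and only roughly emulates how a NOT filter behaves; the helper names `event_level_not` and `lines_to_alert_on` are made up for illustration, and the sample lines are abbreviated copies of the event above. It treats the sample as one multi-line event and compares an event-level exclusion with a line-level one.

```python
# Abbreviated copy of the sample event: the [class=...] tags are dropped for brevity.
EVENT = "\n".join([
    'I, [2015-10-06T17:00:56.249835 #29120] INFO -- : Started with args ["15"]',
    "W, [2015-10-06T17:00:56.345676 #29120] WARN -- : a valid message",
    "W, [2015-10-06T17:00:56.349008 #29120] WARN -- : an invalid message",
    "I, [2015-10-06T17:00:56.349568 #29120] INFO -- : Finished running report task",
])


def event_level_not(event, phrase):
    """Rough stand-in for `NOT "<phrase>"`: the event survives only if the phrase is absent."""
    return phrase not in event


def lines_to_alert_on(event, ignored_phrases):
    """Line-level view: WARN lines that match none of the ignored phrases."""
    return [
        line
        for line in event.splitlines()
        if " WARN " in line and not any(p in line for p in ignored_phrases)
    ]


# The event-level filter throws the whole event away...
print(event_level_not(EVENT, "a valid message"))
# ...while the line-level filter still surfaces the other warning.
print(lines_to_alert_on(EVENT, ["a valid message"]))
```

The second result is what the alert should be based on: the known warning is skipped, but the event is still reported because another warning remains.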

1 Solution

gaqzi
Explorer

After much thinking and testing I realized that rex could re-extract log_level after I had rewritten the event text, so the method I used originally could basically be used again.

To help with this I ended up creating a macro like this:

[reevaluate_log_level]
definition = | rex field=_raw max_match=0 "(?m)^\w,\s\[.*?\]\s+(?<log_level>[^\s]+)"
iseval = 0

Which worked in conjunction with this macro:

[where_log_level_more_than_info]
definition = log_level=ERROR OR log_level=WARN OR log_level=CRIT OR log_level=FATAL
iseval = 0

Which I used like this:

sourcetype=rails [search sourcetype=rails `where_log_level_more_than_info` | dedup request_uuid | fields request_uuid] 
| rex mode=sed "s/ERROR( --: a pattern)/INFO\1/g"
`reevaluate_log_level`
| search `where_log_level_more_than_info`
| transaction request_uuid

In summary, this is what I'm doing:

  1. Search for all entries where log level is not INFO
  2. Replace a certain pattern that is currently logged as ERROR into INFO
  3. Reevaluate my original log_level multi-value field so it notices that I've rewritten it to INFO on the fly
  4. Now ensure all the remaining events have at least one line where log level is not INFO

With this I'm able to rewrite the first-pass results on the fly, so I can ignore known issues for this particular alert.
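The steps above can be sketched in plain Python to sanity-check the regexes outside Splunk. This is only an emulation under stated assumptions, not how Splunk runs the search: the sample events are invented so that they match the "a pattern" placeholder from the sed expression, and the names `log_levels` and `should_alert` are hypothetical. `findall` returns the level of every line, which roughly corresponds to a multivalue log_level field.

```python
import re

# Same substitution as the sed-mode rex; "a pattern" is the post's placeholder.
SED_PATTERN = re.compile(r"ERROR( --: a pattern)")
# Same extraction regex as the macro, with Python's named-group syntax.
LEVEL_RE = re.compile(r"^\w,\s\[.*?\]\s+(?P<log_level>[^\s]+)", re.MULTILINE)
ALERT_LEVELS = {"ERROR", "WARN", "CRIT", "FATAL"}


def log_levels(raw):
    """Extract the log level of every line in a multi-line event."""
    return LEVEL_RE.findall(raw)


def should_alert(raw):
    """Demote the known false positive, re-extract levels, then re-check them."""
    rewritten = SED_PATTERN.sub(r"INFO\1", raw)
    return any(level in ALERT_LEVELS for level in log_levels(rewritten))


# Invented events: one contains only the known false positive,
# the other also contains an error that should still alert.
known_only = (
    "I, [2015-10-06T17:00:56.249835 #29120] INFO --: started\n"
    "E, [2015-10-06T17:00:56.345676 #29120] ERROR --: a pattern"
)
mixed = known_only + "\nE, [2015-10-06T17:00:56.349008 #29120] ERROR --: something else"

print(log_levels(known_only))   # levels before the rewrite
print(should_alert(known_only)) # only the known issue is present
print(should_alert(mixed))      # a real error remains
```

The order matters here just as it does in the search: the levels have to be extracted again after the substitution, otherwise the check still sees the original ERROR.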

woodcock
Esteemed Legend

Although you clearly spent time laying out your situation, I am still not able to see exactly what you mean. Perhaps it will help if you add an example that includes events and search results.

gaqzi
Explorer

I've added an example, thanks for the suggestion 🙂

I've managed to get it working and I'll do a writeup of that as well. Thanks!


woodcock
Esteemed Legend

Great; make sure you click "Accept" on your answer.
