Alerting

Why are my real-time alerting searches no longer sending emails for matching events after upgrading to 6.1?

Splunk Employee

Right after upgrading to 6.1, I noticed that some scheduled real-time searches stop sending emails or triggering their other configured alert actions after they have been running for a while.

Alerts are sent initially, but after a few hours, even though events come in that match the search, there are no alerts triggered.

Is this a known bug?

1 Solution

Splunk Employee

This is bug SPL-84357 and is specific to 6.1 and 6.1.1.

1 - Known symptoms

There are two known signatures of this bug:

  • In splunkd_access.log, entries recorded for the failing alerting searches show splunkd denying access (401) to a POST to .../saved/searches/{search name}/notify?trigger.condition_state=1 from a local requester (client IP = 127.0.0.1), which is actually the search process:


    127.0.0.1 - - [20/May/2014:12:43:16.856 -0700] "POST /servicesNS/admin/search/saved/searches/test%20per-result%20alerting/notify?trigger.condition_state=1 HTTP/1.0" 401 148 - - - 0ms

    In this example, splunkd is denying the search process running the "test per-result alerting" search permission to create an alert item. These messages appear every time a matching event is found, but no alert is created.

  • In the affected search's search.log, we can see a corresponding client-side message reporting that splunkd denied the search authorization to create an alert item:


    05-20-2014 12:43:16.857 WARN SearchStateListener - Search listener notification returned non 2XX status code, status_code=401; Success. Removing fake artifacts, sid=rt_scheduler_adminsearch_RMD58061d21d6537baaa_at_1400533557_0.12

    These messages appear every time a matching event is found, but no alert is created.
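As a quick check for the first signature, the 401 notify entries can be filtered out of splunkd_access.log with standard tools. A minimal sketch; the helper name is ours, and on a live instance you would feed it $SPLUNK_HOME/var/log/splunk/splunkd_access.log rather than the sample line:

```shell
# filter_denied_notifies: keep only access-log lines where splunkd answered
# an alert-notification POST (.../notify...) with a 401.
filter_denied_notifies() {
  grep '"POST [^"]*/notify[^"]*" 401'
}

# Demonstration with the sample entry above; on a real instance:
#   filter_denied_notifies < "$SPLUNK_HOME/var/log/splunk/splunkd_access.log"
printf '%s\n' '127.0.0.1 - - [20/May/2014:12:43:16.856 -0700] "POST /servicesNS/admin/search/saved/searches/test%20per-result%20alerting/notify?trigger.condition_state=1 HTTP/1.0" 401 148 - - - 0ms' \
  | filter_denied_notifies
```

A steadily growing count of such lines for one saved search is a strong hint the search's token has expired.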

2 - Root cause

The root cause is that the real-time alerting search process uses an authentication token to communicate back to splunkd, and this token is inappropriately subject to the splunkd session timeout configured in server.conf, which defaults to 1 hour. From $SPLUNK_HOME/etc/system/default/server.conf:


[general]
sessionTimeout = 1h

This timeout is extended every time the search process talks back to splunkd, which is why the issue does not occur for searches that match events frequently. However, if matching events arrive more than 1 hour apart, the token expires and the error occurs.
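To see why 1 hour is the threshold, it can help to express a sessionTimeout value in seconds and compare it against the expected gap between matching events. A small sketch, assuming the single-letter time suffixes (s, m, h, d) used in the server.conf values above; the helper name is ours:

```shell
# timeout_to_seconds: convert a value like "1h" or "30d" to seconds.
# Assumes the single-letter suffixes s/m/h/d seen in server.conf above.
timeout_to_seconds() {
  value=${1%?}                    # numeric part ("1h" -> "1")
  case "$1" in
    *s) echo "$value" ;;
    *m) echo $(( value * 60 )) ;;
    *h) echo $(( value * 3600 )) ;;
    *d) echo $(( value * 86400 )) ;;
  esac
}

timeout_to_seconds 1h    # default: 3600 -- event gaps longer than this hit the bug
timeout_to_seconds 30d   # work-around value: 2592000
```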

3 - Work-around

The work-around is to temporarily extend "sessionTimeout" in $SPLUNK_HOME/etc/system/local/server.conf to a value longer than the longest expected interval between two matched events, thus preventing the token from expiring (restart splunkd for the change to take effect):


[general]
sessionTimeout = 30d
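Before relying on the work-around, it is worth confirming the override actually landed under the [general] stanza. The sketch below uses a temporary file as a stand-in for $SPLUNK_HOME/etc/system/local/server.conf so it can run anywhere; on a live instance, `$SPLUNK_HOME/bin/splunk btool server list general` shows the merged value splunkd will use.

```shell
# Write the override to a temp file standing in for
# $SPLUNK_HOME/etc/system/local/server.conf, then confirm it is present
# under the [general] stanza (the only place it takes effect).
conf=$(mktemp)
cat > "$conf" <<'EOF'
[general]
sessionTimeout = 30d
EOF

grep -A1 '^\[general\]' "$conf" | grep 'sessionTimeout'
rm -f "$conf"
```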

4 - Resolution

This issue is slated to be fixed in our next maintenance release: 6.1.2


