Right after upgrading to 6.1, I noticed that some scheduled real-time searches fail to send emails or trigger any otherwise configured alert actions after they have been running for a while.
Alerts are sent initially, but after a few hours no alerts are triggered at all, even though matching events are still coming in.
Is this a known bug?
This is bug SPL-84357 and is specific to 6.1 and 6.1.1.
There are two known signatures of this bug:
In splunkd_access.log, entries for the failing alerting searches show splunkd denying access (401) to a POST to .../saved/searches/{search name}/notify?trigger.condition_state=1 from a local requester (client IP = 127.0.0.1), which is actually the search process:
127.0.0.1 - - [20/May/2014:12:43:16.856 -0700] "POST /servicesNS/admin/search/saved/searches/test%20per-result%20alerting/notify?trigger.condition_state=1 HTTP/1.0" 401 148 - - - 0ms
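A quick way to check for this signature, assuming the default log location under $SPLUNK_HOME/var/log/splunk, is to grep splunkd_access.log for 401 responses to the notify endpoint:
grep 'notify?trigger.condition_state' $SPLUNK_HOME/var/log/splunk/splunkd_access.log | grep ' 401 '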
In the affected search's search.log, we can see a corresponding client-side message reporting that splunkd denied authorization to create the alert item:
05-20-2014 12:43:16.857 WARN SearchStateListener - Search listener notification returned non 2XX status code, status_code=401; Success. Removing fake artifacts, sid=rt_scheduler_adminsearch_RMD58061d21d6537baaa_at_1400533557_0.12
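These client-side warnings live in the dispatch directory of each affected search. The sid in the example above starts with rt_scheduler_, so, assuming the default dispatch location, something like the following should surface them:
grep 'SearchStateListener.*status_code=401' $SPLUNK_HOME/var/run/splunk/dispatch/rt_scheduler_*/search.log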
The root cause is that the real-time alerting search's process communicates back to splunkd using an authentication token that is inappropriately subject to the splunkd session timeout configured in server.conf, which defaults to 1 hour. From $SPLUNK_HOME/etc/system/default/server.conf:
[general]
sessionTimeout = 1h
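To confirm which value is actually in effect on your instance (default layered with any local overrides), btool can be used, for example:
$SPLUNK_HOME/bin/splunk btool server list general | grep -i sessionTimeout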
This timeout is reset every time the search process talks back to splunkd, which is why the issue does not occur for searches that match events frequently. However, if matching events come in more than 1 hour apart, the token expires and the error occurs.
The workaround is to temporarily extend sessionTimeout in $SPLUNK_HOME/etc/system/local/server.conf to a value longer than the interval between two matched events, so the token never expires:
[general]
sessionTimeout = 30d
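Note that, as with most server.conf settings, the new sessionTimeout generally only takes effect after splunkd has been restarted, e.g. (assuming a default install path):
$SPLUNK_HOME/bin/splunk restart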
This issue is slated to be fixed in our next maintenance release: 6.1.2.
Unfortunately, it's still not fixed in 6.1.3.