Alerting

realtime alerting not happening

ignetops
Explorer

Running 4.2.3.
We are running sec in parallel. A few days ago, sec alerted on a stack dump, but the rt search that is set to send email didn't alert on it. I matched the event from sec with an event in Splunk, so it was indexed.
What are the possible causes of Splunk not alerting on, or not 'finding', the event?
If the docs are true, then the rt alerts/searches should never miss an event that matches. The rt searches are supposed to see the data as it streams in, before it hits the index.

ignetops
Explorer

Funny that you mention clock skew. I've been tracking down a clock skew issue on my servers. They are running ntpd, but I'm still having an issue. They are all running on AWS, which means Xen-based VMs (including my Splunk instance). I've been reading up on it: http://www.brookstevens.org/2010/06/xen-time-drift-and-ntp.html
What is your take on that?

jflomenberg: I executed your search, but I'm not sure how to interpret the results. The drop_count column is empty, but mean_preview_period has values from 0.00xxxxx to 11.011xxxxx (that large value is from today).

jflomenberg
Splunk Employee

I'm probably not the right person to comment on fixing clock skew.

0 drop count means no lost events.

A mean_preview_period that high could just be the indexer getting bogged down with scheduled jobs or lots of ad hoc searches. We evaluate less frequently if we think we're going to drop the ball on other stuff. It would be more curious if nothing of any significance was happening on the machine at the time that the large value was observed.


jflomenberg
Splunk Employee

There are at least two potential causes. The first is clock skew in the timestamps, as Takajian mentions: if you run a 1-minute rt window search over events from a machine whose timestamps are all 2 minutes behind the Splunk indexer, then none of the events will fall within the evaluation window. The fixes for this are easy: make the window bigger, adjust the clocks, etc.
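To make the arithmetic concrete, here is a minimal Python sketch of the window check. It is a simplified model, not Splunk's actual implementation: the function name `in_rt_window` and the sample times are made up for illustration, and the only assumption is that an event is evaluated when its timestamp falls between "now minus the window" and "now" on the indexer's clock.

```python
from datetime import datetime, timedelta

def in_rt_window(event_time, indexer_now, window):
    """Return True if the event timestamp lies in [indexer_now - window, indexer_now].

    Hypothetical helper: a simplified stand-in for a real-time window check.
    """
    return indexer_now - window <= event_time <= indexer_now

# Illustrative values only.
indexer_now = datetime(2011, 1, 18, 5, 3, 43)
skew = timedelta(minutes=2)        # source clock runs 2 minutes behind
event_time = indexer_now - skew    # timestamp the event arrives with

# A 1-minute window never sees an event stamped 2 minutes in the past.
missed_with_1m = not in_rt_window(event_time, indexer_now, timedelta(minutes=1))

# Widening the window to 5 minutes brings the same event back into range.
caught_with_5m = in_rt_window(event_time, indexer_now, timedelta(minutes=5))

print(missed_with_1m, caught_with_5m)
```

In this model the window has to be at least as wide as the worst skew you expect, which is why widening the window and fixing the clocks both work.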

Another potential issue is a very greedy search. If the first pass that we do on events, before matching them against the full search query, lets too much through, it fills the memory buffer to the point where we have to drop events. You can see this in metrics.log, where a nonzero drop_count indicates that an event has been pushed out of the buffer and that an event of interest MAY have been missed. Here is an example:

01-18-2011 05:03:43.856 -0800 INFO  Metrics - group=realtime_search_data, system total, drop_count=0, mean_preview_period=4.071817

and here is a search you can do to see if this is the case:

index=_internal group="realtime_search_data" | where isnotnull(sid) | dedup sid | table _time, sid, drop_count, mean_preview_period
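If you would rather check metrics.log outside of Splunk, the same fields can be pulled from a line with a few lines of Python. This is only a sketch: `parse_metrics` is a hypothetical helper, and it assumes the simple comma-separated key=value layout seen in the sample line above.

```python
import re

# The sample line quoted above, split across two string literals for readability.
LINE = ("01-18-2011 05:03:43.856 -0800 INFO  Metrics - group=realtime_search_data, "
        "system total, drop_count=0, mean_preview_period=4.071817")

def parse_metrics(line):
    """Collect key=value pairs from a metrics.log line into a dict of strings."""
    return dict(re.findall(r"(\w+)=([^,\s]+)", line))

fields = parse_metrics(LINE)
drop_count = int(fields["drop_count"])
mean_preview_period = float(fields["mean_preview_period"])

# A nonzero drop_count means events were pushed out of the buffer,
# so an event of interest MAY have been missed.
may_have_missed = drop_count > 0

print(fields["group"], drop_count, mean_preview_period, may_have_missed)
```

For the sample line, drop_count is 0, so nothing was dropped in that interval.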

Takajian
Builder

I am not sure what the root cause is in your case, but it is possible for Splunk to miss a real-time alert if the timestamps in the log are not accurate. Your Splunk server's system time and the timestamps of the indexed events should be consistent. In particular, if you use a small real-time window, such as a few seconds, Splunk can easily miss an alert if there is any time difference.
