I would like to be able to view the indexing latency of my incoming events in real time. How can this be done?
The theoretical indexing latency can be calculated by subtracting the extracted time stamp (_time) from the time at which the event was indexed (_indextime). Since both _time and _indextime are expressed in seconds since the Epoch, the subtraction yields a latency in seconds.
A useful search for evaluating indexing latency computes the average, minimum, and maximum indexing latency broken down by a relevant field such as host, source, sourcetype, or index.
Example:
* | eval time=_time | eval itime=_indextime | eval latency=(itime - time) | stats count, avg(latency), min(latency), max(latency) by source
This search is fairly expensive, so it should be limited to small event sets.
I find it is usually best to run this search with the "Real-time (all time)" setting of the time picker, which shows the live stream of events being indexed regardless of their time stamp (a sort of "tail -f" on the event set you restrict with your search terms).
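If real-time search is too expensive in your environment, a similar check can be run over a short, recent time window instead. A minimal sketch (the index=_internal scope and the 15-minute window are only illustrative choices, not part of the original answer):
index=_internal earliest=-15m | eval latency=(_indextime - _time) | stats count, avg(latency), max(latency) by sourcetype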
Note that this assumes the files/events are presented to Splunk in a timely fashion; the figure will of course be inaccurate when Splunk indexes historical logs. Negative latencies, if observed, usually indicate a system clock difference between the log writers and the indexer.
With judicious use of split-by clauses, one can make interesting discoveries and track down the root cause of indexing latency.
For example, a disparity in the latency observed when splitting by source for events coming from a given forwarder typically indicates that splunkd is reading slowly for certain sources but is on time for others. This rules out an indexer-side issue and instead points to a slow-reading input, such as a file monitor input reading from hundreds of thousands of files on an NFS file system with high network latency and inadequate cache settings.
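For instance, to check whether a single forwarder is lagging on only some of its sources, you could split by both host and source. A rough sketch, where the host pattern is hypothetical:
host=myforwarder* | eval latency=(_indextime - _time) | stats count, avg(latency) AS avg_latency, max(latency) AS max_latency by host, source | sort -avg_latency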
Finally, note that it should be fairly easy to modify this search to expose only events with abnormal latencies (say, more than 1 minute behind or more than 10 seconds in the future) for alerting purposes. Of course, this should be run on a data source that is configured to bring in only current data:
sourcetype=cisco | eval time=_time | eval itime=_indextime | eval latency=(itime - time) | search latency>60 OR latency<-10 | stats count, avg(latency), min(latency), max(latency) by host
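To turn this into an alert, one option is to save a variant of that search as a scheduled alert that triggers whenever it returns results. A sketch along those lines, reusing the same illustrative 60-second / -10-second thresholds:
sourcetype=cisco | eval latency=(_indextime - _time) | where latency > 60 OR latency < -10 | stats count, max(latency) by host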
does "itime=_indextime" mean the time when log is generated? or mean when the log caught Splunk Forwarder's attention?
sorry to add. I have this question b/c I see the "latency=(itime - time)" sometimes negative numbers
It may be a timestamp issue.
Negative latency is typically caused by improper timezone/timestamp recognition, as @srinivasup noted, or possibly an invalid system clock on a host; i.e., if a host clock is running 5 seconds slow, you could see a -5 second latency.
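If you want to spot hosts whose clocks appear to be skewed, a search along these lines can help; the -5 second cutoff is only an illustrative threshold:
* | eval latency=(_indextime - _time) | where latency < -5 | stats count, min(latency) by host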
The indexing latency for incoming events is exposed using real-time search in the "Distributed Indexing Performance" view of the Splunk on Splunk (S.o.S) app.