All Apps and Add-ons

How can I view the indexing latency for incoming events in real time?

hexx
Splunk Employee

I would like to be able to view the indexing latency of my incoming events in real time. How can this be done?

1 Solution

hexx
Splunk Employee

The theoretical indexing latency can be calculated by subtracting the extracted timestamp (_time) from the time at which the event was indexed (_indextime). Since both _time and _indextime are expressed in seconds since the Epoch, the subtraction yields a latency in seconds.
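
For illustration, here is a minimal per-event sketch of that calculation (the sourcetype "my_sourcetype" is just a placeholder for whatever data you want to inspect); it shows the extracted time, the index time rendered in human-readable form, and the resulting latency:

sourcetype=my_sourcetype | eval latency=_indextime-_time | eval indexed_at=strftime(_indextime,"%Y-%m-%d %H:%M:%S") | table _time, indexed_at, latency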

A pertinent search for evaluating indexing latency computes the average, minimum, and maximum indexing latency broken down by a relevant field such as host, source, sourcetype, or index.

Example:

* | eval time=_time | eval itime=_indextime | eval latency=(itime - time) | stats count, avg(latency), min(latency), max(latency) by source

This search is fairly expensive, so it should probably be limited to small event sets.

I find that it is usually best to run this search with the "real-time (all-time)" setting of the time picker, which shows the live stream of events being indexed regardless of their timestamps (a sort of "tail -f" on the event set you restrict with your search terms).

Note that this assumes that the files/events are presented to Splunk in a timely fashion. This figure will of course be inaccurate when Splunk indexes historic logs. Negative latencies, if observed, usually indicate a system clock difference between the log writers and the indexer.
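
As a rough sketch (the split-by fields are only a suggestion), you can isolate the events that appear to come from the future to see which hosts or sources are affected by clock differences:

* | eval latency=_indextime-_time | where latency<0 | stats count, min(latency) by host, source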

With judicious use of split-by clauses, you can make interesting discoveries and track down the root cause of indexing latency.

For example, a disparity in latency when splitting by source for events coming from a given forwarder typically indicates that splunkd is reading some sources slowly but keeping up with others. This rules out an indexer-side issue and instead points to a slow-reading input, such as a file monitor input reading from hundreds of thousands of files on an NFS file system with high network latency and inadequate cache settings.
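
As an illustration, a sketch along these lines (with "my_forwarder" standing in for the forwarder's host name) ranks the sources read by that one forwarder by their average latency, which makes such a disparity easy to spot:

host=my_forwarder | eval latency=_indextime-_time | stats count, avg(latency) as avg_latency, max(latency) as max_latency by source | sort - avg_latency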

Finally, note that it should be fairly easy to modify this search to expose only events with abnormal latencies (say, more than 1 minute behind or more than 10 seconds in the future) for alerting purposes. Of course, this should be run on a data source that is configured to bring in only current data:

sourcetype=cisco | eval time=_time | eval itime=_indextime | eval latency=(itime - time) | search latency>60 OR latency<-10 | stats count, avg(latency), min(latency), max(latency) by host
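
If leaving a real-time search running is too costly, one alternative sketch (the 5-minute window and the thresholds are arbitrary) is to schedule an ordinary search over a short recent window and trigger the alert whenever it returns results:

sourcetype=cisco earliest=-5m | eval latency=_indextime-_time | where latency>60 OR latency<-10 | stats count, max(latency) by host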

leonjxtan
Path Finder

does "itime=_indextime" mean the time when log is generated? or mean when the log caught Splunk Forwarder's attention?

leonjxtan
Path Finder

Sorry to add: I have this question because I sometimes see negative numbers for "latency=(itime - time)".

srinivasup
Explorer

Maybe a timestamp issue.

bandit
Motivator

Negative latency typically indicates improper timezone/timestamp recognition, as @srinivasup noted, or possibly an incorrect system clock on a host. For example, if the indexing host's clock is running 5 seconds slow relative to the log-writing host, you could see -5 seconds of latency.
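
A rough way to sanity-check that (just a sketch, assuming the events carry timestamps written by each host's own clock) is to look at the most negative latency observed per host, which approximates how far ahead that host's clock is running relative to the indexer:

* | eval latency=_indextime-_time | stats min(latency) as most_negative_latency by host | where most_negative_latency<0 | sort + most_negative_latency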

hexx
Splunk Employee

The indexing latency for incoming events is exposed using real-time search in the "Distributed Indexing Performance" view of the Splunk on Splunk (S.o.S) app.
