Splunk Enterprise

Why are we receiving this ingestion latency error after updating to 8.2.1?

Marc_Williams
Explorer

So we just updated to 8.2.1 and we are now getting an Ingestion Latency error…

How do we correct it? Here is what the link says and then we have an option to view the last 50 messages...

 Ingestion Latency

  • Root Cause(s):
    • Events from tracker.log have not been seen for the last 6529 seconds, which is more than the red threshold (210 seconds). This typically occurs when indexing or forwarding are falling behind or are blocked.
    • Events from tracker.log are delayed for 9658 seconds, which is more than the red threshold (180 seconds). This typically occurs when indexing or forwarding are falling behind or are blocked.
  • Generate Diag? If filing a support case, click here to generate a diag.

Here are some examples of what is shown as the messages:

  • 07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\spool\splunk.
  • 07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\run\splunk\search_telemetry.
  • 07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\log\watchdog.
  • 07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\log\splunk.
  • 07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\log\introspection.
  • 07-01-2021 09:28:52.275 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\etc\splunk.version.

  • 07-01-2021 09:28:52.269 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\CrushFTP9\CrushFTP.log.

  • 07-01-2021 09:28:52.268 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\watchdog\watchdog.log*.
  • 07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\splunk\splunk_instrumentation_cloud.log*.
  • 07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\splunk\license_usage_summary.log.
  • 07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\splunk.
  • 07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\introspection.
  • 07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\etc\splunk.version.
  • 07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\spool\splunk\tracker.log*.
  • 07-01-2021 09:28:52.266 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\spool\splunk\...stash_new.
  • 07-01-2021 09:28:52.266 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\spool\splunk\...stash_hec.
  • 07-01-2021 09:28:52.266 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\spool\splunk.
  • 07-01-2021 09:28:52.265 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\run\splunk\search_telemetry\*search_telemetry.json.
  • 07-01-2021 09:28:52.265 -0500 INFO TailingProcessor [66180 MainTailingThread] - TailWatcher initializing...

verbal_666
Contributor

I got this same behaviour on my updated Splunk 8.2.6.


I found that the problem was an inputs.conf containing this stanza:

 

[monitor://$SPLUNK_HOME/var/run]

 

I really do not remember why I had this input; maybe in an older 7.x.x version I used it for some "special" ingestion.

Removing the input in 8.2.6 cleared the error in the UI, and tracker.log appeared as normal in splunkd.log again with the default configuration.
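For anyone who wants to test this before deleting the input outright, here is a minimal sketch of disabling the stanza instead. Which inputs.conf file holds the stanza is an assumption, since the post does not say; running `splunk btool inputs list --debug` should show the file that defines it. A restart of splunkd is typically needed for the change to take effect.

```ini
# inputs.conf (file location is an assumption; edit the copy that defines the stanza)
# Turn off the broad monitor on var/run without removing the stanza.
[monitor://$SPLUNK_HOME/var/run]
disabled = true
```

If the health indicator recovers with the input disabled, the stanza can then be removed as described above.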

👨‍🔧


vuanhpham
Loves-to-Learn Lots

A bit late but posting since I haven't seen this info anywhere yet. I had a support case open for similar symptoms since going to 8.2.6. I had already taken extensive steps to rule out legitimate IO saturation and did not feel comfortable adjusting the threshold of the indicator because of potential false negatives. The tl;dr in my case was that it is a known issue that is fixed in the 9.0.1 release.

2022-07-14, SPL-225807, SPL-219749: Indicator 'ingestion_latency_gap_multiplier' exceeded configured value.

 

The issue description was not precise enough for me, so I kept probing the support engineer until I got enough of an explanation to be satisfied that it applied to my case.

Ingestion latency is detected as follows: a tracker.log file is generated periodically on the server in $SPLUNK_HOME/var/spool/splunk. It contains a dummy event whose timestamp is taken from the current system time. That dummy event is used to generate the metrics that feed the health indicator reports and are logged to the internal indexes. This would be the most reliable way to detect indexing latency. Apparently there was a bug in the code that calculates the latency, which is documented as fixed in the issue above. I watched and inspected the tracker.log files as they were being generated and quickly got bored, but never saw an inaccurate timestamp. So for now I'll take Splunk's word that the issue is fixed in the latest release.


thangbui
Engager

My Splunk Enterprise version on the cluster is 9.0.0.1 and I am also facing this problem:

Ingestion Latency
Root Cause(s):
Events from tracker.log are delayed for 48517 seconds, which is more than the red threshold (180 seconds). This typically occurs when indexing or forwarding are falling behind or are blocked.

.....

Unhealthy Instances:
search-head-02

If anyone can solve this problem, please help us!


youngec
Explorer

I think you would have to disable autoBatch or upgrade to 9.0.1 (not 9.0.0.1).

tfrds
New Member

Increasing parallelIngestionPipelines on the indexer seems to help (but not fix it); at least fewer messages appear now. Will be watching.


umangp
Loves-to-Learn Lots

I see the solution is to turn off useAck, but I can't do that with AWS ingestion, so this is not a good workaround for this issue.


b_chris21
Communicator

Had the same issue, here is what fixed it for me:

I downgraded my Splunk HF from 9.0.1 to the same version as the UFs that send data to it. The version mismatch seems to cause a conflict, even though according to Splunk there is backwards compatibility for UFs.

The downgrade consisted of uninstalling v9.0.1, installing v8.2.5 and unzipping a known-good backup of my v8.2.5 /etc folder.

That did the trick.

Hope it helps. If yes, a Karma would be appreciated 🙂

Christos

jdcabanglan
Loves-to-Learn Lots

I encountered the same issue, but I checked and my Splunk versions are the same on both.


youngec
Explorer

For those who upgraded to v9.x, this may be applicable:

https://docs.splunk.com/Documentation/Forwarder/9.0.1/Forwarder/KnownIssues

2022-06-22, SPL-226003: When forwarding from a 9.0 instance with useAck enabled, ingestion stops after some time with errors: "Invalid ACK received from indexer="

Workaround:
As a workaround, disable useAck in outputs.conf on the forwarder. After disabling, indexers start to ingest data.
If customers do need useACK to prevent data loss, disabling autoBatch in outputs.conf can remediate the issue too, but it impacts throughput - no worse than 8.x, but no improvement for 9.0.
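The two workarounds quoted above could look roughly like this in outputs.conf on the forwarder. This is only a sketch: placing the settings in the global `[tcpout]` stanza is an assumption (they may instead belong in your own `[tcpout:<group>]` stanza), so check it against the outputs.conf spec for your version before relying on it.

```ini
# outputs.conf on the forwarder
# Stanza placement is an assumption; adapt it to your existing tcpout configuration.

# Option 1: disable indexer acknowledgment.
[tcpout]
useACK = false

# Option 2: keep acknowledgment and turn off automatic batching instead.
# Use this in place of Option 1, not together with it.
# [tcpout]
# useACK = true
# autoBatch = false
```

Option 1 gives up the delivery guarantee that acknowledgment provides, while Option 2 keeps it at the cost of the throughput impact mentioned in the known-issue text.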

 


matt8679
Path Finder

I had this issue too and noticed Splunk was falling behind when scanning large files before ingesting.

I ended up increasing the pipelines on the forwarders and the issue went away. Bumped to 3 where resources allowed.

# server.conf
[general]
parallelIngestionPipelines = 2

 

Also note, you will get this error if you have a source coming in with delayed logs. I think Splunk is alerting on this now, which is why you see the error after the updates. I still get this error on logs that are only coming in every couple of hours.


linhmai_bne
Explorer

I got a similar issue after upgrading to 8.2.7. I have tried the following:

useAck=false

disable app Splunk...Forwarders

chown -R splunk:splunk /opt/splunk

but the problem is still there.


tyates_ctm
Explorer

TL;DR: check `server` in `[tcpout:]` in `outputs.conf` of the server (not UFs)

I got this error after migrating onto bigger servers. The cause was that the `server` attribute in the `[tcpout:]` stanza in `outputs.conf` on the various members of the cluster hadn't been updated. I have no idea why, but at some point over the past 5 years that same attribute on the UFs had been pointed at different DNS records, which is why the indexers were still receiving the important data from across the estate.
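As an illustration of the kind of stale setting described above, here is a sketch of what the fix might look like. The group name and all host names are hypothetical placeholders, not values from my environment; `splunk btool outputs list --debug` should show the values actually in effect on each server.

```ini
# outputs.conf on each cluster member (not on the UFs)
# Group name and host names are hypothetical placeholders.
[tcpout:primary_indexers]
# Stale value left over from before the migration:
# server = old-indexer1.example.com:9997, old-indexer2.example.com:9997
# Updated to point at the indexers that exist after the migration:
server = new-indexer1.example.com:9997, new-indexer2.example.com:9997
```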

Hope this helps someone.


Gregski11
Contributor

We have a case open on this as well; I will keep you posted on the resolution.

We see messages like this; they mysteriously go away and then return a few days later. We are on version 9.0.0.

  • Root Cause(s):
    • Events from tracker.log have not been seen for the last 1394 seconds, which is more than the red threshold (210 seconds). This typically occurs when indexing or forwarding are falling behind or are blocked.

Gregski11
Contributor

We are getting the same error on our Cluster Master, which is running version 9.0.0.

  • Root Cause(s):
    • Events from tracker.log are delayed for 44 seconds, which is more than the yellow threshold (15 seconds). This typically occurs when indexing or forwarding are falling behind or are blocked.

 

We also opened a support case with Splunk and will keep you all up to date on how it unfolds.


jdcabanglan
Loves-to-Learn Lots

Did you fix the issue?


Zacknoid
Explorer

Upgraded to version 9.0 and facing a similar issue: Root Cause(s): Indicator 'ingestion_latency_gap_multiplier' exceeded configured value. Did you find a solution for this?

 

Thanks 

 

Gregski11
Contributor

@Zacknoid wrote:

Upgraded to version 9.0 and facing a similar issue: Root Cause(s): Indicator 'ingestion_latency_gap_multiplier' exceeded configured value. Did you find a solution for this?

 

Thanks 

 


No, but after a day or two the problem just went away.


Zacknoid
Explorer

Still looking for a resolution to the ingestion latency error.


sombhtr239
Explorer

If anyone has a solution, please help.


sombhtr239
Explorer

I am also facing the same problem. Server IOPS is 2000, but I am still getting IOWAIT and ingestion latency errors very frequently, and then they go away.
