Why are we receiving this ingestion latency error after updating to 8.2.1?

Marc_Williams · ‎07-07-2021

So we just updated to 8.2.1 and we are now getting an Ingestion Latency error…

How do we correct it? Here is what the link says and then we have an option to view the last 50 messages...

Ingestion Latency

Root Cause(s):
- Events from tracker.log have not been seen for the last 6529 seconds, which is more than the red threshold (210 seconds). This typically occurs when indexing or forwarding are falling behind or are blocked.
- Events from tracker.log are delayed for 9658 seconds, which is more than the red threshold (180 seconds). This typically occurs when indexing or forwarding are falling behind or are blocked.
Generate Diag? If filing a support case, click here to generate a diag.

Here are some examples of what is shown as the messages:

07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\spool\splunk.
07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\run\splunk\search_telemetry.
07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\log\watchdog.
07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\log\splunk.
07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\log\introspection.
07-01-2021 09:28:52.275 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\etc\splunk.version.

07-01-2021 09:28:52.269 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\CrushFTP9\CrushFTP.log.

07-01-2021 09:28:52.268 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\watchdog\watchdog.log*.
07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\splunk\splunk_instrumentation_cloud.log*.
07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\splunk\license_usage_summary.log.
07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\splunk.
07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\introspection.
07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\etc\splunk.version.
07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\spool\splunk\tracker.log*.
07-01-2021 09:28:52.266 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\spool\splunk\...stash_new.
07-01-2021 09:28:52.266 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\spool\splunk\...stash_hec.
07-01-2021 09:28:52.266 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\spool\splunk.
07-01-2021 09:28:52.265 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\run\splunk\search_telemetry\*search_telemetry.json.
07-01-2021 09:28:52.265 -0500 INFO TailingProcessor [66180 MainTailingThread] - TailWatcher initializing...

evinasco08 · ‎09-25-2023

Hi team, does anyone have the solution? I have updated to the latest version, 9.1.1, and guess what? It has the same error:

Indicator 'ingestion_latency_gap_multiplier' exceeded configured value

Does anyone have a solution?

jconradhulsey · ‎04-27-2023

I had this issue after updating from 8.2.1 to 8.2.10. It was found that there were two monitors for tracker.log so I disabled the one I found in $SPLUNK_HOME/etc/system/local/inputs.conf:

[monitor://$SPLUNK_HOME\var\spool\splunk\tracker.log*]
disabled = 1

After Splunk restarted, there were no issues. I hope this helps someone.
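
For anyone checking for the same duplicate, here is a minimal Python sketch that scans the text of several inputs.conf files for enabled stanzas naming tracker.log. The function name `find_tracker_inputs` and the sample file labels are hypothetical, and the parser is deliberately simplistic: it does not reproduce Splunk's own configuration layering, so `splunk btool inputs list --debug` remains the authoritative view of the merged result.

```python
import re
from collections import defaultdict

# Matches a stanza header such as [monitor://...] or [batch://...].
STANZA_RE = re.compile(r"^\s*\[(?P<kind>monitor|batch)://(?P<path>[^\]]+)\]\s*$")


def find_tracker_inputs(conf_files):
    """Return {file_label: [stanza, ...]} for enabled stanzas that name tracker.log.

    conf_files maps a label (hypothetical, e.g. "system/local") to the text
    of that inputs.conf file.
    """
    hits = defaultdict(list)
    for name, text in conf_files.items():
        current = None    # monitor/batch stanza header currently being read
        disabled = False  # True once that stanza sets disabled = 1 or true
        # The "[end]" sentinel forces the last real stanza to be evaluated.
        for line in text.splitlines() + ["[end]"]:
            if line.strip().startswith("["):
                if current and "tracker.log" in current and not disabled:
                    hits[name].append(current)
                current = line.strip() if STANZA_RE.match(line) else None
                disabled = False
            elif current and re.match(r"^\s*disabled\s*=\s*(1|true)\s*$", line, re.I):
                disabled = True
    return dict(hits)


if __name__ == "__main__":
    # Sample contents modelled on the stanzas quoted in this thread.
    sample = {
        "system/default": "[batch://$SPLUNK_HOME\\var\\spool\\splunk\\tracker.log*]\n",
        "system/local": "[monitor://$SPLUNK_HOME\\var\\spool\\splunk\\tracker.log*]\n",
    }
    for label, stanzas in find_tracker_inputs(sample).items():
        print(label, stanzas)
```

If more than one file is reported, tracker.log is probably being picked up twice, which matches the situation described in this post.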

Zane · ‎02-27-2023

Is there a solution to fix this? I have already upgraded to 9.0.3.
Can I modify limits.conf to limit thruput?

jswann_splunk · ‎02-06-2023

Hello All,

This is a known bug in Splunk and we are working to address it. Please use the following workaround in the interim.

Create a health.conf file in /opt/splunk/etc/system/local on the affected machines, and be sure to restart Splunk after making the entry.

Add:

[health_reporter]
aggregate_ingestion_latency_health = 0

[feature:ingestion_latency]
alert.disabled = 1
disabled = 1

Let me know if you have any questions or concerns.

jmartens · ‎08-03-2023

Still suffering the same in 9.10.2. Is the bug already fixed or do we still need to apply this fix to silence the message?

Apart from that, does this disable ingestion latency warnings, and do we need to monitor it another way?

Can you shed some light on this @jswann_splunk ?

cinsley · ‎06-05-2023

Good morning,

If this is a known bug, then why is it not listed or addressed as a known issue under the latest release?

FFZ · ‎02-10-2023

Is this workaround also applicable to version 9.x?

verbal_666 · ‎11-20-2022

I got this same behaviour on my updated Splunk 8.2.6.

In my case the problem was an inputs.conf containing this stanza:

[monitor://$SPLUNK_HOME/var/run]

I really do not remember why I had this input; maybe in an older 7.x.x I used it for some "special" ingestion.

Removing the input in 8.2.6 removed the error in the UI, and tracker.log appeared regularly in splunkd.log again, as it does by default.

👨‍🔧

vuanhpham · ‎10-27-2022

A bit late but posting since I haven't seen this info anywhere yet. I had a support case open for similar symptoms since going to 8.2.6. I had already taken extensive steps to rule out legitimate IO saturation and did not feel comfortable adjusting the threshold of the indicator because of potential false negatives. The tl;dr in my case was that it is a known issue that is fixed in the 9.0.1 release.

2022-07-14

SPL-225807, SPL-219749

Indicator 'ingestion_latency_gap_multiplier' exceeded configured value.

Unsatisfied with the imprecise issue description, I kept probing the support engineer until I got a sufficient explanation of why it would be applicable.

Ingestion latency is detected as follows: a tracker.log file is generated periodically on the server in $SPLUNK_HOME/var/spool/splunk. It contains a dummy event with a timestamp taken from the current system time. That dummy event is used to generate the metrics that feed the health indicator reports and are logged to the internal indexes. This would be the most reliable way to detect indexing latency. Apparently there was a bug in the code that calculates the latency, which is documented as fixed in the issue above. I watched and inspected the tracker.log files as they were being generated and quickly got bored, but never saw an inaccurate timestamp. So for now I'll take Splunk's word that the issue should be fixed in the latest release.
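
To make the mechanism concrete, here is a rough Python sketch of how a lag and a gap could be derived from a tracker event and compared with the red thresholds quoted in the health report earlier in this thread (180 and 210 seconds). The actual calculation inside splunkd is not shown anywhere in this thread, so the function `tracker_status`, its arguments, and the way the gap is measured are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Red thresholds as quoted in the health report earlier in this thread.
LAG_RED_SEC = 180  # an event delayed by more than this is flagged
GAP_RED_SEC = 210  # no tracker event seen for longer than this is flagged


def tracker_status(event_time, index_time, now):
    """Classify one tracker event (illustrative only, not Splunk's code).

    event_time: timestamp written into the dummy event
    index_time: when that event was actually indexed
    now:        current time, used to measure time since the last event
    """
    lag = (index_time - event_time).total_seconds()
    gap = (now - index_time).total_seconds()
    return {
        "lag_sec": lag,
        "gap_sec": gap,
        "lag_red": lag > LAG_RED_SEC,
        "gap_red": gap > GAP_RED_SEC,
    }


if __name__ == "__main__":
    written = datetime(2022, 10, 27, 12, 0, 0, tzinfo=timezone.utc)
    indexed = written + timedelta(seconds=5)   # hypothetical healthy delay
    checked = indexed + timedelta(seconds=30)  # hypothetical check time
    print(tracker_status(written, indexed, checked))
```

With these definitions, an accurate timestamp in tracker.log and a red indicator are not contradictory: the flag depends on when the event is indexed and when the next one arrives, not only on the timestamp written into the file.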

splunkthat · ‎10-27-2023

I am still receiving the same issues on a 9.1.1 forwarder and Splunk Enterprise as of 1406 EST 10/27/2023. Are you as well?

cinsley · ‎11-28-2023

Yes. My problem actually grew worse, with higher latency numbers, since 9.1.1.

thangbui · ‎10-26-2022

My Splunk Enterprise version on the cluster is 9.0.0.1 and I am also facing this problem:

Ingestion Latency
Root Cause(s):
Events from tracker.log are delayed for 48517 seconds, which is more than the red threshold (180 seconds). This typically occurs when indexing or forwarding are falling behind or are blocked.

.....

Unhealthy Instances:
search-head-02

If anyone can solve this problem, please help us!

youngec · ‎10-26-2022

I think you would have to disable autoBatch or upgrade to 9.0.1 (not 9.0.0.1).

tfrds · ‎09-29-2022

More parallelIngestionPipelines at the indexer seem to help (but not fix); at least fewer messages appear now. Will be watching.

umangp · ‎09-13-2022

I see the solution is to turn off useAck, but I can't do that with AWS ingestion, so this is not a good workaround for this issue.

b_chris21 · ‎09-08-2022

Had the same issue, here is what fixed it for me:

I downgraded my Splunk HF from 9.0.1 to the same version as the UFs that send data to it. There seems to be a conflict caused by the version mismatch, even though according to Splunk there is backwards compatibility for UFs.

The downgrade consisted of uninstalling v9.0.1, installing v8.2.5, and unzipping a known-good old backup of my v8.2.5 /etc folder.

That did the trick.

Hope it helps. If yes, a Karma would be appreciated 🙂

Christos

jdcabanglan · ‎10-19-2022

I encountered the same issue, but when I checked, my Splunk versions were both the same.

youngec · ‎08-19-2022

For those who upgraded to v9.x, this may be applicable:

https://docs.splunk.com/Documentation/Forwarder/9.0.1/Forwarder/KnownIssues

2022-06-22 SPL-226003: When forwarding from a 9.0 instance with useAck enabled, ingestion stops after some time with errors: "Invalid ACK received from indexer="

Workaround:
As a workaround, disable useAck in outputs.conf on the forwarder. After disabling, indexers start to ingest data.
If customers do need useACK to prevent data loss, disabling autoBatch in outputs.conf can remediate the issue too, but it impacts throughput - no worse than 8.x, but no improvement for 9.0.

matt8679 · ‎08-04-2022

I had this issue too and noticed Splunk was falling behind when scanning large files before ingesting.

I ended up increasing the ingestion pipelines on the forwarders (in server.conf) and the issue went away. Bumped to 3 where resources allowed.

[general]

parallelIngestionPipelines = 2

Also note, you will get this error if you have a source coming in with delayed logs. I think Splunk is alerting on this now, which is why you see the error after the updates. I still get this error on logs that are only coming in every couple of hours.

verbal_666 · ‎11-28-2023

This is as if you had 2 UFs on the same machine.

Maybe you only need to raise maxKBps in limits.conf:

[thruput]

maxKBps = <integer>
* The maximum speed, in kilobytes per second, that incoming data is
  processed through the thruput processor in the ingestion pipeline.
* To control the CPU load while indexing, use this setting to throttle
  the number of events this indexer processes to the rate (in
  kilobytes per second) that you specify.
* NOTE:
  * There is no guarantee that the thruput processor
    will always process less than the number of kilobytes per
    second that you specify with this setting. The status of
    earlier processing queues in the pipeline can cause
    temporary bursts of network activity that exceed what
    is configured in the setting.
  * The setting does not limit the amount of data that is
    written to the network from the tcpoutput processor, such
    as what happens when a universal forwarder sends data to
    an indexer.
  * The thruput processor applies the 'maxKBps' setting for each
    ingestion pipeline. If you configure multiple ingestion
    pipelines, the processor multiplies the 'maxKBps' value
    by the number of ingestion pipelines that you have
    configured.
  * For more information about multiple ingestion pipelines, see
    the 'parallelIngestionPipelines' setting in the
    server.conf.spec file.
* Default (Splunk Enterprise): 0 (unlimited)
* Default (Splunk Universal Forwarder): 256

By default a UF sends at 256 KB/s.

I set it to 2048 for many UFs that send a lot of data.

You could also try 0 to disable thruput control.
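
As a back-of-the-envelope illustration of why the default can matter, here is a small Python sketch based only on the spec text quoted above (the limit applies per ingestion pipeline, and 0 means unlimited). The helper names and the 1 GiB backlog are hypothetical, and real forwarding speed also depends on network, disk, and indexer load.

```python
def effective_max_kbps(max_kbps, pipelines=1):
    """Aggregate thruput ceiling in KB/s, or None when maxKBps is 0 (unlimited)."""
    if max_kbps == 0:
        return None
    # The spec text says the limit is applied to each ingestion pipeline.
    return max_kbps * pipelines


def seconds_to_drain(backlog_kb, max_kbps, pipelines=1):
    """Lower bound on the time needed to send a backlog at the configured ceiling.

    Returns None when thruput is unlimited, because the thruput setting
    then says nothing about how long the backlog takes.
    """
    ceiling = effective_max_kbps(max_kbps, pipelines)
    if ceiling is None:
        return None
    return backlog_kb / ceiling


if __name__ == "__main__":
    backlog_kb = 1024 * 1024  # hypothetical 1 GiB backlog
    for limit in (256, 2048, 0):
        print(limit, seconds_to_drain(backlog_kb, limit))
```

At the UF default of 256, that hypothetical backlog needs at least 4096 seconds to clear, far above the 180-second red threshold, while 2048 brings the lower bound down to 512 seconds.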
