Splunk Enterprise

Why are we receiving this ingestion latency error after updating to 8.2.1?

Marc_Williams
Explorer

So we just updated to 8.2.1 and we are now getting an Ingestion Latency error…

How do we correct it? Here is what the link says and then we have an option to view the last 50 messages...

 Ingestion Latency

  • Root Cause(s):
    • Events from tracker.log have not been seen for the last 6529 seconds, which is more than the red threshold (210 seconds). This typically occurs when indexing or forwarding are falling behind or are blocked.
    • Events from tracker.log are delayed for 9658 seconds, which is more than the red threshold (180 seconds). This typically occurs when indexing or forwarding are falling behind or are blocked.
  • Generate Diag? If filing a support case, click here to generate a diag.
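For anyone triaging this, a quick way to see how far ingestion is lagging is to compare each event's index time with its event time. This is a generic sketch, not Splunk's official health-check query; adjust the index and time range to your environment:

```
index=_internal earliest=-15m
| eval lag_sec = _indextime - _time
| stats avg(lag_sec) max(lag_sec) by host
| sort - max(lag_sec)
```

Hosts whose max lag approaches the red thresholds quoted above (180-210 seconds) are the ones to investigate first.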

Here are some examples of the messages shown:

  • 07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\spool\splunk.
  • 07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\run\splunk\search_telemetry.
  • 07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\log\watchdog.
  • 07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\log\splunk.
  • 07-01-2021 09:28:52.276 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\var\log\introspection.
  • 07-01-2021 09:28:52.275 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\Splunk\etc\splunk.version.

  • 07-01-2021 09:28:52.269 -0500 INFO TailingProcessor [66180 MainTailingThread] - Adding watch on path: C:\Program Files\CrushFTP9\CrushFTP.log.

  • 07-01-2021 09:28:52.268 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\watchdog\watchdog.log*.
  • 07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\splunk\splunk_instrumentation_cloud.log*.
  • 07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\splunk\license_usage_summary.log.
  • 07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\splunk.
  • 07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\var\log\introspection.
  • 07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: monitor://$SPLUNK_HOME\etc\splunk.version.
  • 07-01-2021 09:28:52.267 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\spool\splunk\tracker.log*.
  • 07-01-2021 09:28:52.266 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\spool\splunk\...stash_new.
  • 07-01-2021 09:28:52.266 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\spool\splunk\...stash_hec.
  • 07-01-2021 09:28:52.266 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\spool\splunk.
  • 07-01-2021 09:28:52.265 -0500 INFO TailingProcessor [66180 MainTailingThread] - Parsing configuration stanza: batch://$SPLUNK_HOME\var\run\splunk\search_telemetry\*search_telemetry.json.
  • 07-01-2021 09:28:52.265 -0500 INFO TailingProcessor [66180 MainTailingThread] - TailWatcher initializing...

JeLangley
Engager

My apologies, we actually redeployed for a separate issue we were facing, so I never contacted them about this.


JeLangley
Engager

I am having this issue as well.  Would appreciate any information you've been able to dig up.


justynap_ldz
Path Finder

Hi Marc,

We are facing the same issue after the 8.2.1 upgrade.
Have you found a solution yet?

Greetings,
Justyna


Marc_Williams
Explorer

No... I have not found a solution. However, it appears to have cleared itself.


Marc_Williams
Explorer

So we thought we had it resolved. However it is back again.

We restart the services and we can watch it go from good to bad.

Has anyone else had luck finding an answer?


yukiang
Observer

I am also looking for a solution to address this ingestion latency...


PeteAve
Engager

We had this problem after upgrading to v8.2.3 and have found a solution.

After disabling the SplunkUniversalForwarder, the SplunkLightForwarder, and the SplunkForwarder apps on splunkdev01, the system returned to normal operation. These apps were enabled on the indexer and should have been disabled by default. Also, loading a universal forwarder that is not compatible with v8.2.3 will cause ingestion latency and tailreader errors. We had some Solaris 5.1 servers (forwarders) that are no longer compatible with upgrades, so we just kept them on 8.0.5. The upgrade requires Solaris 11 or later.

The first thing I did was go to the web interface, Manage Apps and searched *forward*.

This showed the three Forwarders that I needed to disable and I disabled them on the interface.

I also ran these commands from the CLI on the indexer:

splunk disable app SplunkForwarder -auth <username>:<password>
splunk disable app SplunkLightForwarder -auth <username>:<password>
splunk disable app SplunkUniversalForwarder -auth <username>:<password>

After doing these things the ingestion latency and tailreader errors stopped.
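To double-check which forwarder apps are present and whether they are already disabled, a REST search from the search bar can help. This is a sketch using the apps/local endpoint; the title and disabled fields come from that endpoint's output:

```
| rest /services/apps/local splunk_server=local
| search title="*forward*"
| table title disabled
```

Any app listed with disabled=0 here is still active and can be disabled through Manage Apps or the CLI as described above.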

phil__tanner
Path Finder

FWIW, we just upgraded from 8.1.3 to 8.2.5 tonight, and are facing exactly these same issues.

Only difference is that these forwarder apps are already disabled on our instance.

Is there any update from Splunk support on this issue?


dpalmer235
Observer

We upgraded from 8.7.1 to 8.2.6 and we have the same tracker.log latency issue.

Please help us, Splunk...


andrew_burnett
Path Finder

Commenting on this to be notified of the solution.


phil__tanner
Path Finder

FWIW, my support case is still open and I still have no answers. I do have many support people telling me the problem doesn't exist, so I reply with screenshots of the problem still existing.

The original resolution suggested was to disable the monitoring/alerting for this service. If anyone is interested in this solution, I'm happy to post it - but as it doesn't solve the underlying issue, and all it does is stop the alert telling you the issue exists, I haven't bothered testing/implementing it myself.


phil__tanner
Path Finder

Splunk support have replied and confirmed (finally) that it is a known bug for the ingestion latency for both on-prem and cloud customers.

Their suggested solution is to disable the monitoring and its alerts.

Note: this doesn't fix the ingestion issues (which are causing indexing to be skipped and therefore data loss); it only stops warning you about the issue.
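For those who do want to silence the alert while the bug stands (with the caveat above that this hides the symptom rather than fixing it), the health-report feature can be tuned in health.conf. This is a sketch based on the feature stanza name used by the distributed health report; verify the exact setting names against the health.conf spec for your version:

```
# $SPLUNK_HOME/etc/system/local/health.conf
[feature:ingestion_latency]
# Stop alerting on this feature only; the indicator still
# appears in the health report, but no alerts fire for it.
alert:disabled = 1
```

A restart (or a debug/refresh of the health report) is needed for the change to take effect.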


tgarcias21sec
Observer

Hi, the same thing happens in our case. The health check seems to be monitoring the tracker.log file, which does not exist on any of our deployed hosts, yet we have this problem on the SH and the indexer.


bablucho
Path Finder

Same problem here; it only appeared immediately after the 9.0 upgrade.


hgtsecurity
New Member

In our situation, the problem was actually the permissions on one particular log file. It appears that when Splunk was upgraded, the permissions on the log file were set to root only, and Splunk was not able to read it. We don't run Splunk as the root user, so we had no choice but to change ownership of the file so Splunk could read it. We are running RHEL 8.x, so "chown -R splunk:splunk /opt/splunk" did the trick. Once we restarted Splunk, the issue went away immediately.

Just like several others had mentioned previously, we were only seeing the issue on our Cluster Master and no other Splunk application server.  Hope this helps!


Marc_Williams
Explorer

We are running on a Windows platform. We have made no changes to the environment (permissions or users).

We have upgraded to 9.0 and still have the issue.


bablucho
Path Finder

FOUND A FIX!... or rather a workaround, and hopefully it works for you all.

I've been working tirelessly with a Splunk senior technical support engineer until midnight for the past two days in an effort to fault-find and fix this problem. Support seem to think it is a scaling issue, as they suspect network latency and our two indexers being overwhelmed.
This makes no sense to me, as our environment is sufficiently scaled, based on the Splunk Validated Architectures, for our number of users and data ingestion volume.

Anyway, I spotted the issue classed as uncategorised under the 9.0 known issues. It was only logged two weeks ago, and I'm surprised (or not really) that support failed to pick this up and instead took me on a wild goose chase of fault finding.

[screenshot: bablucho_0-1657812136599.png]

Turn off the useACK setting:

useACK = false

in any outputs.conf file you can locate on the affected instances, then restart for the changes to take effect. This should stop the tracker.log errors, and data should flow through continuously again.
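For reference, the workaround amounts to a stanza like the following in outputs.conf on each affected forwarding instance. The bare [tcpout] default stanza here is illustrative; if your outputs.conf defines named tcpout groups, set it in the group that forwards to the affected indexers, then restart:

```
# $SPLUNK_HOME/etc/system/local/outputs.conf
[tcpout]
# Known issue on 9.0: indexer acknowledgment can stall the
# ingestion pipeline and trigger tracker.log latency alerts.
# Disabling useACK works around it, but note that it removes
# the delivery-acknowledgment guarantee for forwarded data.
useACK = false
```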


Jarohnimo
Builder

So what's the verdict? Is the workaround working?


bablucho
Path Finder

Apologies for the delay...

Yes, all operating as expected post-workaround.


Jarohnimo
Builder

Glad it works for you, unfortunately it made no change in my environment. 


Jarohnimo
Builder

I'm raising my eyebrow at this being a true workaround (but certainly hoping that it is). While I agree that queues get blocked on the forwarding nodes, the verbiage is a bit vague.

I've been fighting this issue for weeks and had been on that exact page looking for the issue without finding it. I was searching for "ingestion latency", not blocked queues. Blocked queues are a broad category, and there is usually a performance reason why a queue is blocked.

If you make it 3 or 4 days without the issue popping back up I'd say this workaround is solid.

Anyway fingers crossed
