Logs inconsistently lagging behind
Hello,
I have a case where the logs from 4 hosts are lagging behind. I say inconsistent because the lag varies from 5 to 30 minutes, and sometimes there is none at all. When the logs don't show up for 30 minutes or more, I go to forwarder management, disable and re-enable the apps, and restart splunkd; the logs then continue with a lag of 1 or 2 seconds.
The other hosts also lag behind at peak hours, but only by 1 or 2 minutes (5 minutes at most for sources with a large volume of logs).
I admit that our indexer cluster does not meet the IOPS requirement, but for 4 particular hosts to visibly underperform is quite concerning.
Can someone show me the steps to debug and solve this problem?

Are these UFs? Did you change the default thruput limit?
Yes, they're UFs. I already set
[thruput]
maxKBps = 0
in limits.conf in the app.
1) Share the OS version, the UF version, and roughly how many inputs are on those hosts.
2) Search _internal for your hostname (IP) for error codes:
2.1) Is the UF generating errors?
2.2) Does the UF get indexing paused/congested reports back from the IDX tier?
2.3) Does the UF show round robin to all IDX elements, or is there a discrepancy in outputs.conf?
Let's start with these.
After some investigation, here are the answers:
1) The OS is Linux (Red Hat 8) and the Splunk UF version is 9.1.1. We have 2 Splunk deployments, Splunk Enterprise and Splunk Security. On my end (Splunk Enterprise) there are only 2 inputs, but on the Security end there are many, from 2 apps, HG_TA_Splunk_Nix and TA_nmon (roughly 40 inputs each), across the 4 hosts.
2.1) There are some ERRORs, but none noteworthy. They are:
+700 ERROR TcpoutputQ [11073 TcpOutEloop] - Unexpected event id=<eventid> -> benign ERROR as per Splunk dev
+700 ERROR ExecProcessor [32056 ExecProcessor] - message from "$SPLUNKHOME/HG_TA_Splunk_Nix/bin/update.sh" https://repo.napas.local/centos/7/updates/x84_64/repodata/repomd.xml: [Errorno14] curl#7 - "Failed to connect to repo.napas.local:80; No route to host"
2.2) HealthReporter shows:
+700 INFO PeriodHealthReporter - feature="Ingestion latency" color=red/yellow indicator="ingestion_latency_gap_multiplier" due_to_threshold_value=1 measured_value=26684 reason=Events from tracker.log have not been seen for the last 26684 seconds, which is more than the red threshold ( 210 seconds ). This typically occurs when indexing or forwarding are falling behind or are blocked." node_type=indicator node_path=splunkd.file_monitor_input.ingestion_latency.ingestion_latency_gap_multiplier.
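To put the HealthReporter message in perspective, here is a minimal Python sketch that pulls the measured gap and the red threshold out of the message text and converts them to hours and a multiple of the threshold. The regular expressions are my own assumption based on the message as pasted above, not an official Splunk format, and the variable names are illustrative only.

```python
import re

# Message text copied from the HealthReporter line above (shortened).
line = (
    'feature="Ingestion latency" color=red/yellow '
    'indicator="ingestion_latency_gap_multiplier" due_to_threshold_value=1 '
    "measured_value=26684 reason=Events from tracker.log have not been seen "
    "for the last 26684 seconds, which is more than the red threshold "
    "( 210 seconds )."
)

# Assumed patterns: they match this pasted message, nothing more general.
measured = int(re.search(r"measured_value=(\d+)", line).group(1))
threshold = int(re.search(r"red threshold \(\s*(\d+)\s*seconds", line).group(1))

gap_hours = measured / 3600      # seconds -> hours
ratio = measured / threshold     # how many times over the red threshold

print(f"gap: {measured} s (~{gap_hours:.1f} h), {ratio:.0f}x the red threshold")
```

The gap of 26684 seconds is roughly 7.4 hours, far beyond the 5 to 30 minute lag described for the data itself.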
2.3) Searching _internal with | stats count by destIP shows:
idx1: 14248
idx2: 8014
idx3: 7963
idx4: 7809
Which is more concerning than I thought it would be.
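As a rough check on how uneven that is, the following Python sketch compares each indexer's count with a perfectly even split. It assumes the four counts above cover the same time range and that an even 25% share per indexer is the expected baseline; load balancing switches targets over time, so some skew is normal in practice.

```python
# Event counts per destination, copied from the stats output above.
counts = {"idx1": 14248, "idx2": 8014, "idx3": 7963, "idx4": 7809}

total = sum(counts.values())
expected = total / len(counts)   # assumed baseline: a perfectly even split

# Share of all events per indexer, and ratio against the even split.
shares = {name: n / total for name, n in counts.items()}
skew = {name: n / expected for name, n in counts.items()}

for name in counts:
    print(f"{name}: {counts[name]:>6} events, "
          f"{shares[name]:.1%} of total, {skew[name]:.2f}x the even share")

busiest = max(skew, key=skew.get)
print(f"busiest target: {busiest}")
```

By this measure idx1 receives about 37% of the events, roughly 1.5 times an even share, while the other three sit close to 21% each.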
2.4) Another finding: the logs are now lagging 1 hour behind and are still being pulled/ingested, but the internal logs have stopped. The time is now 9:08, but the last internal log entry is from 8:19, with no error. That last entry is:
+700 Metrics - group=thruput, name=uncooked_output, instantaneous_kbps=0.000, instantaneous_eps=0.000, average_kbps=0.000, total_k_processed=0.000, kb=0.000, ev=0, interval_sec=60

Here is an excellent .conf presentation on how to find the reason for this lag: https://conf.splunk.com/files/2019/slides/FN1570.pdf
