Hello all,
We are facing a problem with missing entries in our Splunk environment. We have a UF installed on an automotive-related station. Every vehicle that passes is logged as an entry in a .json file by an app, and the UF then ingests this file. We use a simple monitor of the .json file:
[monitor://C:\app_name\json_files\our.json]
sourcetype = ***
index = ***
disabled = false
However, we see entries that are present in the .json file but missing in Splunk. The location where the UF is installed has had serious networking issues for the last couple of months, and I have seen many log messages about the UF-to-indexer connection being broken and the like.
EXAMPLE:
02-05-2026 08:58:09.204 +0200 INFO TailReader [11496 tailreader0] - Batch input finished reading file='C:\Program Files\SplunkUniversalForwarder\var\spool\splunk\tracker.log'
02-05-2026 08:58:14.305 +0200 ERROR TcpOutputFd [10872 TcpOutEloop] - Read error. An existing connection was forcibly closed by the remote host.
02-05-2026 08:58:14.305 +0200 INFO AutoLoadBalancedConnectionStrategy [10872 TcpOutEloop] - Connection to *.*.*.*:9997 connid 0 closed. Stale connection=0. Read error. An existing connection was forcibly closed by the remote host.
02-05-2026 08:58:16.166 +0200 WARN HttpClientRequest [11200 HttpClientPollingThread_0C25404B-8B5C-4761-A855-F90FA37ED216] - Returning error HTTP/1.1 502 Error connecting: Winsock error 10060
02-05-2026 08:58:16.166 +0200 WARN HttpPubSubConnection [11200 HttpClientPollingThread_0C25404B-8B5C-4761-A855-F90FA37ED216] - Unable to parse message from PubSubSvr:
02-05-2026 08:58:16.166 +0200 WARN HttpPubSubConnection [11200 HttpClientPollingThread_0C25404B-8B5C-4761-A855-F90FA37ED216] - Batch subscribe aborted as status is not eOk
02-05-2026 08:58:26.500 +0200 WARN AutoLoadBalancedConnectionStrategy [10872 TcpOutEloop] - Cooked connection to ip=*.*.*.*:9997 timed out
02-05-2026 08:58:39.564 +0200 INFO TailReader [11496 tailreader0] - Batch input finished reading file='C:\Program Files\SplunkUniversalForwarder\var\spool\splunk\tracker.log'
We obviously have network issues causing problems, but why is Splunk not able to get the data once connectivity is re-established?
Hello. The client fixed their internet connectivity, and no data has been lost since. They refused any solution on our side once it was made clear that the internet connectivity was the root cause. It sucks because I couldn't test a real solution from here.
There are usually two possible scenarios when data loss occurs in UF communication.
1. The receiver gets the event (or chunk, in the case of cooked data) but fails to process it properly for some reason, whereas the forwarder considers the data already sent and therefore processed. The useACK option is designed to prevent this scenario. It can cause duplicates, though.
2. The forwarder fails to send buffered events and is then shut down or crashes. Upon restart, the buffer contents are lost and not retried (the data had already been read from the source file, so Splunk resumes reading from after what had been read). You can protect against that with persistent queues. Persistent queues can incur a performance penalty, since data is not only held in memory but also written to disk.
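For the first scenario, a minimal outputs.conf sketch on the forwarder could look like the one below. The group name primary_indexers and the server address indexer.example.com:9997 are placeholders made up for illustration, not values from this thread, so adjust them to your environment.

```
# outputs.conf on the universal forwarder (sketch, placeholder names)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# Hypothetical indexer address; replace with your own receiver
server = indexer.example.com:9997
# Keep sent data in the wait queue until the indexer acknowledges it
useACK = true
```

With acknowledgment enabled, the forwarder re-sends data it did not get an acknowledgment for, which is where the possible duplicates come from.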
Thank you for the reply. We don't want duplicates in the data, so we will most likely go with persistent queues. Currently they are trying to fix the internet issues on site, and after that I can work on the UF.
Those two mechanisms are meant to solve two different problems.
And unless the protocol is extremely complicated it is very hard to not have either lost data or duplicates.
I understand that, but my company sold the product as a foolproof solution that catches up and never loses data. I really believed that this was the case before we hit this error.
Well, as I said - you can have protection against data loss (like everything, it has its limitations; nothing in this world is 100% safe, secure, foolproof and so on), but there is a possibility of data duplication. As simple as that. And you can't really do anything about it.
Hi @radko
One thing you might like to investigate is setting up a persistentQueue in the outputs.conf tcpout group - this is available in > Splunk 9.4.0
For more information on this, check out https://community.splunk.com/t5/Knowledge-Management/New-Splunk-TcpOutput-persistent-queue/m-p/70734... This might help with the outages if queues are filling up (for example).
I'm wondering what else could be affecting it. Do the files rotate?
Thank you very much. We will go with your proposal. Hopefully (I believe) it hits the target! I will mark your reply as the solution once I get it done.