We are experiencing random events dropping across multiple forwarders. We have a repro of the problem: while doing some testing yesterday on one host, we saw it drop about 20 events, meaning those events never made it to Splunk. Events before and after those 20 were indexed, and there were no messages in splunkd.log on the forwarder indicating any sort of problem with dropped events.
Also, these events are very similar in structure, so it wouldn't be a timestamp issue.
Can anyone advise on the best way to troubleshoot this, or offer any thoughts on what the issue could be?
You can search the internal Splunk logs on the forwarder for WARN or ERROR log levels, and do the same on the indexer; in particular, look for connection errors from the forwarder to the indexer.
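A search along these lines should surface anything logged during your test window (a sketch; the host filter is a placeholder, and the forwarder's internal logs are normally forwarded, so you can run this from the search head):

    index=_internal source=*splunkd.log* host=<your_forwarder> (log_level=ERROR OR log_level=WARN)
    | stats count by host, component, log_level

Connection problems from the forwarder to the indexer typically show up under the TcpOutputProc component on the forwarder side.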
Also, do you have indexer acknowledgement turned on, so that when the forwarder sends an event to the indexer it waits for an ack before clearing it from its output queue?
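If not, acknowledgement is enabled with useACK in outputs.conf on the forwarder. A minimal sketch, assuming a tcpout group named primary_indexers and indexers listening on port 9997 (both placeholders for your actual settings):

    [tcpout:primary_indexers]
    server = indexer1.example.com:9997, indexer2.example.com:9997
    useACK = true

You can check what the forwarder is actually running with ./splunk btool outputs list tcpout --debug, and the forwarder needs a restart for the change to take effect.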
Is there anything similar about the missing events that might differentiate them from the ones that were indexed? Different log files? Different sourcetypes?