I deployed a universal forwarder on a SUSE Linux server to monitor a log file. The forwarder forwards data to an indexer. We found that sometimes we can't search for some logs that were added to the log file on the Linux server. For example, we added a log entry containing the keyword YWG_704740 to the log file, and then searched on the indexer like this:
index=XXXX host=XXXX YWG_704740, time range is all time, but the search returns nothing.
I enabled indexer acknowledgment on the forwarder by setting the useACK attribute to true in outputs.conf. It helped, but we still can't search for some logs on the indexer, although far fewer are missing than before.
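For reference, this is roughly what the useACK setting looks like in outputs.conf on the forwarder (the stanza name, host, and port below are placeholders, not your actual values):

```
# outputs.conf on the universal forwarder
[tcpout:primary_indexers]          # group name is illustrative
server = indexer.example.com:9997  # placeholder host:port
useACK = true
```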
I want to know whether there are methods to find out what happened, for example whether it is a connection problem, a forwarder problem, or an indexer problem.
Thanks a lot!
Two places to check are $SPLUNK_HOME/var/log/splunk/splunkd.log AND $SPLUNK_HOME/var/log/splunk/metrics.log. splunkd.log will contain information on whether the forwarder is having problems connecting to the indexer. metrics.log records how many bytes are sent and what's happening in the queues.
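For example, connection failures in splunkd.log can be counted with grep. The sample line below is fabricated to show the pattern; on a real forwarder you would point grep at $SPLUNK_HOME/var/log/splunk/splunkd.log instead:

```shell
# Fabricated sample of a forwarder-side connection error (format approximates
# real splunkd.log output; on the forwarder use the actual file instead).
printf '%s\n' \
  '02-01-2016 09:00:01.000 +0800 ERROR TcpOutputFd - Connection to host=idx1:9997 failed' \
  '02-01-2016 09:00:05.000 +0800 INFO  TcpOutputProc - Connected to idx=idx1:9997' \
  > /tmp/splunkd_sample.log
# Count connection failures:
grep -c 'Connection to host=.*failed' /tmp/splunkd_sample.log
```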
Dumb question: have you configured your forwarder to send to the indexer?
Also run $SPLUNK_HOME/bin/splunk list monitor and see if your file is listed.
To clarify what bmacias84 said, on the forwarder, check splunkd.log and metrics.log.
Other places to look:
Search index=_internal and look for errors relating to the forwarder by IP address/hostname.
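A starting search for this might look like the following (the host value is a placeholder for your forwarder's hostname or IP):

```
index=_internal source=*splunkd.log host=<your_forwarder> (ERROR OR WARN)
```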
Are the search results all coming from the same source files? Are the missing events from just one or two source files on the forwarder? If so, check log file permissions on the forwarder. (But don't run the Splunk forwarder as root; that is a security issue.)
Do you have anything set up to route data to different indices? If so, double check that this input is not going to the wrong index.
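One way to verify where a given input is routed is btool on the forwarder; the monitored path here is illustrative, so substitute your actual log file:

```
# Show the effective configuration for the monitored file, including any
# "index = ..." setting (replace the path with your actual log file):
$SPLUNK_HOME/bin/splunk btool inputs list "monitor:///var/log/myapp/app.log" --debug
```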
We did some checking and troubleshooting, but there are still some problems. Please see the results below.
1. The search results are all coming from the same source.
2. The missing events are from one source file.
3. I think the log file permissions are OK, because we receive most of the events in this log file.
4. We didn't set up to route data to different indices.
5. We checked splunkd.log and metrics.log.
6. We found a lot of connection-failure errors in splunkd.log.
7. We didn't find any error or warning events in metrics.log.
8. We enabled indexer acknowledgment on the forwarder by setting the useACK attribute to true in outputs.conf.
9. It helps, but we still don't receive all events, although far fewer are missing than before. (We lose fewer than 10 events per day with indexer acknowledgment enabled on the forwarder. Without indexer acknowledgment, we lost many more than 10 events per day.)
Based on your troubleshooting inside of Splunk ('connection failed'), I'd suggest:
Related to system performance: http://docs.splunk.com/Documentation/Splunk/6.3.0/ReleaseNotes/SplunkandTHP
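On most Linux distributions, including SUSE, you can check whether transparent huge pages (THP) are enabled with a command like this (the sysfs path may vary between kernels):

```
cat /sys/kernel/mm/transparent_hugepage/enabled
# If the output shows [always], THP is active, which the Splunk docs
# linked above recommend disabling.
```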
Although this doesn't address the exact problem you are having, it may be helpful to check whether there is an overall delay in indexing events: http://docs.splunk.com/Documentation/Splunk/6.3.0/Troubleshooting/Troubleshootingeventsindexingdelay
Fairly thorough discussion of system performance analysis wrt Splunk here: https://wiki.splunk.com/Community:PerformanceTroubleshooting
We will check our network first, because we found that a lot of packets were being dropped.
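Counters like these can help quantify the drops (the interface name eth0 is an assumption; substitute your actual interface):

```
# Per-interface drop/error counters:
ip -s link show eth0
# TCP retransmission statistics, which matter for forwarder connections:
netstat -s | grep -i retrans
```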