Getting Data In

How do I troubleshoot a universal forwarder losing data when forwarding to an indexer?

ccie24806
New Member

I deployed a universal forwarder on a SUSE Linux server to monitor a log file. The forwarder sends its data to an indexer. We found that sometimes we can't search for logs that were added to the log file on the Linux server. For example, we added a log entry containing the keyword YWG_704740 to the log file, then searched on the indexer with index=XXXX host=XXXX YWG_704740 over a time range of all time, but got no results.
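For reference, the monitor input on the forwarder is configured roughly like this in inputs.conf (the file path and sourcetype below are placeholders, not our real values):

    [monitor:///var/log/app/app.log]
    index = XXXX
    sourcetype = app_log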

I enabled indexer acknowledgment on the forwarder by setting the useACK attribute to true in outputs.conf. It helped, but we still can't find some logs on the indexer, although far fewer are missing than before.
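Concretely, the change amounts to something like this in the outputs.conf tcpout stanza (the group name and indexer address are placeholders):

    [tcpout:primary_indexers]
    server = indexer.example.com:9997
    useACK = true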

I want to know: is there a way to find out what happened, for example, whether it is a connection problem, a forwarder problem, or an indexer problem?
Thanks a lot!

1 Solution

nnmiller
Contributor

To clarify what bmacias84 said, on the forwarder, check splunkd.log and metrics.log.

Other places to look:

  • Search index=_internal and look for errors relating to the forwarder by IP address/hostname (a sample search follows this list).

  • Are the search results all coming from the same source files? Are the missing events from just one or two source files on the forwarder? If so, check the log file permissions on the forwarder. (But don't run the Splunk forwarder as root; that is a security risk.)

  • Do you have anything set up to route data to different indices? If so, double check that this input is not going to the wrong index.
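As a starting point, a search along these lines surfaces forwarder-related errors (substitute your forwarder's hostname; log_level and component are fields Splunk extracts from its own splunkd.log events):

    index=_internal host=<forwarder_host> source=*splunkd.log* (log_level=ERROR OR log_level=WARN)
    | stats count by component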


ccie24806
New Member

Great, thanks!
We did some checking and troubleshooting, but there are still some problems. Please see the process below.
1. The search results are all coming from the same source.
2. The missing events are from one source file.
3. I think the log file permissions are OK, because we receive most of the events from this log file.
4. We didn't set up any routing of data to different indices.
5. We checked splunkd.log and metrics.log.
6. We found a lot of 'connection failed' error events in splunkd.log (a sample command follows this list).
7. We didn't find any error or warning events in metrics.log.
8. We enabled indexer acknowledgment on the forwarder by setting the useACK attribute to true in outputs.conf.
9. It helped, but we still don't receive every event, although far fewer are missing than before. (We lose fewer than 10 events per day with indexer acknowledgment enabled; without it, we lost many more than 10 per day.)
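For reference, this is roughly how we pulled the connection errors out of splunkd.log on the forwarder (assuming a default $SPLUNK_HOME; TcpOutputProc is the component that logs forwarding problems in our version, but the name may differ in yours):

    # Show the most recent forwarding-related warnings and errors on the forwarder
    grep -E "WARN|ERROR" $SPLUNK_HOME/var/log/splunk/splunkd.log | grep -i "TcpOutputProc" | tail -20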


nnmiller
Contributor

Based on your troubleshooting inside Splunk ('connection failed'), I'd suggest:

  • Checking for network congestion
  • Checking for system performance issues (mainly on the receiving side, but potentially on the sending side): system resource exhaustion (CPU/memory/filesystem I/O) and/or TCP stack issues (a few sample commands follow this list)
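A few stock Linux commands give a first read on both (run them on the indexer and the forwarder; eth0 is a placeholder interface name):

    ip -s link show eth0                      # per-interface RX/TX drops and errors
    netstat -s | grep -iE "retrans|drop"      # TCP retransmissions and dropped segments
    vmstat 5 5                                # CPU, memory, and I/O pressure over ~25 seconds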

Related to system performance: http://docs.splunk.com/Documentation/Splunk/6.3.0/ReleaseNotes/SplunkandTHP

Although this doesn't address the exact problem you are having, it may be helpful to see if there is an overall delay in indexing events: http://docs.splunk.com/Documentation/Splunk/6.3.0/Troubleshooting/Troubleshootingeventsindexingdelay
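One quick way to measure that delay is to compare each event's index time to its event time, e.g. (reusing the placeholders from your original search):

    index=XXXX host=XXXX
    | eval delay_secs = _indextime - _time
    | stats avg(delay_secs) max(delay_secs)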

A fairly thorough discussion of system performance analysis with respect to Splunk is here: https://wiki.splunk.com/Community:PerformanceTroubleshooting


ccie24806
New Member

Thanks!
We will check our network first, because we found that a lot of packets were being dropped.


bmacias84
Champion

The places to check are $SPLUNK_HOME/var/log/splunk/splunkd.log and $SPLUNK_HOME/var/log/splunk/metrics.log. splunkd.log will contain information on whether the forwarder is having problems connecting to the indexer. metrics.log records how many bytes are sent and what's happening in the queues.
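For example, a search along these lines shows the queue fill levels that metrics.log reports (field names may vary slightly by version):

    index=_internal source=*metrics.log* group=queue
    | timechart span=5m avg(current_size_kb) by name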

Dumb question: have you configured your forwarder to send to the indexer?

Also run $SPLUNK_HOME/bin/splunk list monitor and see if your file is listed.
