Splunk not receiving data from forwarders (needs restart)

ruiaires
Path Finder

We have a Splunk server that is receiving data from more than 10 forwarders. It also receives data directly via UDP and monitors files on network shares.

We have scheduled searches to monitor and alert if a host stops sending data.
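The core of such an alert is simple: compare each host's latest event time against "now" and flag anything that has been silent too long. Below is a minimal Python sketch of that logic. It assumes the scheduled search has already produced the latest event time per host. The function name `stale_hosts`, the host names and the 15-minute threshold are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def stale_hosts(last_seen, now, max_silence):
    """Return hosts whose latest event is older than max_silence, oldest first.

    last_seen: dict mapping host name -> datetime of its most recent indexed event.
    """
    stale = [(host, ts) for host, ts in last_seen.items() if now - ts > max_silence]
    # Sort by last event time so the host that went quiet earliest comes first.
    stale.sort(key=lambda item: item[1])
    return [host for host, _ in stale]


if __name__ == "__main__":
    now = datetime(2012, 8, 15, 12, 0)
    last_seen = {  # hypothetical hosts and times
        "fwd01": datetime(2012, 8, 15, 11, 58),
        "fwd02": datetime(2012, 8, 15, 11, 10),
        "fwd03": datetime(2012, 8, 15, 11, 30),
    }
    print(stale_hosts(last_seen, now, timedelta(minutes=15)))
    # prints ['fwd02', 'fwd03']
```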

Occasionally, Splunk reports that all the forwarders have stopped sending data. When we diagnose it, we find that:
- Splunk is running OK, and all UDP inputs and local file monitors are working and receiving data
- All forwarders are up and running, but Splunk is not indexing any data from them
- Restarting Splunk does not solve the issue
- Restarting the server solves the problem

We can't find any server logs that indicate a network problem.

Any ideas on how to diagnose this?

We're running 4.3.3 on Windows Server 2008 64bit.

jbsplunk
Splunk Employee

I would start by looking at splunkd.log on the forwarder, in the $SPLUNK_HOME/var/log/splunk folder, for messages from 'TcpOutputProc'. They should give you an indication of what is happening when the forwarder tries to connect to the indexer.
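A minimal Python sketch of that check, assuming you can read the forwarder's log file. `splunkd_log_path` and `tcpout_messages` are hypothetical helper names. The filter is a plain substring match on the component name, so it does not depend on the exact layout of each log line.

```python
import os


def splunkd_log_path(splunk_home):
    """Build the path to splunkd.log under a given $SPLUNK_HOME directory."""
    return os.path.join(splunk_home, "var", "log", "splunk", "splunkd.log")


def tcpout_messages(lines, component="TcpOutputProc"):
    """Yield the lines that mention the given splunkd component, without line endings."""
    for line in lines:
        if component in line:
            yield line.rstrip("\r\n")
```

To use it, open `splunkd_log_path(...)` with your forwarder's own $SPLUNK_HOME (for example `open(path, encoding="utf-8", errors="replace")`) and iterate over `tcpout_messages(log_file)`, printing each line.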

cult_hero13
Loves-to-Learn

I spent too much of a day trying to figure out why 2 of 5 servers were not showing up in my Indexer. I tried removing then adding the forward-server information, restarting the forwarder over and over and even reinstalling the forwarder on each, but they just didn't show up. Using telnet I confirmed the connection was open and the Indexer was listening. In the forwarders' splunkd.log files I confirmed the connection was being made. Finally I happened to change my search string to "index=_internal host=*" and there they were, but there was only one source from each and it was $SPLUNK_HOME/var/log/splunk/splunkd.log. The other 3 servers that were working had many more sources. A little bit more searching and I found this command:

$SPLUNK_HOME/bin/splunk list monitor

Sure enough, only splunkd.log was being forwarded. So I ran:

$SPLUNK_HOME/bin/splunk add monitor /var/log

/var/log was what the "working" servers showed in their "list monitor" results. When I then searched for "host=*" in the Search app, the 2 servers that hadn't worked before were there.
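With more than a handful of forwarders, comparing `list monitor` output by eye gets tedious. A minimal Python sketch: save the output of `$SPLUNK_HOME/bin/splunk list monitor` from a working and a non-working server as text, then diff the entries. The parsing rule (header lines end with ':' and every other non-empty line is one monitored path) is an assumption about the output format, and both function names are hypothetical.

```python
def monitored_entries(list_monitor_output):
    """Collect the entries printed by `splunk list monitor`, skipping section headers.

    Assumption: header lines end with ':' and every other non-empty line is one
    monitored path (directory or file).
    """
    entries = set()
    for line in list_monitor_output.splitlines():
        item = line.strip()
        if item and not item.endswith(":"):
            entries.add(item)
    return entries


def missing_inputs(working_output, broken_output):
    """Entries monitored on the working forwarder but absent on the broken one, sorted."""
    return sorted(monitored_entries(working_output) - monitored_entries(broken_output))
```

Expanded file lists will naturally differ between hosts, so expect some noise; the entries worth checking are configured inputs, like /var/log here, that are missing entirely.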

ruiaires
Path Finder

Our problems seem to be performance-related, since we are using network storage for the index.

With a little help from Splunk Support, we found a few messages indicating that the queues were blocked, probably because of network performance issues. The message is: "Stopping all listening ports. Queues blocked for more than 300 seconds".
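To see when and how often this happened, a small Python sketch can pull those lines out of the indexer's splunkd.log. It matches on part of the message quoted above and assumes each log line begins with a date field and a time field; `queue_blocked_events` is a hypothetical name.

```python
QUEUE_BLOCKED = "Queues blocked for more than"


def queue_blocked_events(lines):
    """Return the leading date and time fields of each line reporting blocked queues."""
    hits = []
    for line in lines:
        if QUEUE_BLOCKED in line:
            fields = line.split()
            # Assumption: each splunkd.log line starts with a date and a time field.
            hits.append(" ".join(fields[:2]))
    return hits
```

The resulting timestamps can then be lined up against the times at which the alert reported the forwarders going silent.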

We are still testing everything to make sure that's the correct diagnosis. If we find anything, I will post it here!

smmehadi
Explorer

I'm facing a similar issue too. Did it get resolved for you, and if so, how?

ruiaires
Path Finder

Thanks.
We run a scheduled search that checks whether data is still being received from the forwarders and captures the most recent event time (i.e. when the data stopped).

I was able to see that, at that time, all the forwarders were reporting Connection Failed (from TcpOutputProc).
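If every forwarder starts failing at roughly the same moment, the common factor is the receiving indexer rather than the individual forwarders. A minimal Python sketch of that check, assuming the first "Connection Failed" time has been collected from each forwarder's splunkd.log; `failures_coincide` and the window size are hypothetical.

```python
from datetime import datetime, timedelta


def failures_coincide(first_failure, window):
    """True if all forwarders' first connection failures fall within `window` of each other.

    first_failure: dict mapping forwarder name -> datetime of its first failed connection.
    With fewer than two forwarders there is nothing to correlate, so it returns False.
    """
    if len(first_failure) < 2:
        return False
    times = first_failure.values()
    return max(times) - min(times) <= window
```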

It seems to be a problem at the receiving indexer... The indexer is still running, but TCP data is just not getting through (although UDP and local monitoring work).

Maybe it's some kind of firewall issue. Any idea how to diagnose this further?
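One way to narrow this down is to repeat the telnet-style test from a forwarder while the problem is occurring, and again once it is fixed. A minimal Python sketch that does the same thing in a scriptable way: it only attempts a TCP connection and reports the error, if any. The function name `can_connect` is hypothetical; for the target, use your indexer's host name and its receiving port (9997 is the conventional Splunk receiving port, but check your own configuration).

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return (True, None) if a TCP connection to host:port succeeds, else (False, error text)."""
    try:
        # The connection is closed again immediately; only the handshake is tested.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # covers refused connections, timeouts and DNS failures
        return False, str(exc)
```

A refused or timed-out connection while the indexer process is still up narrows the problem to the listener or the network path between the hosts, rather than to the forwarders themselves.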
