ERROR TcpInputProc - Error encountered for connection from timeout

Communicator

Hi there,

Since we rolled out a couple of hundred forwarders, we have been seeing connection errors.

If I do a telnet from a forwarder (Unix), sometimes I get an answer and sometimes I don't. When it works, we get events.

On the indexer I can see this error event:

ERROR TcpInputProc - Error encountered for connection from ... timeout

I have a lot of them...
The forwarders and the indexer are in the same subnet. We already installed a new indexer to verify whether we have an issue in our configuration; with the new indexer we have the same issue.

On the forwarder side we have the following warning message:

TcpOutputProc - Raw connection to ip ... :9997 timed out
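A quick way to quantify intermittent connects like the telnet test above is a small probe script. This is only a sketch — the host and port you pass in are placeholders for your own indexer address and receiving port:

```python
# Sketch: probe the indexer's receiving port repeatedly to measure how often
# the TCP connect fails. The host/port you call this with are placeholders
# for your own environment.
import socket


def probe(host, port, attempts=10, timeout=3.0):
    """Return (ok, fail) counts for TCP connect attempts to host:port."""
    ok = fail = 0
    for _ in range(attempts):
        try:
            # create_connection completes the TCP handshake or raises OSError
            # (refused, timed out, unreachable, DNS failure).
            with socket.create_connection((host, port), timeout=timeout):
                ok += 1
        except OSError:
            fail += 1
    return ok, fail
```

Run it from a forwarder against the indexer, e.g. `probe("10.1.2.3", 9997, attempts=50)` (address hypothetical). A nonzero fail count between hosts on the same subnet points at an intermediate device or host-level limit rather than Splunk itself.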

Has anyone had the same issue?

thanks in advance

Regards.

0 Karma
1 Solution

Communicator

This error is caused by the heartbeat function. Every 30 seconds a heartbeat is sent to the indexer; if the indexer doesn't receive it within that time, it writes a log entry with the timeout message. Network devices such as firewalls, or long remote connections, can cause this. I disabled the heartbeat. Another solution could be to change the frequency from 30 seconds...
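For reference, the heartbeat interval is configured on the forwarder in outputs.conf. A sketch only — verify the setting name against the outputs.conf spec for your Splunk version, and note the server address and group name here are placeholders:

```
# outputs.conf on the forwarder -- sketch; server address and group name
# are placeholders. Heartbeats apply only to cooked data.
[tcpout:primary_indexers]
server = 10.0.0.5:9997
# Default is 30 seconds; raising it (or disabling heartbeats, as the
# poster did) reduces the timeout noise on the indexer.
heartbeatFrequency = 120
```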

New Member

Are all of your SE servers using NTP, and do you have the correct DNS records loaded? Timing and authentication issues can cause problems in your Splunk infrastructure.

0 Karma

Ultra Champion

It could be that you are overloading your network and/or indexer.

Did the problem always exist, or did it start occurring once you reached a certain number of forwarders sending data?

Have you installed the Deployment Monitor app? It ships with Splunk by default; you just need to enable it. This can give you some insight into congestion problems.

Please tell us more about your HW/SW configuration (OS, Splunk version, etc.)


UPDATE:

Does the error occur for a particular type of forwarder?
Are your ulimit and other OS settings (forwarder and indexer) the same as for the other (functioning) landscape?
Are there intermediate network components that might be causing trouble (switches, routers, firewalls)?
Does the problem go away when you have lower loads (e.g. at night)?

/K

0 Karma

Explorer

Hi there, the error seems to have disappeared when I moved from a Universal Forwarder configuration to a Light Forwarder.

I was not able to get data into my indexers, but I'm not sure if this error had anything to do with it.

The error was appearing with as few as 4 hosts, so I don't think it's related to a network load issue.

0 Karma

Communicator

Just for your information: at the moment it seems to be normal behavior. We think these "error messages" don't influence Splunk's indexing behavior.

0 Karma

Communicator

@JasonCzerak: Did you find the solution or any hints for that?

0 Karma

Explorer

I have the same problem. The forwarders are on the same subnet as the intermediate forwarder. With as few as 10 connections to it, it would error out.

0 Karma

Champion

Also, to reply to this thread all you need to do is click "comment on this answer" below this message; it saves me converting your answers to comments 😉

0 Karma

Ultra Champion

There are several tools for this, depending on your OS, but common ones include Wireshark and tcpdump. /k
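A capture filtered to the Splunk receiving port is usually enough to see whether handshakes complete. A hypothetical invocation, using port 9997 as in this thread (interface and file name are placeholders):

```
# On the indexer, as root:
tcpdump -i any -nn -s 0 -w /tmp/splunk9997.pcap 'tcp port 9997'
# Then open the capture in Wireshark and look for incomplete handshakes
# (SYN with no SYN/ACK) or resets on port 9997.
```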

Communicator

Thanks. How does the packet capture work on an indexer?

0 Karma

Champion

This sounds a lot like a firewall or stateful-inspection issue. Do you have a firewall between the machines? I've seen this before where a firewall decided either not to allow a TCP connection, or timed one out too quickly, or decided it had been open too long. Perhaps it would be useful to do a packet capture on the indexer?

0 Karma

Communicator

Thanks for response.

Yes, with fewer connections we don't have trouble. It only came up with more forwarder connections.

We installed a new indexer with different hardware (other switch ports, other layer-3 components).

We don't have this issue on all forwarders, just a couple of hundred that are located in different subnets.

It is definitely not an issue with the three-way handshake (it completes successfully), meaning the TCP connection between forwarder <-> indexer works properly. All firewall logs were checked; no noticeable events.

We opened a Splunk support ticket today.

0 Karma

Ultra Champion

Updated with further questions above. /k

0 Karma

Communicator

Thanks for your response. The indexer doesn't have the status "overloaded".
Before we rolled out the new forwarders (~1000), we had a couple of hundred without these errors.
All queues are fine.
We already checked S.o.S and the Deployment Monitor without any helpful message. The only message I got is what I pasted before.

The indexer is a powerful quad-core machine with 16 GB of RAM. The indexes are located on a NetApp. The Splunk version is 4.3.1, on both the indexer and the forwarders.

We already tried the same scenario with 4.3; same behavior.

At the moment the network team is checking all points.

We do have exactly the same configurations in other landscapes (HW/SW) without any problems. And in other landscapes we have a lot more forwarders.

0 Karma