ERROR TcpInputProc - Error encountered for connection from timeout

nebel
Communicator

Hi there,

Since we rolled out a couple of hundred forwarders, we have been seeing connection errors.

If I telnet to the indexer from a forwarder (Unix), sometimes I get an answer and sometimes I don't. When the connection works, we get events.
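The manual telnet check can be repeated automatically to see how often the connection attempt fails. The following is a minimal sketch, not a Splunk tool: the function names `probe` and `summarize` are made up for illustration, and it only opens and closes a TCP connection without sending any data.

```python
import socket
import time


def probe(host, port, attempts=10, timeout=3.0, pause=0.0):
    """Try to open a TCP connection `attempts` times.

    Returns a list of (ok, seconds, error) tuples, one per attempt.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection completes the TCP handshake, like telnet does,
            # and the `with` block closes the socket again immediately.
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start, None))
        except OSError as exc:  # covers timeouts, refusals and DNS errors
            results.append((False, time.monotonic() - start, str(exc)))
        time.sleep(pause)
    return results


def summarize(results):
    """Count successful and failed attempts."""
    ok = sum(1 for success, _, _ in results if success)
    return {"attempts": len(results), "ok": ok, "failed": len(results) - ok}
```

Run from a forwarder, a call such as `summarize(probe("indexer.example.com", 9997, attempts=20, pause=1.0))` would show the share of failed attempts. The host name is a placeholder; 9997 is the port from the forwarder log message.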

On the indexer I see this error event:

ERROR TcpInputProc - Error encountered for connection from ... timeout

I have a lot of them...
The forwarders and indexer are in the same subnet. We already installed a new indexer to verify whether we have an issue in our configuration. With the new indexer we have the same issue.

On the forwarder side we have the following warning message:

TcpOutputProc - Raw connection to ip ... :9997 timed out

Has anyone had the same issue?

Thanks in advance.

Regards.

Accepted solution

nebel
Communicator

This error is caused by the heartbeat function. Every 30 seconds the forwarder sends a heartbeat to the indexer. If the indexer doesn't receive it within that time, it writes a log entry with the timeout message. Network devices such as a firewall, or long remote connections, can cause this. I disabled the heartbeat. Another solution could be to change the heartbeat frequency from 30 seconds...
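To illustrate the mechanism described here, the sketch below flags gaps between heartbeat arrivals that exceed the 30-second window. It is a toy model under stated assumptions, not Splunk's implementation: the function name `missed_heartbeats` and the list of arrival times are hypothetical.

```python
def missed_heartbeats(arrival_times, window=30.0):
    """Return (previous, current) arrival pairs whose gap exceeds `window`.

    `arrival_times` is a sorted list of timestamps in seconds at which
    heartbeats reached the receiver (hypothetical input for illustration).
    """
    gaps = []
    for prev, cur in zip(arrival_times, arrival_times[1:]):
        # A heartbeat delayed or dropped in transit shows up as a gap
        # longer than the expected interval.
        if cur - prev > window:
            gaps.append((prev, cur))
    return gaps
```

For example, with arrivals at 0, 30, 60, 125 and 155 seconds, only the gap between 60 and 125 exceeds the window, which is the kind of delay a firewall or a slow remote link could introduce.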

tjrhodeback
New Member

Are all of your SE servers using NTP, and do you have the correct DNS records loaded? Timing and authentication problems can cause issues on your Splunk infrastructure.

kristian_kolb
Ultra Champion

It could be that you are overloading your network and/or indexer.

Did the problem always exist, or did it start occurring once you reached a certain number of forwarders sending data?

Have you installed the Deployment Monitor app? It ships with Splunk by default; you just need to enable it. It can give you some insight into congestion problems.

Please tell us more about your HW/SW configuration (OS, Splunk version, etc.).


UPDATE:

Does the error occur for a particular type of forwarder?
Are your ulimit and other OS settings (forwarder and indexer) the same as for the other (functioning) landscape?
Are there intermediate network components that might be causing trouble (switches, routers, firewalls)?
Does the problem go away when you have lower loads (e.g. at night)?

/K
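For the ulimit question, the open-file limit of the current process can be read on Unix with Python's standard `resource` module. This is a minimal sketch: the helper names and the `headroom` factor of 2 are assumptions chosen for illustration, not a Splunk recommendation, and the check only reflects the limits of the process that runs it.

```python
import resource


def open_file_limits():
    """Return the soft and hard limits on open file descriptors."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {"soft": soft, "hard": hard}


def enough_descriptors(expected_connections, headroom=2.0):
    """Rough check: is the soft limit at least `headroom` times the
    expected number of connections? (`headroom` is an arbitrary assumption.)
    """
    soft = open_file_limits()["soft"]
    if soft == resource.RLIM_INFINITY:
        return True  # no limit configured
    return soft >= expected_connections * headroom
```

Running `open_file_limits()` as the same user that runs Splunk, on both the affected and the functioning landscape, makes the two easy to compare.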

itmonitoring
Explorer

Hi there, the error seems to have disappeared when I moved from a Universal Forwarder configuration to a Light Forwarder.

I was not able to get data into my indexers, but I'm not sure if this error had anything to do with it.

The error was appearing with as few as 4 hosts, so I don't think it's related to a network load issue.

nebel
Communicator

Just for your information: at the moment this seems to be normal behavior. We think these "error messages" don't affect Splunk's indexing behavior.

nebel
Communicator

@JasonCzerak: Did you find the solution or any hints for that?

JasonCzerak
Explorer

I have the same problem. The forwarders are on the same subnet as the intermediate forwarder. It would error out with as few as 10 connections to it.

Drainy
Champion

Also, to reply to this thread all you need to do is to click "comment on this answer" below this message, saves me converting your answers to comments 😉

kristian_kolb
Ultra Champion

There are several tools for this, depending on your OS, but common ones include Wireshark and tcpdump. /k

nebel
Communicator

Thanks. How does a packet capture on an indexer work?

Drainy
Champion

This sounds a lot like a firewall or stateful-connection issue. Do you have a firewall between the machines? I've seen this before, where a firewall either doesn't allow a TCP connection, times one out too quickly, or decides it has been open too long. Perhaps it would be useful to do a packet capture on the indexer?

nebel
Communicator

Thanks for the response.

Yes, with fewer connections we don't have trouble. The problem only came up with more forwarder connections.

We installed a new indexer on different hardware (other switch ports, other layer 3 components).

We don't have this issue on all forwarders, just on a couple of hundred that are located in different subnets.

It is definitely an issue with the three-way handshake (it doesn't complete successfully), which means the TCP connection between forwarder and indexer is not established properly. All firewall logs have been checked; there are no noticeable events.

We opened a Splunk support ticket today.

kristian_kolb
Ultra Champion

Updated with further questions above. /k

nebel
Communicator

Thanks for your response. The indexer doesn't have the status "overloaded".
Before we rolled out the new forwarders (~1000), we had a couple of hundred without these errors.
All queues are fine.
We already checked S.o.S and Deployment Monitor without finding any helpful message. The only message I got is the one I pasted before.

The indexer is a powerful quad-core machine with 16 GB of RAM. The indexes are located on a NetApp. The Splunk version is 4.3.1, on the forwarders as well.

We already tried the same scenario with 4.3; same behavior.

At the moment the network team is checking all points.

We have exactly the same configuration (HW/SW) in other landscapes without any problems, and those landscapes have a lot more forwarders.
