This is less of a question and more of a record on Splunk Answers of an issue we ran into.
Symptoms:
You are on Red Hat 6.6, 7.0, or 7.1
The Indexer stops receiving data from Forwarders, but otherwise appears to be up and running fine. Only the splunkd TCP input port seems stuck: the Indexer still participates as a peer in searches, and it continues indexing any data generated locally, such as internal Splunk logs or scripted inputs running on that Indexer. Essentially the TCP input port is dead while the splunkd admin port is fine.
When running the following search, any Indexer having this problem shows up as having 1 or 0 distinct hosts, since it isn't able to receive anything from Forwarders:
index=_internal sourcetype=splunkd earliest=-15m | stats count as event_count, dc(host) as distinct_hosts by splunk_server
Also, running netstat -an on the Indexer showed a lot of TCP connections stuck in the CLOSE_WAIT or SYN_RECV state.
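A quick way to spot this from the shell is to count connection states on the splunktcp input port (assuming the default port 9997; adjust to match your inputs.conf):
netstat -an | grep ':9997' | awk '{print $6}' | sort | uniq -c
On an affected Indexer you would expect the CLOSE_WAIT and SYN_RECV counts to pile up.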
Workaround:
Run pstack against the main splunkd process to dump its thread stacks. For whatever reason, this magically fixes things without having to restart the Indexer:
pstack `head -1 $SPLUNK_HOME/var/run/splunk/splunkd.pid`
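Afterwards, you can confirm the input port has recovered by re-running the distinct_hosts search above, or (again assuming splunktcp on port 9997) by checking that Forwarder sessions are being established:
netstat -an | grep ':9997' | grep -c ESTABLISHED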
Solution:
The root cause is a Red Hat kernel bug:
https://access.redhat.com/solutions/1386323
In our case, the fix was to patch the kernel from 2.6.32-504.8.1 to 2.6.32-504.16.2.
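You can check which kernel an Indexer is currently running with:
uname -r
An affected box in our environment reported 2.6.32-504.8.1.el6.x86_64; after patching it should report 2.6.32-504.16.2 (el6) or later.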
Environment Details:
- Splunk: 6.2.5 (build 272645)
- OS: Red Hat Linux 2.6.32-504.8.1.el6.x86_64 #1 SMP Fri Dec 19 12:09:25 EST 2014 x86_64 x86_64 x86_64 GNU/Linux
- HW: Cisco UCS C240