Running Splunk 5.0.5 on RHEL 5.11.
One of our Splunk servers recently had several messages appear in dmesg like this:
ip_conntrack: table full, dropping packet
A bit of research led us to the net.ipv4.ip_conntrack_max setting, which we doubled to eliminate the messages while we investigated. We then found a large number of "UNREPLIED" TCP entries in /proc/net/ip_conntrack that had long times to live (up to five days) and dport=9997, so they were destined for the indexer. These entries made up the vast majority of the ip_conntrack table.
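For anyone who wants to reproduce the count, here is a rough sketch of how the UNREPLIED entries could be tallied per destination port. It assumes Python 3 is available (RHEL 5 ships an older interpreter, so you may need to copy the file to another machine), and it assumes the usual /proc/net/ip_conntrack line layout, in which the first dport= field on a line belongs to the original direction of the connection. The function names are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the first "dport=" on a line is the original-direction
# destination port; the second one belongs to the expected reply.
DPORT_RE = re.compile(r"\bdport=(\d+)")


def summarize_unreplied(lines):
    """Count TCP entries flagged [UNREPLIED], keyed by original dport."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank lines, non-TCP entries, and entries that saw a reply.
        if not fields or fields[0] != "tcp" or "[UNREPLIED]" not in fields:
            continue
        match = DPORT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def print_top(path="/proc/net/ip_conntrack", limit=10):
    """Print the destination ports with the most UNREPLIED TCP entries."""
    with open(path) as handle:
        counts = summarize_unreplied(handle)
    for port, total in counts.most_common(limit):
        print(f"dport={port}: {total} UNREPLIED entries")
```

Calling print_top() as root on the affected host (or print_top("copy_of_ip_conntrack.txt") elsewhere) should show whether 9997 really dominates the table.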
We looked at the rest of our Splunk servers and found one other where the ip_conntrack table was nearly full. On most of the indexers, though, the table was closer to 10% of capacity, and most of those entries were UNREPLIED entries for dport=9997.
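The capacity comparison above can be sketched roughly as follows. This assumes Python 3 and assumes the current entry count and the limit are exposed under /proc/sys/net/ipv4/netfilter/ on this kernel; if your paths differ, pass your own. The helper names are hypothetical.

```python
from pathlib import Path

# Assumed procfs locations for the legacy ip_conntrack module;
# adjust if your kernel exposes them elsewhere.
COUNT_PATH = Path("/proc/sys/net/ipv4/netfilter/ip_conntrack_count")
MAX_PATH = Path("/proc/sys/net/ipv4/netfilter/ip_conntrack_max")


def usage_percent(count, maximum):
    """Return how full the conntrack table is, as a percentage."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return 100.0 * count / maximum


def read_usage(count_path=COUNT_PATH, max_path=MAX_PATH):
    """Read the current count and limit, returning (count, max, percent)."""
    count = int(Path(count_path).read_text().strip())
    maximum = int(Path(max_path).read_text().strip())
    return count, maximum, usage_percent(count, maximum)
```

Running read_usage() on each server gives a quick way to spot which hosts are close to the limit before the "table full" message shows up in dmesg.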
We're trying to find out what is causing these UNREPLIED entries, but have so far been unsuccessful. We've also found conflicting information on the web as to whether "dropping packet" means that a real IP packet was dropped by iptables, or merely that an ip_conntrack table entry was dropped to make room for a NEW packet.
Has anyone else seen this, or have suggestions about the cause and fix?
Hi, I'm seeing the same issue on some RHEL 5 boxes that have iptables enabled.
Did you ever find a solution to this?
The only band-aid I've seen suggested elsewhere is to increase net.ipv4.netfilter.ip_conntrack_max.