One of our Splunk servers recently logged several messages like this one in dmesg:
ip_conntrack: table full, dropping packet
A bit of research led us to the net.ipv4.ip_conntrack_max setting, which we doubled to stop the messages while we investigated. We then found a large number of "UNREPLIED" TCP entries in /proc/net/ip_conntrack. These entries had long remaining timeouts (up to five days) and dport=9997, so they were destined for the indexer. They made up the vast majority of the ip_conntrack table.
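For anyone who wants to reproduce this tally, here is a minimal Python sketch. It counts UNREPLIED TCP entries in /proc/net/ip_conntrack by destination port and records the longest remaining timeout seen for each port. It assumes the older ip_conntrack line layout (protocol name, protocol number, remaining timeout in seconds, state, then the original and reply tuples). The function name unreplied_tcp_by_dport is hypothetical, and reading the file may require root.

```python
import os
from collections import Counter

# Path named in the post (older ip_conntrack module).
PROC_PATH = "/proc/net/ip_conntrack"


def unreplied_tcp_by_dport(conntrack_text):
    """Tally [UNREPLIED] TCP entries by original-direction dport.

    Returns (counts, longest): counts maps dport -> number of entries,
    longest maps dport -> largest remaining timeout in seconds.
    Assumed line layout (hypothetical, based on ip_conntrack output):
      tcp <protonum> <timeout> <state> src=.. dst=.. sport=.. dport=.. [UNREPLIED] src=.. ...
    """
    counts = Counter()
    longest = {}
    for line in conntrack_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "tcp" or "[UNREPLIED]" not in fields:
            continue
        timeout = int(fields[2])
        # The first dport= belongs to the original tuple; the second is the reply tuple.
        dport = next((int(f[len("dport="):]) for f in fields if f.startswith("dport=")), None)
        if dport is None:
            continue
        counts[dport] += 1
        longest[dport] = max(longest.get(dport, 0), timeout)
    return counts, longest


if __name__ == "__main__" and os.path.exists(PROC_PATH):
    with open(PROC_PATH) as f:
        counts, longest = unreplied_tcp_by_dport(f.read())
    total = sum(counts.values())
    for dport, n in counts.most_common(10):
        days = longest[dport] / 86400
        print(f"dport={dport}: {n} UNREPLIED ({n / total:.0%}), longest timeout {days:.1f} days")
```

Running this on each indexer would show whether port 9997 dominates the UNREPLIED entries everywhere or only on the hosts whose tables fill up.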
We checked the rest of our Splunk servers and found one other whose ip_conntrack table was nearly full. Most of the other indexers' tables were closer to 10% of capacity, but even there most of the entries were UNREPLIED with dport=9997.
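Comparing hosts comes down to dividing the number of entries by net.ipv4.ip_conntrack_max. Below is a rough Python sketch. It assumes that each non-empty line of /proc/net/ip_conntrack is one tracked entry and that the sysctl can be read at /proc/sys/net/ipv4/ip_conntrack_max. The helper names table_usage and classify and the 80% "nearly full" threshold are made up for illustration.

```python
import os

# Assumed paths: sysctl net.ipv4.ip_conntrack_max maps to this /proc/sys file,
# and each line of /proc/net/ip_conntrack is one tracked connection.
MAX_PATH = "/proc/sys/net/ipv4/ip_conntrack_max"
TABLE_PATH = "/proc/net/ip_conntrack"


def table_usage(conntrack_text, conntrack_max):
    """Return (entries, unreplied, fill_fraction) for one host's table."""
    lines = [line for line in conntrack_text.splitlines() if line.strip()]
    unreplied = sum(1 for line in lines if "[UNREPLIED]" in line.split())
    return len(lines), unreplied, len(lines) / conntrack_max


def classify(fill, warn=0.8):
    # Hypothetical threshold: flag tables at or above 80% of the maximum.
    return "NEARLY FULL" if fill >= warn else "ok"


if __name__ == "__main__" and os.path.exists(MAX_PATH) and os.path.exists(TABLE_PATH):
    with open(MAX_PATH) as f:
        cmax = int(f.read().strip())
    with open(TABLE_PATH) as f:
        entries, unreplied, fill = table_usage(f.read(), cmax)
    print(f"{entries}/{cmax} entries ({fill:.0%}, {classify(fill)}), {unreplied} UNREPLIED")
```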
We're trying to find out what is causing these UNREPLIED entries, but so far without success. We've also found conflicting information online about whether "dropping packet" means that iptables actually dropped a real IP packet, or merely that an ip_conntrack table entry was dropped to make room for a NEW packet.
Has anyone else seen this, or have suggestions about the cause and fix?