Saved search failing with gen_id mismatch

Path Finder

Hi There,

Recently one of our saved searches has been failing intermittently with the error below. The search is set to trigger an alert when the return count is 0. When the search fails, the return count is 0, so we are getting false alarms. We have a simple cluster setup with 1 x search head, 1 x master, and 2 x indexers. I have checked the time on all three servers and they are synced with the NTP server fine.
The below is returned in the search:

ERROR: [indexer2] Search results may be incomplete, peer searchhead1 search ended prematurely. Error = Peer indexer2 will not return any results for this search, because the search head is using an outdated generation (search head genid=35; peer genid=36). This can be caused by the peer re-registering and the search head not yet updating to the latest generation. This should resolve itself shortly.
ERROR: [indexer1] Search results may be incomplete, peer searchhead1 search ended prematurely. Error = Peer [indexer1] will not return any results for this search, because the search head is using an outdated generation (search head genid=35; peer genid=37). This can be caused by the peer re-registering and the search head not yet updating to the latest generation. This should resolve itself shortly.

Path Finder

Harsmarvania57,

Thanks for your feedback. Looking at the log as you suggested, I can see two distinct types of heartbeat message.
The first is:
-0400 WARN DistributedPeerManagerHeartbeat - Failed to get list of indexes from peer https://n.n.n.n:8089 (there are a lot of these, all for the same indexer)

The second is:
-0400 INFO CMPeer - peer=GUID peername=primaryindexer transitioning from=Up to=Down reason="heartbeat or restart timeout=60" (this one appears for each indexer at least once a day, at different times)

Which of these were you seeing when troubleshooting your issue?

SplunkTrust

This clearly shows that the Cluster Master is not able to contact the indexers within 60 seconds:

The second is:
-0400 INFO CMPeer - peer=GUID peername=primaryindexer transitioning from=Up to=Down reason="heartbeat or restart timeout=60" (this one appears once for each indexer at least once a day at different times.)

If you want to find the root cause, run ping from the Cluster Master to the indexers and from the indexers to the Cluster Master for 24 hours. If there are packet drops, you have a network problem.
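A minimal Python sketch of that ping check, for anyone who wants to log the loss figure rather than watch a terminal. It assumes a Linux-style `ping` that accepts `-c` and prints a summary line containing "% packet loss". The function names `parse_packet_loss` and `ping_loss` are made up for this example.

```python
import re
import subprocess
from typing import Optional

# Matches the loss figure in a ping summary line, e.g. "5% packet loss".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_packet_loss(ping_output: str) -> Optional[float]:
    """Return the packet-loss percentage from ping output, or None if no summary is found."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def ping_loss(host: str, count: int = 60) -> Optional[float]:
    """Send `count` echo requests to `host` and return the reported loss percentage.

    Assumes a Linux-style ping binary on PATH (the -c flag sets the packet count).
    """
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero on loss; we still want its output
    )
    return parse_packet_loss(result.stdout)

# Example (hypothetical hostname): call ping_loss("indexer1") from the Cluster Master
# on a schedule, and ping_loss("<cluster master host>") from each indexer.
```

Running it in both directions matters, because the heartbeat depends on the indexers reaching the Cluster Master as well as the other way round.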

Path Finder

Thanks,

The problem now is where to start looking for the cause, since all our nodes are on the same VLAN.

SplunkTrust

Run ping from the Cluster Master to the indexers and from the indexers to the Cluster Master for 24 hours. If there are packet drops, you have a network problem.

Engager

Hi All ,

I have the same issue. If I ping from the master node to indexer1 or indexer2, the ping succeeds, but if I ping from indexer1 or indexer2 to the master node, I get an error like "unknown host".

Could you please let me know the reason and how to resolve this issue?
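An "unknown host" error from ping generally means the name could not be resolved, not that packets were dropped, so it is worth checking name resolution on each indexer first. A small sketch, assuming Python is available on the indexers; `resolve` is a helper name made up for this example.

```python
import socket
from typing import Optional


def resolve(hostname: str) -> Optional[str]:
    """Return the IPv4 address `hostname` resolves to, or None if resolution fails."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        # Raised when DNS and /etc/hosts have no entry for the name.
        return None

# Example: run resolve("<cluster master hostname>") on each indexer.
# A result of None would point at DNS or /etc/hosts rather than at the network path.
```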

SplunkTrust

This error occurs when the Cluster Master is not able to talk to an indexer within 60 seconds. The heartbeat then fails and the Cluster Master thinks the indexer is down, although in practice it is not. This issue is mostly caused by network problems.

You can search for "heartbeat" in splunkd.log on the Cluster Master to see the error.

We faced the same issue recently, and it was due to packet drops between the Cluster Master and the indexers, because they were on different VLANs. We added one more interface to the Cluster Master on the same VLAN as the indexers, and after that everything worked fine.
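The heartbeat search in splunkd.log can also be scripted to count how often each peer is marked Down. This is a rough sketch that assumes the `CMPeer` line format quoted earlier in this thread; `count_down_transitions` is a name made up for this example.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the CMPeer state-change lines quoted earlier in this thread.
TRANSITION_RE = re.compile(
    r'CMPeer - peer=(?P<peer>\S+) peername=(?P<name>\S+) '
    r'transitioning from=(?P<old>\S+) to=(?P<new>\S+) reason="(?P<reason>[^"]*)"'
)


def count_down_transitions(lines: Iterable[str]) -> Counter:
    """Count, per peer name, how many times the Cluster Master marked the peer Down."""
    downs: Counter = Counter()
    for line in lines:
        match = TRANSITION_RE.search(line)
        if match and match.group("new") == "Down":
            downs[match.group("name")] += 1
    return downs

# Example, run on the Cluster Master:
# with open("/opt/splunk/var/log/splunk/splunkd.log") as fh:  # path assumes SPLUNK_HOME=/opt/splunk
#     print(count_down_transitions(fh))
```

If the counts line up with the times the saved search failed, that would support the heartbeat timeout as the trigger for the generation change.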
