I have a slightly atypical environment - an indexer cluster with two search head clusters talking to it.
On one of the indexers, while looking for something completely different, I found this in the logs:
ERROR ClusterSlaveControlHandler - Peer <redacted> will not return any results
for this search, because the search head is using an outdated generation (search
head gen_id=25238; peer gen_id=25303). This can be caused by the peer re-registering
and the search head not yet updating to the latest generation. This should resolve
itself shortly.
Unfortunately, it didn't "resolve itself shortly". Furthermore, it persists across 75% of my indexers.
I searched for this across the Answers boards, but there's not much info about it.
I'm not sure where to even start debugging this. That's the main problem, I think.
To make matters more interesting, Splunk doesn't seem to return bad results or throw errors at me while searching (I do get some "hangs" on searches, but those are probably due to an as-yet-unresolved storage latency problem).
Any ideas where to look from here?
When we faced this issue, it was because of a connectivity issue between the CM and the indexers. The CM had issues talking to the indexers and vice versa. I'm sure you must have looked into it already, but if not, then it's definitely worth taking a look at packets being dropped between the affected indexers and the CM. It can be anything from bad network connectivity to the indexers being overloaded.
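A few quick checks along those lines (just a rough sketch: $SPLUNK_HOME, hostnames and credentials are placeholders, and the log component names are my assumption and may differ between versions):

# On the CM: list the peers and confirm they all show as Up
$SPLUNK_HOME/bin/splunk show cluster-status

# On the CM or a peer: look for recent clustering errors in the internal logs
$SPLUNK_HOME/bin/splunk search 'index=_internal sourcetype=splunkd log_level=ERROR (component=CMMaster OR component=CMSlave) earliest=-24h' -maxout 50

# From an affected indexer: confirm the CM's management port is reachable
# (even an authentication error in the response proves the port answers)
curl -k https://<cm-host>:8089/services/server/info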
If you're absolutely sure that there are no connectivity issues and the indexers are not overutilized, then please try the following steps (the corresponding CLI commands are sketched after the list):
1. Put the CM in maintenance mode.
2. Log in to an indexer and stop the Splunk service.
3. Start the Splunk service and wait for the indexer to join the cluster.
4. Repeat for all the indexers one at a time and ensure they all join the cluster.
5. Disable maintenance mode.
This can take some time, depending on the network speed and connectivity between your indexers and the CM, but it should take care of all the issues.
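Roughly, the commands for the steps above (again just a sketch, assuming a default $SPLUNK_HOME; the maintenance-mode commands are run on the CM):

# 1. On the CM: enable maintenance mode
$SPLUNK_HOME/bin/splunk enable maintenance-mode

# 2-3. On each indexer, one at a time: restart splunkd
$SPLUNK_HOME/bin/splunk stop
$SPLUNK_HOME/bin/splunk start

# 4. Back on the CM: confirm the peer shows as Up before moving to the next indexer
$SPLUNK_HOME/bin/splunk show cluster-status

# 5. Once every peer has rejoined: disable maintenance mode
$SPLUNK_HOME/bin/splunk disable maintenance-mode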
Hope this helps,
If it helps, kindly consider upvoting or marking this as the accepted answer.
Yeah, I suppose rejoining the cluster would probably solve it. I'm scheduling an upgrade so I'll be restarting all indexers anyway. 🙂
I was also thinking about where the problem came from in the first place. There might have been some intermittent communication issues (though connected more with the general splunkd "freezes" due to storage problems than with network problems). But as the message says, I'd expect the problem to correct itself after a while.
Well, we'll see.
Thanks for the hints.