How do I know if a search head restart made my results incomplete?

Builder

My admin team frequently needs to restart our search heads while I still have a long-running query in progress. When this happens, my search page shows the following message:

Reading error while waiting for peer . This can be caused by the peer unexpectedly closing or resetting the connection. Search results might be incomplete! If the problem persists, confirm network connectivity between this instance and the peer, and review search.log and splunkd.log on the peer to check its activity.

The message notes that "results might be incomplete". Is there a way to tell if they actually are incomplete?

Contributor

Look for your search in the internal logs: index=_internal sourcetype=scheduler OR sourcetype=splunkd user=youruser ... The status should be completed or success.

SplunkTrust

That sounds like a task where the Network Traffic data model from the Common Information Model might help, once your admins have accelerated it.

...and yes, frequent restarts because of high availability sounds like we're missing a good chunk of the picture.

SplunkTrust

That message reads as if an indexer (= search peer) was restarted, or at least the connection to it was lost. If your search head had been restarted, I'd expect the search job itself to stop.

That being said, I'd assume the results are incomplete, and therefore incorrect.
The only scenario where the results would be complete is if that search peer was still searching when the connection was lost, but had no more matching data left to find. That is unlikely.
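
If you want to apply that conservative rule automatically, here is a minimal sketch. It assumes you have already copied the job's messages (from the search page or the Job Inspector) into a list of strings; the function name and the marker list are made up for illustration and are not part of any Splunk API. It only flags searches that showed the warning quoted in the question; it cannot prove that data is actually missing.

```python
# Sketch only: `job_messages` is assumed to be a list of message strings
# copied from the search page or the Job Inspector. The function name and
# the marker list are illustrative, not part of any Splunk API.
INCOMPLETE_MARKERS = (
    "search results might be incomplete",
    "reading error while waiting for peer",
)


def results_may_be_incomplete(job_messages):
    """Return True if any message matches a known 'incomplete results' warning."""
    for message in job_messages:
        text = message.lower()
        if any(marker in text for marker in INCOMPLETE_MARKERS):
            return True
    return False


# The warning text below is the one quoted in the question.
messages = [
    "Reading error while waiting for peer . This can be caused by the peer "
    "unexpectedly closing or resetting the connection. "
    "Search results might be incomplete!",
]
if results_may_be_incomplete(messages):
    print("Treat results as incomplete and rerun the search.")
```

A search that comes back flagged should be rerun once the peers are stable again, rather than trusted as-is.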

I'd talk to the admin team about why they need restarts that frequently.
Additionally, many searches can be made to run faster through optimization. Feel free to ask a separate question with your search.

Builder

Thanks Martin. I think that I have to trust my admin team on how often restarts are needed. They say that it is related to high availability, which seems counterintuitive to me, since high availability should mean that my queries keep running. So either I have to become an admin to counter the point, or accept the reasoning.

Good idea on a separate question for my query. I suspect that it will just run long since it is summarizing billions of network traffic events. But always worth an ask.
