To monitor whether my nightly searches ran properly, I'm looking at:
index=_internal sourcetype=scheduler earliest=@d | <few_more_filtering>
However, I've just noticed that when a receiveTimeout error occurs for one of the peers involved, the "status" field in the resulting events still contains the value "success", even though opening the search results from the job list shows an error:
Timed out waiting for peer XXX. If this occurs frequently, receiveTimeout in distsearch.conf may need to be increased. Search results might be incomplete!
I tried to run a global search like:
splunk_server=* index=* "Timed out waiting for peer"
But nothing comes up.
Is there a way to set up an alert in case a search ran, but failed or had any issues? The "status" field doesn't seem to cover the latter scenario...
This error occurs when your Search Head attempts to send a search job to a Search Peer (usually one of your Indexers) and the Indexer does not respond within the timeout period (receiveTimeout in distsearch.conf). The search continues, but without using that Indexer, which probably means that some of your events are not returned and your results are incomplete. In my experience, the problem can often be cleared simply by restarting the Splunk instance on the Indexer in question, but sometimes you need to dig deeper. In any case, something is keeping that Indexer so busy that it cannot reliably respond to search requests even though the Splunk instance is running. I suspect this kind of thing can also be caused by misconfigured or misbehaving load balancers or other identity/load-shifting equipment sitting between your Search Head and your Indexer peers.
As a workaround, I'm now checking the messages.error field from the REST API (i.e. /services/search/jobs); those messages are available there.
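For anyone wanting to script that check, here is a rough Python sketch of the filtering part only. It assumes you have already fetched /services/search/jobs with output_mode=json and parsed one job entry into a dict. The exact shape of "messages" is an assumption on my side (either a dict keyed by level, as in messages.error, or a list of type/text pairs), so the sketch accepts both. The function name job_problems and the sample entry are made up for illustration.

```python
from typing import Any, Dict, List, Tuple

# Message levels treated as "the search had issues" (an assumption; adjust as needed).
PROBLEM_LEVELS = {"error", "fatal", "warn"}


def job_problems(entry: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (level, text) pairs for problem messages attached to one job entry.

    `entry` is assumed to be one parsed job entry from /services/search/jobs,
    with the messages stored under entry["content"]["messages"].
    """
    messages = entry.get("content", {}).get("messages") or []
    found: List[Tuple[str, str]] = []

    if isinstance(messages, dict):
        # Assumed shape 1: {"error": ["text", ...], "info": [...]}
        for level, texts in messages.items():
            if isinstance(texts, str):
                texts = [texts]
            if level.lower() in PROBLEM_LEVELS:
                found.extend((level.lower(), text) for text in texts)
    else:
        # Assumed shape 2: [{"type": "ERROR", "text": "..."}, ...]
        for msg in messages:
            level = str(msg.get("type", "")).lower()
            if level in PROBLEM_LEVELS:
                found.append((level, str(msg.get("text", ""))))

    return found


# Hypothetical sample entry, not real API output.
sample_entry = {
    "name": "nightly_report",
    "content": {
        "dispatchState": "DONE",
        "messages": {"error": ["Timed out waiting for peer XXX."]},
    },
}

for level, text in job_problems(sample_entry):
    print(f"{sample_entry['name']}: [{level}] {text}")
```

A job that reports as finished but returns a non-empty list here is one that ran with issues, which is the case the scheduler's "status" field does not flag.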
I still think the status field in the scheduler log events should be set to something other than success when something actually went wrong 😉
index=*
will not give you results from the _internal index. Try:
index=_internal splunk_server=* "Timed out waiting for peer"
Yeah, I forgot to say that I've already tried index=_* too, but nothing there either.