Why are we getting error "Timed out waiting for peer XXX", but the search status=success?

secfrit
Explorer

To monitor if my nightly searches ran properly I'm looking at:

index=_internal sourcetype=scheduler earliest=@d | <few_more_filtering>

but I've just noticed that when a receiveTimeout error occurs for one of the involved peers, the "status" field in the resulting events still contains the value "success", even though opening the search results from the job list shows an error:

Timed out waiting for peer XXX. If this occurs frequently, receiveTimeout in distsearch.conf may need to be increased. Search results might be incomplete!

I tried to run a global search like:

 splunk_server=* index=* "Timed out waiting for peer"

But nothing is popping up.

Is there a way to set up an alert in case a search ran, but failed or had any issues? The "status" field doesn't seem to cover the latter scenario...


swmishra_splunk
Splunk Employee

This error occurs when your Search Head attempts to send a search job to a Search Peer (usually one of your Indexers) and the Indexer does not respond within the timeout period (the receiveTimeout setting in distsearch.conf that the message refers to). The search then continues without that Indexer, which probably means some of your events are not returned and your results are incomplete.

In my experience, the problem can often be cleared simply by restarting the Splunk instance on the Indexer in question, but sometimes you need to dig deeper. In any case, something is keeping your Indexer so busy that it cannot reliably respond to search requests even though the Splunk instance is running. This kind of thing can also be caused by misconfigured or misbehaving load balancers, or other identity/load-shifting equipment, sitting between your Search Head and your Indexer peers.

secfrit
Explorer

As a workaround I'm now checking the messages.error field from the API (i.e. /services/search/jobs)... those messages are available there.
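The check described above could be sketched roughly as follows. This is only an illustration, not a tested integration: it assumes the job listing was fetched as JSON (for example from /services/search/jobs with output_mode=json) and already parsed into a list of entries, each with a "name" and a "content" dict. The exact shape of "messages" is an assumption; the sketch accepts either a dict keyed by level (matching the messages.error path mentioned above) or a list of type/text pairs. The function names are hypothetical.

```python
from typing import Any, Dict, List


def error_messages(job_content: Dict[str, Any]) -> List[str]:
    """Return the error-level messages attached to one search job."""
    messages = job_content.get("messages") or {}
    # Assumed shape 1: {"error": ["text", ...], "info": [...]}
    if isinstance(messages, dict):
        errors = messages.get("error") or []
        return [errors] if isinstance(errors, str) else list(errors)
    # Assumed shape 2: [{"type": "ERROR", "text": "..."}, ...]
    return [
        m.get("text", "")
        for m in messages
        if isinstance(m, dict) and str(m.get("type", "")).upper() == "ERROR"
    ]


def jobs_with_errors(entries: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map job name -> error messages, for jobs that reported any error."""
    flagged: Dict[str, List[str]] = {}
    for entry in entries:
        errs = error_messages(entry.get("content", {}))
        if errs:
            flagged[entry.get("name", "<unnamed>")] = errs
    return flagged
```

Passing the parsed "entry" list from the job listing to jobs_with_errors gives a dict that is empty when every job is clean, so a non-empty result can drive an alert regardless of what the scheduler log reports in "status".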

I still think the status field in the scheduler log events should be set to something other than success when something actually went wrong 😉


meenal901
Communicator

index=* will not give you results from the _internal index. Try:

index=_internal splunk_server=* "Timed out waiting for peer"

secfrit
Explorer

Yeah, I forgot to say I've already tried with index=_* too, but there's nothing there either.
