Monitor search peers

Explorer

I have 2 servers, Splunk1 and Splunk2, set up as search peers. How can I use Splunk to monitor when one of the servers goes down or stops responding? I have received messages like the following:

-- Search generated the following messages --
Message Level: ERROR
1. Reading error while waiting for peer SPLUNK2. Search results might be incomplete!

I would like to be alerted when something like this happens. Does anyone have any ideas?

Path Finder

Back in version 3, the main search screen showed a note reading "x of y" servers, for example "5 of 5" servers. If one was not responding, you could pull down a tab and immediately see which one.

This was a good idea, because your users would immediately see any issue. I would like to see it come back.

Splunk Employee

Here are 2 methods to detect whether a search peer is down or has not responded to a search.

  • Schedule a search and count the number of peers responding

Pick a search that should always return results, and count the number of search peers that respond.
Then set up an email alert based on the number of search peers (including the search head).

Schedule the search to run every 5 minutes over the last 2 hours, and use the alert condition:
if number of events is less than X (where X is the number of servers you expect to respond)

index=_internal splunk_server=* | stats count by splunk_server
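
As a variation on the search above, the comparison can be moved into the search itself, so that the alert fires whenever the search returns a result. This is only a sketch: expected_servers is a made-up field name, and the value 2 assumes Splunk1 and Splunk2 are the only servers expected to answer. Adjust it to your own count, including the search head if it indexes its own _internal data.

```
index=_internal splunk_server=* earliest=-2h
| stats dc(splunk_server) AS responding_servers
| eval expected_servers = 2
| where responding_servers < expected_servers
```

With this form, the alert condition becomes "if number of events is greater than 0", since a row is only returned when fewer servers than expected have responded.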

  • Schedule a search looking in the logs for errors

This detects a search failure after the fact.
For example, schedule this search to run every 5 minutes over the last 5 minutes.

index=_internal source=*splunkd.log "Unable to connect to peer"

One remark: a search peer may fail to respond because long-running searches are hitting the timeout settings. If that is the case, you can increase them.
See connectionTimeout, sendTimeout, and receiveTimeout in distsearch.conf:
http://www.splunk.com/base/Documentation/latest/Admin/Distsearchconf
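
For illustration, raising the timeouts on the search head might look like the fragment below. The values are arbitrary examples rather than recommendations, and the stanza name is given from memory, so check the distsearch.conf spec for your version before copying it.

```
# distsearch.conf on the search head
# Values are illustrative assumptions, in seconds; tune them to your environment.
[distributedSearch]
connectionTimeout = 30
sendTimeout = 60
receiveTimeout = 1200
```

A restart of splunkd is typically needed for changes to this file to take effect.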