
Monitor search peers

jec013
Explorer

I have 2 servers, Splunk1 and Splunk2, set up as search peers. How can I use Splunk to monitor when one of the servers goes down or stops responding? I have received messages like the following:

-- Search generated the following messages --
Message Level: ERROR
1. Reading error while waiting for peer SPLUNK2. Search results might be incomplete!

I would like to be alerted when something like this happens. Does anyone have any ideas?


JimDeich
Path Finder

Back in version 3, the main search screen showed an "x of y servers" indicator, for example "5 of 5 servers". If one server was not responding, you could pull down a tab and immediately see which one.

This was a good idea, because it meant your users would immediately see any issue. I would like to see it come back.


yannK
Splunk Employee

Here are 2 methods to detect whether a search peer is down or has not responded to a search.

  • Schedule a search and count the number of peers responding

Pick a search that should always return results from every server, and count the number of search peers in the results.
Then set up an email alert based on the number of search peers (including the search head).

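Schedule the search to run every 5 minutes over the last 2 hours, and use the alert condition:
if number of events is less than X
Here X is the number of servers you expect to see (the search peers plus the search head). Because stats returns one row per splunk_server, the number of results equals the number of servers that returned data.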

index=_internal splunk_server=* | stats count by splunk_server
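To illustrate what that alert condition checks, here is a minimal Python sketch that takes the per-server rows produced by the search above and compares them with the list of servers you expect. It is only an offline illustration, not part of Splunk. The row format and the search head name "searchhead" are hypothetical assumptions; Splunk1 and Splunk2 are the peers from the question.

```python
from typing import Mapping, Sequence

# Hypothetical server names: the two peers from the question plus an assumed
# search head called "searchhead". Replace with your own splunk_server values.
EXPECTED_SERVERS = frozenset({"searchhead", "Splunk1", "Splunk2"})


def missing_servers(rows: Sequence[Mapping[str, object]],
                    expected: frozenset[str] = EXPECTED_SERVERS) -> set[str]:
    """Return the expected servers that have no row in the stats output.

    Each row is assumed to look like {"splunk_server": "Splunk1", "count": 42},
    i.e. one row per server, as produced by `stats count by splunk_server`.
    """
    seen = {str(row["splunk_server"]) for row in rows}
    return set(expected) - seen


def should_alert(rows: Sequence[Mapping[str, object]], threshold: int) -> bool:
    """Mirror of the alert condition "number of events is less than X"."""
    return len(rows) < threshold


if __name__ == "__main__":
    # Sample result in which Splunk2 returned nothing during the time range.
    rows = [
        {"splunk_server": "searchhead", "count": 120},
        {"splunk_server": "Splunk1", "count": 98},
    ]
    print(should_alert(rows, threshold=len(EXPECTED_SERVERS)))  # True
    print(missing_servers(rows))  # {'Splunk2'}
```

Counting rows is all the alert condition needs. Naming the missing servers is an optional extra that tells you which peer to look at.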

  • Schedule a search looking in the logs for errors

This detects a failure after it has already affected a search.
For example, schedule this search to run every 5 minutes over the last 5 minutes.

index=_internal source=*splunkd.log "Unable to connect to peer"
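Matching the schedule interval to the time range means each run looks only at the period since the previous run. The Python sketch below models that windowing offline. The (timestamp, message) event list is a hypothetical stand-in for parsed splunkd.log entries, and no real log format is assumed. Only the phrase "Unable to connect to peer" comes from the search above.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

# Phrase taken from the search above.
PATTERN = "Unable to connect to peer"
# Matches both the schedule interval and the search time range (5 minutes).
WINDOW = timedelta(minutes=5)


def failures_in_window(events: Iterable[Tuple[datetime, str]],
                       now: datetime,
                       window: timedelta = WINDOW) -> int:
    """Count events in the half-open window [now - window, now) containing PATTERN.

    `events` is a hypothetical list of (timestamp, message) pairs standing in
    for already-parsed splunkd.log entries; no real log format is assumed.
    """
    start = now - window
    return sum(1 for ts, msg in events if start <= ts < now and PATTERN in msg)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    events = [
        (datetime(2024, 1, 1, 11, 57), "Unable to connect to peer SPLUNK2"),  # in window
        (datetime(2024, 1, 1, 11, 50), "Unable to connect to peer SPLUNK2"),  # covered by the 11:55 run
        (datetime(2024, 1, 1, 11, 58), "unrelated message"),                 # no match
    ]
    hits = failures_in_window(events, now)
    print(hits, "-> alert" if hits > 0 else "-> ok")  # 1 -> alert
```

The half-open window is a design choice in this sketch. With it, back-to-back runs cover adjacent intervals, so no event is counted twice and none falls between runs.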

One remark: a search peer may fail to respond because long searches are hitting the timeout settings. If that is the case, you can increase them.
See connectionTimeout, sendTimeout, and receiveTimeout in distsearch.conf:
http://www.splunk.com/base/Documentation/latest/Admin/Distsearchconf
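Before raising these timeouts, it can help to see which ones are already set explicitly. A minimal sketch with Python's configparser reads them from the text of a distsearch.conf file. It assumes the settings sit in the [distributedSearch] stanza. The numbers in the sample are placeholders, not defaults or recommendations.

```python
import configparser

TIMEOUT_KEYS = ("connectionTimeout", "sendTimeout", "receiveTimeout")


def read_timeouts(conf_text: str) -> dict[str, int]:
    """Return the timeout settings found in the [distributedSearch] stanza.

    Keys that are not set are omitted (Splunk's built-in default then applies).
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the camelCase key names as written
    parser.read_string(conf_text)
    if not parser.has_section("distributedSearch"):
        return {}
    section = parser["distributedSearch"]
    return {key: section.getint(key) for key in TIMEOUT_KEYS if key in section}


# Placeholder values for illustration only.
SAMPLE = """
[distributedSearch]
connectionTimeout = 30
receiveTimeout = 900
"""

if __name__ == "__main__":
    print(read_timeouts(SAMPLE))  # {'connectionTimeout': 30, 'receiveTimeout': 900}
```

The sketch takes file text rather than a path, so it makes no assumption about where distsearch.conf lives on your installation.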
