Why are Splunk 7.x Monitoring Console alerts frequently reporting "DMC Alert - Search Peer Not Responding"?

kinaba_splunk
Splunk Employee

Our Splunk 7.x.x Monitoring Console alerts frequently report that one of our indexers is "down" with a "DMC Alert - Search Peer Not Responding" alert. But I can see that the Splunk process is running on this server, and it has not been restarted. It seems to be a false positive.

Example:

scheduler.log:06-21-2018 04:31:01.957 +1000 INFO SavedSplunker - savedsearch_id="nobody;splunk_monitoring_console;DMC Alert - Search Peer Not Responding", search_type="scheduled", user="nobody", app="splunk_monitoring_console", savedsearch_name="DMC Alert - Search Peer Not Responding", priority=default, status=success, digest_mode=1, scheduled_time=1532268780, window_time=0, dispatch_time=1532268780, run_time=0.100, result_count=1, alert_actions="email", sid="scheduler_nobody_xxxx _RMDxxxx_at_xxxxx_xxxx", suppressed=0, thread_id="AlertNotifierWorker-0"

Could you tell me why?

1 Solution

kinaba_splunk
Splunk Employee

Regarding the "DMC Alert - Search Peer Not Responding" alert: the DMC checks each search peer's status every 5 minutes, based on the status reported by the REST endpoint "/services/search/distributed/peers".

So, unfortunately, the alert can be triggered not only when a peer is actually down, but also when a peer's reply does not arrive within the timeout.

When the alert fires even though the peer is up, the workarounds below can reduce these false positives. For reference, the default alert definition is:

[DMC Alert - Search Peer Not Responding]
counttype = number of events
cron_schedule = 3,8,13,18,23,28,33,38,43,48,53,58 * * * *
description = One or more of your search peers is currently down.
quantity = 0
relation = greater than
search = | rest splunk_server=local /services/search/distributed/peers/ \
| where status!="Up" AND disabled=0 \
| fields peerName, status \
| rename peerName as Instance, status as Status
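
Note that the cron_schedule above runs the check every 5 minutes, matching the polling interval described earlier. To inspect the raw data this alert evaluates, you can run the endpoint query by itself (a minimal sketch built from the default search; peerName, status, and disabled are the fields the default alert already uses):

| rest splunk_server=local /services/search/distributed/peers/
| table peerName, status, disabled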

Workarounds:
There are two possible workarounds, described below.

(1) Set statusTimeout in distsearch.conf to a longer value.
This timeout can be raised; the only side effect is that it takes longer for a search peer to
be considered down.
https://answers.splunk.com/answers/321592/dmc-alert-search-peer-not-responding-how-to-make-t.html

distsearch.conf:
statusTimeout = <int>
* Set connection timeout when gathering a search peer's basic info (/services/server/info).
* Note: Read/write timeouts are automatically set to twice this value.
* Defaults to 10.
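
For example, to triple the timeout you could set the following in $SPLUNK_HOME/etc/system/local/distsearch.conf on the instance running the DMC (a sketch; per the distsearch.conf spec, statusTimeout belongs in the [distributedSearch] stanza, and 30 is an illustrative value, not a tested recommendation):

[distributedSearch]
# Illustrative value: allow 30s (default 10s) for a peer's
# /services/server/info reply before the peer is considered down.
statusTimeout = 30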

(2) Replace the existing search with the SPL below.

| rest splunk_server=local /services/search/distributed/peers/ mode=extended
| search health_status != Healthy
| fields peerName, status, status_details, health_status
| rename peerName as Instance, status as "Latest Status", status_details as "Latest Status Details", health_status as "Overall Health (last 10 mins)"
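
With mode=extended, the endpoint also returns health_status, which reflects the peer's health over roughly the last 10 minutes (hence the "Overall Health (last 10 mins)" label above). A single REST reply that misses the timeout is therefore far less likely to trigger the alert than with the instantaneous status field used by the default search.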

Before making the change, verify that the SPL works.
1) Go to DMC > Run a Search and run:

| rest splunk_server=local /services/search/distributed/peers/ mode=extended
| search health_status != Healthy
| fields peerName, status, status_details, health_status
| rename peerName as Instance, status as "Latest Status", status_details as "Latest Status Details", health_status as "Overall Health (last 10 mins)"

2) Confirm that results are returned.
3) Go to /opt/splunk/etc/apps/splunk_monitoring_console/default
4) vi savedsearches.conf
5) Copy the [DMC Alert - Search Peer Not Responding] stanza.
6) Go to /opt/splunk/etc/system/local
7) vi savedsearches.conf and paste the stanza copied in step 5.
Then rename the stanza, for example [DMC Alert - Search Peer Not Responding2].
8) Replace the "search = " line with the SPL below (a full example stanza follows the steps):

| rest splunk_server=local /services/search/distributed/peers/ mode=extended
| search health_status != Healthy
| fields peerName, status, status_details, health_status
| rename peerName as Instance, status as "Latest Status", status_details as "Latest Status Details", health_status as "Overall Health (last 10 mins)"

9) Restart Splunk.
10) Confirm the new alert appears under DMC > Settings > Alert Setup.
11) Set its status to Enabled.
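
Putting steps 5) through 8) together, the resulting stanza in /opt/splunk/etc/system/local/savedsearches.conf would look roughly like this (a sketch that copies the schedule and threshold settings from the default stanza shown earlier):

[DMC Alert - Search Peer Not Responding2]
counttype = number of events
cron_schedule = 3,8,13,18,23,28,33,38,43,48,53,58 * * * *
description = One or more of your search peers is currently down.
quantity = 0
relation = greater than
search = | rest splunk_server=local /services/search/distributed/peers/ mode=extended \
| search health_status != Healthy \
| fields peerName, status, status_details, health_status \
| rename peerName as Instance, status as "Latest Status", status_details as "Latest Status Details", health_status as "Overall Health (last 10 mins)"

Once the new alert is enabled, you may also want to disable the original [DMC Alert - Search Peer Not Responding] alert so that it does not keep sending the same notifications.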

