Solved: Cluster members going into AutomaticDetention for ...

cpetterborg · ‎08-20-2020

My cluster has IDXs going in and out of detention very quickly (only a couple of seconds of detention). I see messages on the CM like the following:

08-20-2020 08:35:00.682 -0700 INFO  CMPeer - peer=XXXXXXXX-2E88-4DBF-8BC5-BA788B8D8083 peer_name=splunkidx7 transitioning from=Up to=AutomaticDetention reason="peer is blocked"

This is followed soon after by:

08-20-2020 08:35:03.197 -0700 INFO  CMPeer - peer=XXXXXXXX-2E88-4DBF-8BC5-BA788B8D8083 peer_name=splunkidx7 transitioning from=AutomaticDetention to=Up reason="heartbeat received."

As you can see, the peer was in detention for only about 2.5 seconds in this case. This only started happening in the last week. I've restarted the CM. The CM status is all green, and it is running fine AFAIK. Disk space on the IDXs is more than fine, and they seem to be running fine. What are the criteria for putting an IDX into AutomaticDetention that I should look at? The second event above seems to point to a communication issue. If heartbeats are the problem, is there a config parameter that would allow more time for the heartbeat, so that auto detention doesn't occur as frequently?
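For anyone who wants to measure these detention windows from the CM log rather than eyeballing them, here is a minimal Python sketch. It assumes the timestamp layout seen in the two events above (date, time with milliseconds, UTC offset); the function names are made up for illustration and are not part of Splunk.

```python
from datetime import datetime

# Timestamp layout assumed from the two CMPeer events quoted in this thread.
LOG_TIME_FORMAT = "%m-%d-%Y %H:%M:%S.%f %z"


def parse_log_time(line: str) -> datetime:
    """Parse the leading 'MM-DD-YYYY HH:MM:SS.mmm +/-HHMM' stamp of a log line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


def detention_seconds(enter_line: str, exit_line: str) -> float:
    """Seconds between the 'to=AutomaticDetention' and 'to=Up' events."""
    return (parse_log_time(exit_line) - parse_log_time(enter_line)).total_seconds()


enter_line = (
    "08-20-2020 08:35:00.682 -0700 INFO  CMPeer - "
    "peer=XXXXXXXX-2E88-4DBF-8BC5-BA788B8D8083 peer_name=splunkidx7 "
    'transitioning from=Up to=AutomaticDetention reason="peer is blocked"'
)
exit_line = (
    "08-20-2020 08:35:03.197 -0700 INFO  CMPeer - "
    "peer=XXXXXXXX-2E88-4DBF-8BC5-BA788B8D8083 peer_name=splunkidx7 "
    'transitioning from=AutomaticDetention to=Up reason="heartbeat received."'
)

print(f"{detention_seconds(enter_line, exit_line):.3f} s")  # prints "2.515 s"
```

Running this over every enter/exit pair for a peer would show whether the short windows are getting longer or more frequent over time.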

cpetterborg · ‎08-26-2020

It turns out the hot/warm partition on the servers was teetering on the edge of its free-space limit, and I had not noticed that this was the problem. So you have to check both the cold and the hot/warm partitions to determine whether a disk space requirement is what's causing the IDX to go into automatic detention.
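A quick way to check both partitions on an indexer is a small script like the sketch below. Treat the numbers as assumptions: the 5000 MB threshold is meant to mirror the minFreeSpace setting in the [diskUsage] stanza of server.conf (verify the value on your own system), and the 1000 MB warning margin, the function name, and the example paths are invented for illustration.

```python
import shutil

MB = 1024 * 1024

# Assumption: mirrors minFreeSpace under [diskUsage] in server.conf.
# Check your own configuration before relying on this number.
MIN_FREE_MB = 5000


def partitions_near_limit(paths, min_free_mb=MIN_FREE_MB, margin_mb=1000,
                          free_bytes=lambda p: shutil.disk_usage(p).free):
    """Return {label: free_mb} for partitions within margin_mb of the limit.

    paths maps a label such as "hot/warm" or "cold" to a mount point.
    free_bytes can be swapped out for testing without touching real disks.
    """
    at_risk = {}
    for label, path in paths.items():
        free = free_bytes(path) // MB
        if free < min_free_mb + margin_mb:
            at_risk[label] = free
    return at_risk


# Example call (hypothetical mount points, adjust to your layout):
# partitions_near_limit({"hot/warm": "/splunk/hot", "cold": "/splunk/cold"})
```

An empty result means neither partition is close to the threshold; anything returned is a candidate for the kind of brief detention described above.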

Will_powr · ‎02-19-2022

How did you resolve this issue?

cpetterborg · ‎02-19-2022

The disk usage had to be tweaked so that the partition was no longer being filled to its maximum, which meant adjusting the number of hot/warm buckets to the right value. This has some bad side effects if you aren't careful. If you cut it too close to the max storage, IDXs will sometimes go into auto detention. Then, if you restart the IDXs in the cluster, too many buckets can roll to cold, and if you don't have enough storage in cold, data gets dropped. So be careful to set your limits so that you don't end up with an auto detention problem.
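To make "the right number" a bit more concrete, here is a rough sizing sketch in Python. It is back-of-the-envelope arithmetic, not how Splunk itself decides anything: it assumes every bucket grows to the same maximum size, that all indexes on the partition are counted together, and that you want a fixed percentage of headroom on top of the minimum free space. The function name and the example numbers are hypothetical.

```python
def max_safe_buckets(partition_mb, bucket_mb, min_free_mb=5000, headroom_percent=10):
    """Estimate how many full-size hot/warm buckets fit on a partition.

    partition_mb:     total size of the hot/warm partition, in MB
    bucket_mb:        assumed maximum size of one bucket, in MB
    min_free_mb:      free space that must remain to stay out of detention
                      (assumption: matches your minFreeSpace setting)
    headroom_percent: extra safety margin so you are not cutting it too close
    """
    usable_mb = partition_mb * (100 - headroom_percent) // 100 - min_free_mb
    if usable_mb <= 0:
        return 0
    return usable_mb // bucket_mb


# Hypothetical example: a 1,000,000 MB partition with 750 MB buckets.
print(max_safe_buckets(1_000_000, 750))  # prints 1193
```

If the hot/warm bucket counts configured across your indexes add up to more than this estimate, the partition can fill to the point where detention kicks in. The same arithmetic is worth repeating for the cold partition, since buckets that roll out of warm have to fit there.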

Cluster members going into AutomaticDetention for only a few seconds. Why?

