My cluster has IDXs going in and out of detention very quickly (only a couple of seconds of detention). I see messages on the CM like the following:
08-20-2020 08:35:00.682 -0700 INFO CMPeer - peer=XXXXXXXX-2E88-4DBF-8BC5-BA788B8D8083 peer_name=splunkidx7 transitioning from=Up to=AutomaticDetention reason="peer is blocked"
This is then followed soon after by:
08-20-2020 08:35:03.197 -0700 INFO CMPeer - peer=XXXXXXXX-2E88-4DBF-8BC5-BA788B8D8083 peer_name=splunkidx7 transitioning from=AutomaticDetention to=Up reason="heartbeat received."
As you can see, it is in detention for only about 2.5 seconds in this case. These had not been happening until this last week. I've restarted the CM. The status for the CM is all green and it is running fine AFAIK. Disk space is more than fine on the IDXs, and they seem to be running fine. What are the criteria for an IDX being put into AutomaticDetention that I should look at? The second event above seems to point to a communication issue. Is there a config parameter that would give the heartbeat more time (if that is the problem), so that auto detention does not occur as frequently?
Turns out the hot/warm partition was teetering on the edge of its free-space limit on the servers. I had not seen that it was the problem. So you have to look at both the cold and the hot/warm partitions to determine whether a disk space requirement is causing an IDX to go into automatic detention.
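For anyone wanting to check this quickly on each IDX, here is a rough Python sketch that flags partitions whose free space is close to the limit. It assumes the detention threshold is the `minFreeSpace` value from `server.conf` (5000 MB is used here as an assumed default, so confirm it against your own config). The function name, the 1000 MB margin, and the example paths are hypothetical.

```python
import shutil


def partitions_near_threshold(paths, min_free_mb=5000, margin_mb=1000,
                              usage_fn=shutil.disk_usage):
    """Return (path, free_mb) for each path close to the free-space limit.

    min_free_mb: assumed detention threshold in MB (check your server.conf).
    margin_mb:   hypothetical safety margin added on top of the threshold.
    usage_fn:    injectable so the check can be exercised without real disks.
    """
    flagged = []
    for path in paths:
        # disk_usage returns (total, used, free) in bytes.
        free_mb = usage_fn(path).free // (1024 * 1024)
        if free_mb < min_free_mb + margin_mb:
            flagged.append((path, free_mb))
    return flagged


if __name__ == "__main__":
    # Hypothetical mount points; substitute your hot/warm and cold volumes.
    for path, free_mb in partitions_near_threshold(["/"]):
        print(f"{path}: only {free_mb} MB free")
```

Run it against both the hot/warm and the cold mount points, since either one can be the partition that is running short.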
How did you resolve this issue?
The disk usage had to be tweaked so that the data never filled a partition to its maximum. In practice that meant tweaking the number of hot/warm buckets to the right value. This has some bad side effects if you aren't careful. If you cut it too close to the max storage, you will sometimes hit the point where IDXs go into auto detention. Then, if you restart the IDXs in the cluster, too many buckets can roll into cold, and if you don't have enough cold storage, data gets dropped. So set your limits carefully, leaving enough headroom on both partitions that you don't end up with an auto detention problem.
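As a rough illustration of the headroom arithmetic, here is a small Python sketch. It is only a back-of-the-envelope estimate under stated assumptions: every bucket is treated as reaching a fixed maximum size, and the free-space floor is taken as 5000 MB. The function name, dictionary keys, and all numbers are hypothetical and are not Splunk setting names.

```python
def hot_warm_headroom_mb(partition_mb, indexes, min_free_mb=5000):
    """Estimate MB left on the hot/warm partition in the worst case.

    partition_mb: total size of the hot/warm partition in MB.
    indexes:      one dict per index with hypothetical keys
                  max_hot, max_warm (bucket counts) and bucket_mb (max size).
    min_free_mb:  assumed free-space floor that triggers detention.
    A negative result means the limits can overrun the partition.
    """
    worst_case_mb = sum(
        (idx["max_hot"] + idx["max_warm"]) * idx["bucket_mb"]
        for idx in indexes
    )
    return partition_mb - min_free_mb - worst_case_mb


# Made-up example: one index with 3 hot and 300 warm buckets of 750 MB each
# on a 500000 MB partition.
example = [{"max_hot": 3, "max_warm": 300, "bucket_mb": 750}]
print(hot_warm_headroom_mb(500000, example))
```

If the result is small or negative, the bucket limits are cutting it too close to the max storage, which is the situation described above.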