Monitoring Splunk

Cluster members going into AutomaticDetention for only a few seconds. Why?

cpetterborg
SplunkTrust

My cluster has IDXs going in and out of detention very quickly (only a couple of seconds of detention). I see messages on the CM like the following:

08-20-2020 08:35:00.682 -0700 INFO  CMPeer - peer=XXXXXXXX-2E88-4DBF-8BC5-BA788B8D8083 peer_name=splunkidx7 transitioning from=Up to=AutomaticDetention reason="peer is blocked"

This is followed soon after by:

08-20-2020 08:35:03.197 -0700 INFO  CMPeer - peer=XXXXXXXX-2E88-4DBF-8BC5-BA788B8D8083 peer_name=splunkidx7 transitioning from=AutomaticDetention to=Up reason="heartbeat received."

As you can see, the peer is in detention for only about 2.5 seconds in this case. These transitions had not been happening until just this last week. I've restarted the CM. The status for the CM is all green and it is running fine AFAIK. The disk space is more than fine on the IDXs, and they seem to be running fine. What are the criteria for putting an IDX into AutomaticDetention that I should look at? The second event above seems to point to a communication issue. Is there a config parameter that would give the heartbeat more time (if that is the problem), so that auto detention does not occur as frequently?
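To see how long each detention episode lasts across many events, the two CM log lines above can be paired up programmatically. The following is a minimal sketch, assuming the log format matches the two samples quoted in this post; the regular expression and the function name `detention_durations` are illustrative and not part of Splunk.

```python
import re
from datetime import datetime

# Pattern inferred only from the two sample CMPeer lines in this post (an assumption).
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})\s+INFO\s+CMPeer - "
    r".*?peer_name=(?P<peer>\S+) transitioning from=(?P<src>\w+) to=(?P<dst>\w+)"
)

def detention_durations(lines):
    """Return (peer_name, seconds) for each AutomaticDetention -> other-state round trip."""
    entered = {}   # peer_name -> time it entered AutomaticDetention
    results = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%m-%d-%Y %H:%M:%S.%f %z")
        peer = m["peer"]
        if m["dst"] == "AutomaticDetention":
            entered[peer] = ts
        elif m["src"] == "AutomaticDetention" and peer in entered:
            results.append((peer, (ts - entered.pop(peer)).total_seconds()))
    return results

sample = [
    '08-20-2020 08:35:00.682 -0700 INFO  CMPeer - peer=XXXXXXXX-2E88-4DBF-8BC5-BA788B8D8083 '
    'peer_name=splunkidx7 transitioning from=Up to=AutomaticDetention reason="peer is blocked"',
    '08-20-2020 08:35:03.197 -0700 INFO  CMPeer - peer=XXXXXXXX-2E88-4DBF-8BC5-BA788B8D8083 '
    'peer_name=splunkidx7 transitioning from=AutomaticDetention to=Up reason="heartbeat received."',
]

for peer, seconds in detention_durations(sample):
    print(f"{peer}: {seconds:.3f}s in detention")
```

Run over a larger slice of the CM log, this would show whether the short episodes cluster on particular peers or at particular times of day.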

cpetterborg
SplunkTrust

It turns out the hot/warm partition on the servers was teetering on the edge of running out of space, and I had not noticed that this was the problem. So you have to check both the cold and the hot/warm partitions to determine whether a disk space requirement is what is causing the peer to go into automatic detention.
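A quick way to check every partition at once is a small script run on each indexer. This is a rough sketch rather than anything Splunk ships: the function name is made up, and the 5000 MB threshold is an assumption meant to mirror the `minFreeSpace` setting in `server.conf`, so check the value configured in your own environment.

```python
import shutil

def low_space_partitions(paths, min_free_mb, usage_fn=shutil.disk_usage):
    """Return {path: free_mb} for every path whose free space is below min_free_mb."""
    low = {}
    for path in paths:
        free_mb = usage_fn(path).free / (1024 * 1024)
        if free_mb < min_free_mb:
            low[path] = round(free_mb)
    return low

# Replace "." with the mount points that hold your hot/warm and cold buckets;
# both need checking, not just one of them.
ASSUMED_MIN_FREE_MB = 5000  # assumption: compare against your own minFreeSpace
print(low_space_partitions(["."], ASSUMED_MIN_FREE_MB))
```

An empty result means every listed partition is above the threshold at the moment of the check. A partition that hovers near the limit can still dip below it briefly between checks, which would match the few-second detentions described above.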

Will_powr
Explorer

How did you resolve this issue?

cpetterborg
SplunkTrust

The disk usage had to be adjusted so that the data never filled a partition to its maximum. That meant tuning the number of hot/warm buckets to the right value. This can have bad side effects if you aren't careful. If you cut it too close to the maximum storage, you will sometimes hit the point where IDXs go into auto detention. Then, if you restart the IDXs in the cluster, you can get too many buckets rolling to cold, and if you don't have enough cold storage, data gets dropped. So set your limits carefully so that you don't end up with an auto detention problem.
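The sizing trade-off can be estimated with some simple arithmetic before changing any limits. This is only a back-of-the-envelope sketch under stated assumptions: the function name and all numbers are illustrative, the tuple fields loosely correspond to per-index bucket count and bucket size limits in `indexes.conf`, and it ignores replicated bucket copies and buckets that grow past their nominal size.

```python
def home_path_headroom_mb(partition_mb, min_free_mb, indexes):
    """Estimate spare space (MB) on a hot/warm partition when every index is at its limit.

    indexes: list of (max_hot_buckets, max_warm_buckets, max_bucket_mb) tuples,
    one per index stored on the partition.
    """
    worst_case_mb = sum((hot + warm) * bucket_mb for hot, warm, bucket_mb in indexes)
    return partition_mb - min_free_mb - worst_case_mb

# Illustrative numbers only, not recommendations.
indexes = [(3, 300, 750), (3, 100, 750)]
headroom = home_path_headroom_mb(partition_mb=500_000, min_free_mb=5000, indexes=indexes)
print(f"worst-case headroom: {headroom} MB")
```

A negative or very small result suggests the bucket limits are cut too close to the partition size. Lowering the warm bucket count raises the headroom, but it also pushes more data to cold, so the same kind of estimate is worth repeating for the cold partition.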
