What can be done to alleviate the load on a resource-depleted cluster master?

Ultra Champion

We have a farm that is going to be retired in a couple of months.

The cluster master hasn't been doing well at all (see our earlier question, "Why is the indexer cluster master being marked as down consistently?").

Support just told us:

"The Cluster Master is desperately in need of additional resources; 2 cores and 8 GB of memory is not going to be sufficient."

Since there is no chance of getting approval for additional resources on this VM, what can be done to alleviate the load on this cluster master?

Splunk Employee

2 cores and 8 GB are not going to cut it, but there are some settings we can try tinkering with (no promises):

Indexers, [clustering] stanza of server.conf (default -> suggested):

heartbeat_period = 1 -> 10
cxn_timeout = 60 -> 300
send_timeout = 60 -> 300
rcv_timeout = 60 -> 300

CM, [clustering] stanza of server.conf (default -> suggested):

heartbeat_timeout = 60 -> 300
max_fixup_time_ms = 5000
cxn_timeout = 60 -> 300
send_timeout = 60 -> 300
rcv_timeout = 60 -> 300
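To make the placement concrete, here is a minimal sketch that prints the suggested values as a [clustering] stanza for each role. The function name and its `indexer`/`cm` role labels are hypothetical, and it only writes to stdout: merge the output by hand into the existing [clustering] stanza of server.conf on each instance (which already holds settings such as `mode`) rather than appending a second stanza, and expect to restart splunkd for the change to take effect.

```shell
#!/bin/sh
# Hypothetical helper: print the suggested [clustering] overrides for one role.
# Usage: print_clustering_overrides indexer|cm
print_clustering_overrides() {
  role="$1"
  # Reject anything other than the two roles discussed in this thread.
  case "$role" in
    indexer|cm) ;;
    *) echo "usage: print_clustering_overrides indexer|cm" >&2; return 1 ;;
  esac
  echo "[clustering]"
  if [ "$role" = "indexer" ]; then
    # Indexer (peer): send a heartbeat to the CM every 10 s instead of every 1 s.
    echo "heartbeat_period = 10"
  else
    # CM: wait 300 s without a heartbeat before treating a peer as down.
    echo "heartbeat_timeout = 300"
    # Value suggested in the answer above; check server.conf.spec for its exact meaning.
    echo "max_fixup_time_ms = 5000"
  fi
  # Both roles: raise the connection, send, and receive timeouts from 60 s to 300 s.
  echo "cxn_timeout = 300"
  echo "send_timeout = 300"
  echo "rcv_timeout = 300"
}

print_clustering_overrides indexer
```

The trade-off is slower failure detection for less work: with heartbeat_period at 10 the CM handles a tenth as many heartbeats per peer, and with heartbeat_timeout at 300 it waits five minutes before declaring a silent peer down.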

Ultra Champion

Much appreciated @dxu_splunk !!

Ultra Champion

My understanding of the issue is that the Cluster Master is having trouble coordinating your Search and Replication factors among the peers. So even if you disable indexing of _internal (which, I promise, you WILL regret), you will eventually see this happen again as bucket load increases with data volume.

Are your search factor and replication factor wildly high? Did you mess with the size of buckets? Either of those tunings could be causing you more issues.
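For the replication/search factor question, a quick check is to pull both values out of the [clustering] stanza. This is a hypothetical helper that works on a server.conf file, or on saved output of `splunk btool server list clustering` run without `--debug` (so lines are plain `key = value`); if a key is absent, the default applies (replication_factor = 3, search_factor = 2).

```shell
#!/bin/sh
# Hypothetical helper: print replication_factor and search_factor from the
# [clustering] stanza of a server.conf-style file. Other stanzas are ignored.
# Usage: show_cluster_factors FILE
show_cluster_factors() {
  awk '
    # A stanza header looks like [name]; remember whether it is [clustering].
    /^[ \t]*\[/ { in_clustering = ($0 ~ /^[ \t]*\[clustering\]/); next }
    in_clustering && /^[ \t]*(replication_factor|search_factor)[ \t]*=/ {
      # Split "key = value" on "=" and trim surrounding whitespace.
      split($0, kv, "=")
      key = kv[1]; val = kv[2]
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      print key "=" val
    }
  ' "$1"
}
```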

At the end of the day, the software was designed for minimum specifications that are not being provided. If it helps sell your need for more power: a car can't really drive well on one wheel if it requires four.

Ultra Champion

Makes perfect sense @SloshBurch - thank you.

Ultra Champion

We found out that the indexers had trouble connecting to the CM and therefore generated lots of internal data that the system couldn't easily index.
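To see which components produce most of that internal data, a rough sketch is to count splunkd.log lines per component. It assumes the layout of the CMPeer example in this thread (date, time, timezone, level, component, then `-` and the message) and a hypothetical list of level names; continuation lines such as stack traces are skipped because their fourth field is not a level.

```shell
#!/bin/sh
# Hypothetical helper: count splunkd.log lines per component, busiest first.
# Assumed layout: DATE TIME TZ LEVEL COMPONENT - message
# Usage: top_components FILE [N]
top_components() {
  file="$1"
  n="${2:-10}"
  # Field 4 must look like a log level; field 5 is then the component name.
  awk 'NF >= 5 && $4 ~ /^(DEBUG|INFO|WARN|ERROR|FATAL)$/ { print $5 }' "$file" \
    | sort | uniq -c | sort -rn | head -n "$n"
}
```

For example, `top_components splunkd.log 5` lists the five noisiest components on that instance.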

What can we do in such a case? Is there a way to disable _internal indexing?

Ultra Champion

For the sake of completeness, from the CM:

$ grep CMPeer < splunkd.log.5.instability | wc -l
71999 

It covers this time frame (about 48 seconds, i.e. roughly 1,490 CMPeer messages per second):
10-03-2018 01:11:11.213 -0500
10-03-2018 01:11:59.532 -0500
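That rate can be checked with a short sketch that divides the number of matching lines by the time between the first and last match. The helper name is hypothetical; it reads the time from the second field (HH:MM:SS.mmm) and assumes all matches fall on the same calendar day, which holds for the window above.

```shell
#!/bin/sh
# Hypothetical helper: average rate of lines matching PATTERN between the
# first and last matching timestamps. Assumes all matches are on the same day.
# Usage: match_rate PATTERN FILE
match_rate() {
  grep -- "$1" "$2" | awk '
    {
      # Field 2 is HH:MM:SS.mmm; convert it to seconds since midnight.
      split($2, t, ":")
      s = t[1] * 3600 + t[2] * 60 + t[3]
      if (NR == 1) first = s
      last = s
    }
    END {
      if (NR < 2 || last == first) { print "not enough spread to compute a rate"; exit 1 }
      elapsed = last - first
      printf "%d lines in %.3f s = %.1f lines/s\n", NR, elapsed, NR / elapsed
    }'
}
```

Usage: `match_rate CMPeer splunkd.log.5.instability`.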

The messages look like this:

10-03-2018 01:11:59.532 -0500 INFO  CMPeer - peer=12E6ED7C-9765-46F1-8883-5F34834E82F4 peer_name=<indexer> bid=<index name>~4485~3CA07398-A043-4E1E-BA20-233C66372471 transitioning from=Searchable to=SearchablePendingMask oldmask=0x4 newmask=0x5 reason="swap primaries"
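To see which transitions and reasons dominate those 71,999 lines, a sketch like the following tallies them. It assumes every relevant message carries `transitioning from=... to=...` followed later by a quoted `reason=`, as in the example line above; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: tally CMPeer bucket-state transitions by from/to state
# and reason, most frequent first.
# Assumed message shape: ... transitioning from=X to=Y ... reason="..."
# Usage: transition_summary FILE
transition_summary() {
  grep 'CMPeer.*transitioning' "$1" \
    | sed -n 's/.*transitioning from=\([^ ]*\) to=\([^ ]*\).* reason="\([^"]*\)".*/\1 -> \2 (\3)/p' \
    | sort | uniq -c | sort -rn
}
```

If one line such as `Searchable -> SearchablePendingMask (swap primaries)` accounts for most of the volume, that points at the CM repeatedly reshuffling primaries rather than at a single misbehaving peer.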