Solved: What can be done to alleviate the load on a resource depleted cluster master?

ddrillic · 10-02-2018

We have a farm that is going to be retired in a couple of months.

The cluster master hasn't been doing well at all; see my earlier question, "Why is the indexer cluster master being marked as down consistently?"

Support just told us -

-- The Cluster Master is desperately in need of additional resources, 2 cores and 8 GB of memory is not going to be sufficient.

Since there is no chance for us to get approval for additional resources on this VM, I wonder what can be done to alleviate the load on this cluster master?

dxu_splunk · 10-08-2018

2 cores and 8 GB is not going to cut it, but there are some configs we can try tinkering with (no promises):

Indexers' server.conf:

heartbeat_period = 1 -> 10
cxn_timeout = 60 -> 300
send_timeout = 60 -> 300
rcv_timeout = 60 -> 300

CM server.conf:

heartbeat_timeout = 60 -> 300
max_fixup_time_ms = 5000
cxn_timeout = 60 -> 300
send_timeout = 60 -> 300
rcv_timeout = 60 -> 300
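
For anyone applying these, here is a rough sketch of how the new values might look once written out. It assumes the settings live in the [clustering] stanza of server.conf on each side, and it leaves out whatever that stanza already contains (mode, master URI, secret and so on). The values are just the ones suggested above, not tested recommendations, and a restart of the affected instances is probably needed for them to take effect.

```ini
# Indexers (cluster peers): server.conf
# Assumption: merge into the existing [clustering] stanza; existing settings are omitted here.
[clustering]
heartbeat_period = 10
cxn_timeout = 300
send_timeout = 300
rcv_timeout = 300

# Cluster master: server.conf
# Assumption: same stanza name on the CM; existing settings are omitted here.
[clustering]
heartbeat_timeout = 300
max_fixup_time_ms = 5000
cxn_timeout = 300
send_timeout = 300
rcv_timeout = 300
```

The idea is that peers heartbeat less often and both sides wait longer before giving up on a connection, so a slow CM is less likely to be marked as down. These are two separate files on different hosts, shown together here only for comparison.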

ddrillic · 10-16-2018

Much appreciated @dxu_splunk !!

sloshburch · 10-10-2018

My understanding of the issue is that the Cluster Master is having trouble coordinating your search and replication factors among the peers. So even if you disable indexing of _internal (which I promise you WILL regret), you will eventually see this happen as bucket load increases with data volume.

Are your search factor and replication factor wildly high? Did you change the size of your buckets? Either of those tunings could be causing you more issues.

At the end of the day, the software was designed for minimum specifications that are not being provided. If it helps sell your need for more power: a car can't really drive well on one wheel if it requires four.

ddrillic · 10-16-2018

Makes perfect sense @SloshBurch - thank you.

ddrillic · 10-03-2018

We found out that the indexers had trouble connecting to the CM and therefore generated a lot of internal data that the system couldn't easily index.

What can we do in such a case? Is there a way to disable _internal indexing in such cases?

ddrillic · 10-03-2018

For the sake of completeness from the CM -

$ grep CMPeer < splunkd.log.5.instability | wc -l
71999

It covers this time frame -
10-03-2018 01:11:11.213 -0500
10-03-2018 01:11:59.532 -0500

The messages look like -

10-03-2018 01:11:59.532 -0500 INFO  CMPeer - peer=12E6ED7C-9765-46F1-8883-5F34834E82F4 peer_name=<indexer> bid=<index name>~4485~3CA07398-A043-4E1E-BA20-233C66372471 transitioning from=Searchable to=SearchablePendingMask oldmask=0x4 newmask=0x5 reason="swap primaries"
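
To go one step beyond the raw count, something like the following might help show which transitions dominate and how bursty they are. It is only a sketch: the two function names are made up for illustration, and it assumes every CMPeer line has the same layout as the sample above (timestamp in the first 19 characters, a reason="..." field).

```shell
# Hypothetical helpers, not part of Splunk. Both take a splunkd.log path as $1.

# Count CMPeer transitions grouped by their reason="..." field, most frequent first.
count_cmpeer_reasons() {
  grep 'CMPeer' "$1" \
    | sed -n 's/.*reason="\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn
}

# Count CMPeer messages per second, keyed on the first 19 characters
# of each line (MM-DD-YYYY HH:MM:SS).
count_cmpeer_per_second() {
  grep 'CMPeer' "$1" | cut -c1-19 | sort | uniq -c
}
```

For the file above this would be run as `count_cmpeer_reasons splunkd.log.5.instability` and `count_cmpeer_per_second splunkd.log.5.instability`.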
