Hello All
While upgrading our indexer cluster to version 6.6.2, I noticed a new peer status showing as "BatchAdding".
It doesn't appear to be impacting anything, and the Splunk upgrade itself was successful.
Any idea what this status means?
Okay, from my experience dealing with this, it is basically triggered when communication between the cluster master (CM) and the indexers (IDXs) drops or becomes unreliable.
What actually happens is that the CM decides the affected IDX (the "BatchAdding" one) can no longer perform indexing operations, so it starts to remove that IDX from the cluster.
As we know, when an IDX drops off the CM, its primary buckets get redistributed to the other IDXs on the site. By the time that process starts or completes, the IDX is back in healthy communication with the CM, and that is when the BatchAdding state appears.
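If you want to watch which state each peer is in while this happens, a quick check from the CM (a minimal sketch; it assumes you run it on the cluster master with an admin login):

# On the cluster master: lists every peer and the status the CM currently holds for it
$SPLUNK_HOME/bin/splunk show cluster-status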
What I did after this scenario was to adjust the following settings:
On the Cluster Master (server.conf):
[clustering]
service_interval = 10
heartbeat_timeout = 1800
cxn_timeout = 300
send_timeout = 300
rcv_timeout = 300
max_peer_build_load = 5
(using cluster-bundle deployment, no restart is required)
On the Indexers (server.conf):
[clustering]
cxn_timeout = 600
send_timeout = 600
rcv_timeout = 600
heartbeat_period = 10
Doing this reduced how often this occurred, and it has since stopped completely.
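To illustrate the cluster-bundle deployment mentioned above for the indexer-side settings, here is a minimal sketch run on the CM (the pre-9.x master-apps path is an assumption on my part; adjust to your own layout):

# On the cluster master: put the peer-side settings into the _cluster bundle app
cat >> $SPLUNK_HOME/etc/master-apps/_cluster/local/server.conf <<'EOF'
[clustering]
cxn_timeout = 600
send_timeout = 600
rcv_timeout = 600
heartbeat_period = 10
EOF

# Validate, push the bundle to all peers, and watch the rollout
$SPLUNK_HOME/bin/splunk validate cluster-bundle
$SPLUNK_HOME/bin/splunk apply cluster-bundle --answer-yes
$SPLUNK_HOME/bin/splunk show cluster-bundle-status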
Thanks for this! Helped us out in our very large environment
Do you also have a very large indexer environment? How many indexers are working with the one cluster master?
I'm assuming it's also a lot of GB/day of data?
This fixed our issue. We had 10 indexers which started a loop of BatchAdding over and over.
Also, see slide 32 out of 34 here:
https://conf.splunk.com/files/2017/slides/scaling-indexer-clustering-5-million-unique-buckets-and-be...
Yes, we have around 28 IDXs with about 2.5 TB of data coming in every day @garethatiag
I believe it was mentioned in one of the 2017 .conf presentations, in particular https://conf.splunk.com/files/2017/slides/scaling-indexer-clustering-5-million-unique-buckets-and-be... although I cannot find the word "batch" in there.
BatchAdding is a state shown while the indexer is being added to the cluster. As far as I remember, it exists to improve the performance of the cluster master (note this is from memory, as I cannot find the details) and to prevent the cluster master from hanging while adding an indexing peer that has many buckets.
The slide "Peer adding - configurable amount of buckets" and the next few slides talk about batch adding, but I cannot find any documentation on it, which also surprised me...
I am seeing this as well.
Failed to register with cluster master reason: failed method=POST path=/services/cluster/master/peers/?output_mode=json master=:8089 rv=0 gotConnectionError=1 gotUnexpectedStatusCode=0 actual_response_code=502 expected_response_code=2xx status_line="Read Timeout" socket_error="Read Timeout" remote_error= [ event=addPeer status=retrying AddPeerRequest: { _id= active_bundle_id=3A3876F4FD9A2DB7BBEDB12F45BDF49A add_type=ReAdd-As-Is base_generation_id=289956 batch_serialno=1 batch_size=17 forwarderdata_rcv_port=9997 forwarderdata_use_ssl=0 last_complete_generation_id=0 latest_bundle_id=3A3876F4FD9A2DB7BBEDB12F45BDF49A mgmt_port=8089 name=15A5C813-2915-4902-92D2-65C7095A9027 register_forwarder_address=10.18.193.51 register_replication_address=10.18.196.51 register_search_address=10.18.193.51 replication_port=9887 replication_use_ssl=0 replications= server_name=snx2splidxa23 site=site2 splunk_version=7.0.3 splunkd_build_number=fa31da744b51 status=Up } ].
We have around 120 indexers. Sometimes all of them show BatchAdding, and sometimes all of them show Up.
Because of this, search results are not coming back properly,
and on the search head I am seeing the above error.
Please help.
It almost sounds as if your cluster master is not keeping up with the load of handling that many indexers; however, that is a guess based on a single log entry.
That said, I'd suggest you would be better served by either a new question or, in this case, perhaps a support ticket!
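Before (or alongside) the ticket, a hedged way to quantify how often peers are failing to re-register, built from the strings in the error you posted (it assumes the indexers' splunkd logs reach your _internal index):

index=_internal sourcetype=splunkd "event=addPeer" "status=retrying" earliest=-24h
| stats count by host

That should show which peers are retrying registration with the CM and how often.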
Hey, thanks for the help.
I have raised a Splunk support case; let's see what they suggest.
@splunk24
Please ensure that your CM is well specced. 120 indexers is a lot, and they need to be consistently managed by the CM. Make sure the CM is heavily specced and can sustain connections to/from the IDXs.
Nothing comes up on Google for BatchAdding - strange...
Did you get an answer? I am also seeing the same thing.