Hello,
Our physical servers had to restart, and the Splunk servers went down with them.
We are now having issues on our cluster master and our indexers.
Our deployment looks like this:
DCAXXXG013 CM and LM
DCAXXXG014 IDX
DCAXXXG015 IDX
DCAXXXG016 IDX
DCAXXXG017 SH
DCPXXXG013 DS
DCPXXXG014 IDX
DCPXXXG015 IDX
DCPXXXG016 IDX
DCPXXXG017 SH
The indexers on site A and site P are both clustered. Can anyone shed some light on where to look and how to proceed from here? These are the messages we are seeing:
Search peer DCAOVSG016 has the following message: Failed to register with cluster master reason: failed method=POST path=/services/cluster/master/peers/?output_mode=json master=dcaovsg013:8089 rv=0 gotConnectionError=0 gotUnexpectedStatusCode=1 actual_response_code=500 expected_response_code=2xx status_line="Internal Server Error" socket_error="No error" remote_error=Cannot add peer=172.26.10.49 mgmtport=8089 (reason: non-zero pending job count=1, guid=ADA4AE8A-B93F-48E2-88CC-F47CDDCB9AE4). [ event=addPeer status=retrying AddPeerRequest: { _id= active_bundle_id=EDA5C78B2096F563800873D7CBD2A6DF add_type=ReAdd-As-Is base_generation_id=2073 batch_serialno=1 batch_size=3 forwarderdata_rcv_port=9197 forwarderdata_use_ssl=0 last_complete_generation_id=2077 latest_bundle_id=EDA5C78B2096F563800873D7CBD2A6DF mgmt_port=8089 name=ADA4AE8A-B93F-48E2-88CC-F47CDDCB9AE4 register_forwarder_address= register_replication_address= register_search_address= replication_port=9100 replication_use_ssl=0 replications= server_name=DCAOVSG016 site=site1 splunk_version=7.2.0 splunkd_build_number=8c86330ac18 status=Up } ].
Indexer Clustering: The search process with sid=rt_scheduler_admin_QkNOX1RBX1dpbmRvd3MtU2VydmVycw_RMD5d0958093cdddf4f3_at_1551270120_1818 on peer=DCAOVSG014 may have returned partial results due to a reading error while waiting for the peer. This can occur if the peer unexpectedly closes or resets the connection during a planned restart. Try running the search again. Learn more.
2/27/2019, 12:22:34 PM
Search peer DCAOVSG014 has the following message: Failed to register with cluster master reason: failed method=POST path=/services/cluster/master/peers/?output_mode=json master=dcaovsg013:8089 rv=0 gotConnectionError=0 gotUnexpectedStatusCode=1 actual_response_code=500 expected_response_code=2xx status_line="Internal Server Error" socket_error="No error" remote_error=Cannot add peer=172.26.10.47 mgmtport=8089 (reason: non-zero pending job count=2, guid=3724715E-6BAC-46F9-AFE7-06917EF3FD3C). [ event=addPeer status=retrying AddPeerRequest: { _id= active_bundle_id=EDA5C78B2096F563800873D7CBD2A6DF add_type=ReAdd-As-Is base_generation_id=2086 batch_serialno=1 batch_size=2 forwarderdata_rcv_port=9197 forwarderdata_use_ssl=0 last_complete_generation_id=2093 latest_bundle_id=EDA5C78B2096F563800873D7CBD2A6DF mgmt_port=8089 name=3724715E-6BAC-46F9-AFE7-06917EF3FD3C register_forwarder_address= register_replication_address= register_search_address= replication_port=9100 replication_use_ssl=0 replications= server_name=DCAOVSG014 site=site1 splunk_version=7.2.0 splunkd_build_number=8c86330ac18 status=Up } ].
Any help is greatly appreciated.
Cheers
Were you able to fix this issue? If yes, please share the solution. Thanks.
You'll want to check the logs on dcaovsg013, because it's returning 500 errors (actual_response_code=500) with reason: non-zero pending job count. There's probably some outstanding issue or load on that machine.
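If it helps to compare the failures across peers, here is a minimal Python sketch that pulls the relevant fields out of one of these messages. It is only an illustration: the function name and the regular expressions are my own assumptions based on the message text pasted in the question, not a Splunk API, and the sample string is a shortened copy of the DCAOVSG016 message.

```python
import re

# Shortened copy of the message reported for DCAOVSG016 in the question.
MESSAGE = (
    'Failed to register with cluster master reason: failed method=POST '
    'path=/services/cluster/master/peers/?output_mode=json master=dcaovsg013:8089 '
    'actual_response_code=500 expected_response_code=2xx '
    'status_line="Internal Server Error" '
    'remote_error=Cannot add peer=172.26.10.49 mgmtport=8089 '
    '(reason: non-zero pending job count=1, '
    'guid=ADA4AE8A-B93F-48E2-88CC-F47CDDCB9AE4).'
)


def summarize_registration_failure(message):
    """Extract the fields that matter for this error (hypothetical helper).

    The patterns are guesses based on the message format shown in the
    question; a field that is not found is returned as None.
    """
    patterns = {
        "master": r"\bmaster=(\S+)",
        "response_code": r"actual_response_code=(\d+)",
        "peer": r"Cannot add peer=(\S+)",
        "pending_jobs": r"pending job count=(\d+)",
        "guid": r"guid=([0-9A-Fa-f-]+)",
    }
    summary = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        summary[name] = match.group(1) if match else None
    return summary


if __name__ == "__main__":
    for field, value in summarize_registration_failure(MESSAGE).items():
        print(f"{field}: {value}")
```

Running it over each peer's message shows at a glance which master rejected the peer and how many pending jobs it reported, which is the detail to look for in the logs on dcaovsg013.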