We have a Splunk Enterprise 7.0.2 search head cluster. We are seeing errors like the one below.
Search peer splunksh-01 has the following message: bundle replication: problem replicating config (bundle) to search peer 'splunkidx-01', HTTP response code 500 (HTTP/1.1 500 read timeout). read timeout (unknown write error).
Our setup is entirely virtual, running on VMware.
It is possible that things other than what is stated above are causing this error. Since I identified a unique root cause, I wanted to share it with everyone. The last bullet below is what worked for me, but the bullets below summarize the recommended steps for getting to the root cause.
@coreyf311 did you find any resolution to this issue? I have started seeing the same error popping up on and off on my search heads.
Same error, plus some other timeout errors. Running 7.2.3 on VMware.
How do I prove fault at virtual infrastructure or application...
Hm, this could be either a timeout issue or a bundle replication-related bug (have you thought about upgrading to a later 7.0.x version?). You could also try setting a higher connectionTimeout in your distsearch.conf under the [replicationSettings] stanza and see if this helps in any way.
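As a rough sketch of the timeout suggestion above, the stanza might look something like this on the search head. The values are illustrative assumptions, not tested recommendations. sendRcvTimeout is a related setting in the same stanza that was not mentioned in this thread; it is included only because the error reports a read timeout, so treat it as something to verify against the distsearch.conf spec for your version.

```
# distsearch.conf on the search head (location assumed: $SPLUNK_HOME/etc/system/local/)
[replicationSettings]
# Seconds to wait for the initial connection to a search peer.
# 180 is an illustrative value, not a recommendation.
connectionTimeout = 180
# Seconds to wait for a full bundle to be sent to a peer.
# Assumption: raising this may help with "read timeout" errors.
sendRcvTimeout = 180
```

A restart of the search head is probably needed for the change to take effect. If the errors persist with generous timeouts, that would point away from Splunk settings and toward the network or the virtual infrastructure.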
Edit: You could also take a look at this diagnosis doc.
Skalli
I don't know what the exact issue was, except that when we migrated that indexer from one ESXi host to another, the problem was resolved. The problem was on the ESXi host, but I'm not sure of the exact cause, as I am not on the team that supports VMware.