Problem replicating config (bundle) to search peer

willprince
Engager

I constantly see the error below on my search head. What causes it, and how do I go about fixing it? I have removed the IP address and replaced it with x.x.x.x:

Problem replicating config (bundle) to search peer 'x.x.x.x',Reading reply to upload: rv=-2, Receive from=https://x.x.x.x/8089 timed out; exceeded 60sec, as per=distsearch.conf/[replicationSettings]/sendRcvTimeout

mhouse333
Loves-to-Learn Lots

Other things could be causing this error besides what is stated above. Since I identified a unique root cause, I wanted to share it with everyone. The last bullet below is what worked for me, but taken together the bullets summarize the recommended steps for getting to the root cause.

  • First verify that the size of the bundle being sent from the SH is not greater than the bundle size limit on the SH (maxBundleSize in distsearch.conf) or on the indexer (max_content_length in server.conf).
  • Then check for permissions/ownership errors on all the instances by running “ls -lahR /opt/splunk | grep root”.
  • Then run ./splunk btool check.
  • Then check the CM bundle details and confirm that the latest active bundle on the peers is the same as the one on the CM.
  • Then run the top command to see whether any other process is using a significant percentage of CPU compared with Splunk. A newly introduced application could be locking files and preventing writes for long periods of time. This can be further verified as follows:
    • Run “sudo tcpdump host <ipaddressofsourceSH>” on each indexer, then run your search from the SH and see whether the traffic comes across.
    • If it does not, an application in your environment is preventing Splunk from doing what it needs to do, and you need to apply for a Splunk exception for the recently introduced application.
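
The first check in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `oversized_bundles` is hypothetical, it assumes the search head keeps its knowledge bundles as `*.bundle` files in a single directory, and the limit you pass in should be whatever your own maxBundleSize is set to.

```python
from pathlib import Path


def oversized_bundles(bundle_dir, limit_mb):
    """Return (file name, size in MB) for every *.bundle file above limit_mb.

    bundle_dir and limit_mb are supplied by the caller; nothing here reads
    Splunk's configuration, so the limit must be looked up separately.
    """
    limit_bytes = limit_mb * 1024 * 1024
    results = []
    for path in sorted(Path(bundle_dir).glob("*.bundle")):
        size = path.stat().st_size
        if size > limit_bytes:
            # Report the size in MB, rounded to one decimal place.
            results.append((path.name, round(size / (1024 * 1024), 1)))
    return results
```

For example, `oversized_bundles("/opt/splunk/var/run", limit_mb=1024)` would list any bundle larger than 1024 MB; both the path and the limit in that call are assumptions to adjust for your install.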

MuS
Legend

Hi willprince,

Over time your Splunk install gets bigger: more apps and more knowledge objects are added, which leads to bigger knowledge bundles. The knowledge bundle is the data that the search head replicates and distributes to each search peer to enable its searches.

Have a look at the docs http://docs.splunk.com/Documentation/Splunk/6.3.1/DistSearch/Limittheknowledgebundlesize to learn more about how to limit the size of the bundles. Also have a look at distsearch.conf http://docs.splunk.com/Documentation/Splunk/6.3.1/admin/Distsearchconf and the option

sendRcvTimeout = <int, in seconds>
* The maximum number of seconds to wait for the sending of a full replication
  to a peer.

to increase the number of seconds to wait for the replication.
Last but not least, check the network connection between your search heads and indexers to see whether the bottleneck is there.
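
As a minimal sketch of the timeout change described above: the stanza and setting names are the ones reported in the error message in the original post, while the file location and the value 120 are illustrative assumptions, not recommendations.

```
# Assumed location: $SPLUNK_HOME/etc/system/local/distsearch.conf on the search head
[replicationSettings]
# Illustrative value only; the error in the original post shows 60 seconds in effect
sendRcvTimeout = 120
```

A restart of the search head may be needed before the new value takes effect.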

Is this just related to one search peer? If so, try removing it and adding it again as a search peer.

Hope this helps ...

cheers, MuS

nawazns5038
Builder

I started getting the same error on my search head after I installed the MLTK app there. I didn't install the add-on on the indexer cluster. Is it mandatory to install the add-on on the indexer cluster as well?

Problem replicating config (bundle) to search peer '10.x.x.xx.xx:8089', error while transmitting bundle data.

I get that error on the search head after I enable the app there without the add-on on the indexer cluster. How can this be resolved, and why is the bundle being pushed to the indexers when I only install on the search head?

Thanks,
N

reed_kelly
Contributor

I wonder when Splunk will provide some real relief for this issue. We have many apps that require large lookups for accelerated data models and other indexer-level search functionality. Our search head cluster, which is connected to multiple indexer clusters, hosts many applications that are developed and maintained by many different groups.

Bundle replication issues are the bane of my existence!!!!

mikehodges01
Explorer

Well I tried changing that setting to 600 seconds and I'm still getting this error.

I guess I'll dive into modifying the bundle to reduce the size. According to the logs it's only 30 MB. Both these servers are on the same subnet.
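
Before trimming the bundle, it can help to see which files take up the most space. The following is a rough sketch, assuming the bundle file can be opened as a tar archive; `largest_bundle_members` is a hypothetical helper name, not a Splunk tool.

```python
import tarfile


def largest_bundle_members(bundle_path, top_n=10):
    """Return the top_n largest regular files in the archive as (size_bytes, name).

    Assumes bundle_path points at a tar archive; tarfile.open detects
    compression automatically in its default read mode.
    """
    with tarfile.open(bundle_path) as tar:
        files = [(m.size, m.name) for m in tar.getmembers() if m.isfile()]
    # Largest first; ties are broken by name in reverse order.
    return sorted(files, reverse=True)[:top_n]
```

Large lookup files tend to stand out in a listing like this, which makes them natural candidates for the bundle-size limiting described in the docs linked earlier in the thread.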

mikehodges01
Explorer

I'm also seeing "SSL_write failed. Broken pipe" in my splunkd.log on my search head, several seconds before the bundle errors show up. I'm guessing this has something to do with it?

dolivasoh
Contributor

Seeing something similar in one of my deployments. Standalone search head talking to two clustered indexers.

DistributedPeerManager - Unable to distribute to peer named at uri https://:8089 because replication was unsuccessful. replicationStatus Failed failure info: failed_because_BUNDLE_DATA_TRANSMIT_FAILURE

Same is happening on the second indexer too.

mikehodges01
Explorer

I'm having the same issue. Did anyone figure out what's going on?

stevepraz
Path Finder

Seeing this too. Started after I updated to 6.3. I run a distributed but not clustered environment. I'm running on Windows.
