Deployment Architecture

This keeps appearing on my search head: "Problem Replicating config (bundle) to search peer x.x.x.x:8089, error while transmitting bundle data."

thomas_forbes
Communicator

I have been receiving the message mentioned above for a few weeks, so I decided to check my splunkd.log file. (2) errors appear over and over:

1)

ERROR DistributedBundleReplicationManager - Unexpected problem while uploading bundle:  Unknown write error.  

2)

ERROR DistributedBundleReplicationManager - Unable to upload bundle to peer named "internalsplunkurl" uri = x.x.x.x:8089.  

This is appearing for all (4) of my peer indexers. (2) are local and (2) are geographically separated. I have added some stanzas to my distsearch.conf file, including [distributedSearch], [replicationSettings], and [replicationWhitelist]. I am not sure what else I can do to fix this issue.

Thanks,
Tom Forbes

1 Solution

thomas_forbes
Communicator

After several hours of searching and testing I was able to figure out what my issue was.

Several weeks back, while designing a search, I ran a pretty general search of my indexed data and exported the results to a csv file, because I wanted to use that csv as a lookup table file. The file was not particularly large, but it was big enough to cause issues with replication of the knowledge bundle. I followed this link (https://answers.splunk.com/answers/302532/large-lookup-caused-the-replication-bundle-to-fail-1.html) and picked up on the point that large csv files can cause problems with bundle replication. I deleted the file in question, and my search head returned to normal and I was able to query my data as expected.
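For anyone hunting for a similar file, here is a minimal Python sketch that lists large csv files sitting in `lookups` directories. It is not part of the original fix. The function name `find_large_lookups` and the 10 MiB default threshold are assumptions chosen for illustration, and the script only reports files; it does not delete or change anything.

```python
from pathlib import Path


def find_large_lookups(etc_dir, min_bytes=10 * 1024 * 1024):
    """Return (size_in_bytes, path) pairs, largest first, for csv files
    that live directly inside a directory named 'lookups' under etc_dir."""
    hits = []
    for path in Path(etc_dir).rglob("*.csv"):
        # Only consider lookup table files, not every csv on disk.
        if path.parent.name == "lookups" and path.is_file():
            size = path.stat().st_size
            if size >= min_bytes:
                hits.append((size, path))
    return sorted(hits, reverse=True)


# Example call (the path is an assumption; point it at your own install):
# for size, path in find_large_lookups("/opt/splunk/etc"):
#     print(f"{size / 1_048_576:8.1f} MiB  {path}")
```

Run it on the search head, since that is where the bundle is built. Lowering `min_bytes` can help when, as in this case, the offending file does not look especially large.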

Please reference the comments above for any background information that may be missing in this answer.

In the future, if I run similar searches that produce csv lookup files, I plan to blacklist those files in my distsearch.conf file so they are left out of the bundle and this issue is avoided.
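A blacklist entry of that kind might look roughly like the fragment below. This is a sketch, not the configuration actually used here: the entry name `exclude_exported_lookup`, the app, and the file name are hypothetical, and the path is assumed to be relative to $SPLUNK_HOME/etc. Newer Splunk releases may document this stanza under a different name, so check the distsearch.conf spec for your version.

```
# distsearch.conf on the search head
[replicationBlacklist]
# Hypothetical entry name and path; substitute the real app and file.
exclude_exported_lookup = apps/search/lookups/exported_results.csv
```

Keep in mind that a file excluded from the bundle is no longer shipped to the indexers, so any lookup that uses it would need to run on the search head.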

Thanks for the input everyone.

lycollicott
Motivator

Are your search heads at the local site and are they on the same subnet? Every search you run will send a bundle to the indexers and those errors indicate that is failing. Make sure that your search heads can connect to both the local and remote indexers on TCP port 8089.
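A quick way to check the port from the search head is a plain TCP connection test. The sketch below uses only the Python standard library; the helper names `can_connect` and `check_peers` are made up for this example, and the peer addresses you pass in are your own.

```python
import socket


def can_connect(host, port=8089, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_peers(peers, port=8089):
    """Map each peer host to whether its management port is reachable."""
    return {host: can_connect(host, port) for host in peers}


# Example call (placeholder addresses; use your indexers' real ones):
# print(check_peers(["x.x.x.x", "y.y.y.y"]))
```

A successful connection only shows that the port is open. It does not rule out an MTU problem, which tends to show up only when large transfers such as a bundle upload cross the link.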

What kind of network is between the search heads and the remote indexers? If it is a VPN tunnel then the servers need to have their MTU set to 1500 or lower to traverse the internet.

I have had these errors before and it has been either the port 8089 or the MTU.

thomas_forbes
Communicator

Thanks for the input.

Both of my indexers located at my main site (my main site includes (1) search head, (2) indexers, (1) master node, and (1) deployment server) are on the same subnet. My remote indexers are on the same subnet as their search head.

My main site search head cannot search any indexed data at all, local or remote. My remote search head can search data from its own set of local indexers and from the indexers at my main site.

Also, interestingly, each indexer shows a different number of indexed events under the search tab. For example, at my remote site indexer (1) has 19,000,000+ indexed events and indexer (2) has 33,000,000+ indexed events. At my main site indexer (1) has 3,000,000+ events and indexer (2) has 1,600,000,000+ events (this is not a typo either). So the amount of data available on each indexer varies wildly.

I am sure this has to do with large bundle sizes. What I am not sure of is whether the underlying problem is limited bandwidth in my network infrastructure.

morethanyell
Contributor

omg this is it

sjohnson_splunk
Splunk Employee

Crank up the logging level to DEBUG for this component: DistributedBundleReplicationManager on one of your indexers. This may give you some additional clues.

Perhaps there is a permissions problem with the directory var/run/dispatch?
