Indexer Cluster: Replication of Splunk tsidx files is causing network congestion. How do I control the replication to prevent this?

Engager

Hi,

We have a Splunk indexer cluster with two indexers in each data center. Occasionally, we see a traffic spike on the network backbone caused by Splunk replication. Specifically, according to the following log, only the .tsidx files are being replicated (not the rawdata itself).

06-17-2016 09:25:53.144 INFO  CMSlave - sending search files for bucket bid=main~541~DE42B65D-16D8-4E25-9937-3F45CBCD2376 to guid=FCCBD56B-A38C-4AC8-B1E0-7BEE1291F340
06-17-2016 09:25:53.145 -0500 INFO  BucketReplicator - Created asyncReplication task to replicate files="1466027268-1465947413-3839916974449336698.tsidx 1466027824-1465971628-3842925951557594666.tsidx 1466024974-1465947419-3843695670987834010.tsidx 1466010664-1465963140-3841590830700805022.tsidx 1466027849-1465966921-3851985394289068044.tsidx 1466027462-1465947413-3843855991050422627.tsidx 1466027852-1466018039-3853405925451469935.tsidx 1466027852-1465947414-3853405982510090481.tsidx Hosts.data Sources.data SourceTypes.data rawdata/slicemin.dat rawdata/slicesv2.dat merged_lexicon.lex bloomfilter Strings.data .rawSize" bid=main~541~DE42B65D-16D8-4E25-9937-3F45CBCD2376 to guid=FCCBD56B-A38C-4AC8-B1E0-7BEE1291F340 host=10.111.2.40 s2sport=9887
06-17-2016 09:25:53.145 -0500 INFO  BucketReplicator - event=asyncSendFiles bid=main~541~DE42B65D-16D8-4E25-9937-3F45CBCD2376 jobId=7 files="1466027268-1465947413-3839916974449336698.tsidx 1466027824-1465971628-3842925951557594666.tsidx 1466024974-1465947419-3843695670987834010.tsidx 1466010664-1465963140-3841590830700805022.tsidx 1466027849-1465966921-3851985394289068044.tsidx 1466027462-1465947413-3843855991050422627.tsidx 1466027852-1466018039-3853405925451469935.tsidx 1466027852-1465947414-3853405982510090481.tsidx Hosts.data Sources.data SourceTypes.data rawdata/slicemin.dat rawdata/slicesv2.dat merged_lexicon.lex bloomfilter Strings.data .rawSize"
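
To see how many .tsidx files each of these replication tasks carries, the log lines can be summarized with a short script. This is only a sketch: the regular expressions are inferred from the BucketReplicator lines quoted above, the function name is made up for illustration, and the sample line is an abbreviated copy of the real one. Other Splunk versions may log a different format.

```python
import re

# Patterns inferred from the quoted log lines (an assumption, not a documented format).
FILES_RE = re.compile(r'files="([^"]*)"')
BID_RE = re.compile(r'\bbid=(\S+)')


def summarize_replication_line(line):
    """Return (bid, tsidx_count, other_count) for a BucketReplicator line, or None."""
    files_match = FILES_RE.search(line)
    bid_match = BID_RE.search(line)
    if not files_match or not bid_match:
        return None
    files = files_match.group(1).split()
    tsidx = [name for name in files if name.endswith(".tsidx")]
    return bid_match.group(1), len(tsidx), len(files) - len(tsidx)


# Abbreviated version of the asyncSendFiles line above (file list shortened).
sample = (
    '06-17-2016 09:25:53.145 -0500 INFO  BucketReplicator - event=asyncSendFiles '
    'bid=main~541~DE42B65D-16D8-4E25-9937-3F45CBCD2376 jobId=7 '
    'files="1466027268-1465947413-3839916974449336698.tsidx Hosts.data '
    'rawdata/slicemin.dat bloomfilter"'
)
print(summarize_replication_line(sample))
```

Running it over a whole splunkd.log would show which buckets account for the largest transfers, though the log only lists file names, not sizes.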

Since these index files are very big (a few gigabytes), this replication causes network congestion. I know rawdata is replicated gradually, but it looks like the tsidx files are replicated in one shot, which chokes the network when they are large. How can I control this replication so that it does not choke the network?

Thanks

Re: Indexer Cluster: Replication of Splunk tsidx files is causing network congestion. How do I control the replication to prevent this?

Builder

Hi,

You could make the buckets smaller (for example, by lowering maxDataSize in indexes.conf), but that is not recommended.

Hope this helps.


Re: Indexer Cluster: Replication of Splunk tsidx files is causing network congestion. How do I control the replication to prevent this?

Ultra Champion

The post "Clustering Optimizations in Splunk 6" explains:

-- With this optimization, the search files (TSIDX) are copied from other peers instead of regenerating them from raw data. This will help cluster to meet the replication policies much more quickly.

-- In summary, these two optimizations will greatly help both admins and users. The other good thing is that these are all transparent optimizations and it works out of the box without any tuning. And, that’s awesome!

Apparently, the replication was designed to be transparent to admins, for better or worse.

There is also a good discussion about the replication factor here:
What are best practices on setting the replication factor for X number of indexers in an indexer clu...


Re: Indexer Cluster: Replication of Splunk tsidx files is causing network congestion. How do I control the replication to prevent this?

Splunk Employee

In a healthy state, none of these replications should be occurring: when a bucket is created, the cluster ensures that there are enough replicated hot buckets to satisfy the replication policies.

The replications you're seeing occur when the cluster is missing a copy of a bucket, either for the replication factor (RF) or the search factor (SF); in this case it's SF. It's worth investigating what triggers these replications. Perhaps one of your indexers went offline for a short period, which triggered a batch of replications?
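
One way to start that investigation is to count the "sending search files" events per minute and compare the busy minutes with the times a peer dropped out. The sketch below assumes the timestamp layout of the CMSlave line quoted in the question (MM-DD-YYYY HH:MM:SS at the start of the line). The function name is hypothetical and the sample GUIDs are shortened placeholders.

```python
from collections import Counter
from datetime import datetime


def search_file_sends_per_minute(lines):
    """Count CMSlave 'sending search files' events, grouped by minute."""
    counts = Counter()
    for line in lines:
        if "CMSlave" not in line or "sending search files" not in line:
            continue
        try:
            # The first 19 characters hold "MM-DD-YYYY HH:MM:SS" in the quoted format.
            stamp = datetime.strptime(line[:19], "%m-%d-%Y %H:%M:%S")
        except ValueError:
            continue  # Skip lines that do not start with a timestamp.
        counts[stamp.replace(second=0)] += 1
    return counts


# Illustrative lines only; bid and guid values are shortened placeholders.
sample_lines = [
    "06-17-2016 09:25:53.144 INFO  CMSlave - sending search files for bucket bid=main~541~A to guid=B",
    "06-17-2016 09:25:58.201 INFO  CMSlave - sending search files for bucket bid=main~542~A to guid=B",
    "06-17-2016 09:31:02.010 INFO  CMSlave - sending search files for bucket bid=main~543~A to guid=B",
]
for minute, count in sorted(search_file_sends_per_minute(sample_lines).items()):
    print(minute, count)
```

A minute with many events right after a peer restart would be consistent with the explanation above, but it does not prove it.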
