Solved: Is it possible to replicate the data between the c...

alecshnapir · ‎11-21-2017

We are trying to benchmark our NAS storage system as a target for a Splunk Enterprise deployment. I'm pretty new to the Splunk world and am trying to understand how it all works 🙂

I've created a cluster with 3 peers (and one master) and used Splunkit to generate the log files on all 3 indexers/peers and to run an indexing benchmark test. With the replication and search factors set to 2, I expected that there would be replication between the peers and that the overall amount of data written to the storage would be higher than when both replication and search factors are set to 1.

Unfortunately, in both cases the time it took to complete the task and the amount of data eventually written to the storage were exactly the same, meaning that there was probably no replication between the peers during the indexing.

So, the question: is it even possible? That is, can I make peers replicate their own internal data during indexing, or do I have to use data sources external to the peers (forwarders) to make it all work?

lycollicott · ‎11-21-2017

Indexer peers are supposed to replicate their own events independently of any action on your part.

Bucket folders that begin with db_ contain events indexed on that host while folders that begin with rb_ contain events replicated from another indexer. Theoretically the bytes written on each indexer could/would/should be nearly identical as the cluster balances how it spreads out the data.

If you have both db_ and rb_ folders then replication is working.
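
As a rough way to run that check, here is a minimal Python sketch that counts bucket folders by prefix. The function name `count_buckets` is made up for illustration, and the path you pass in is an assumption: point it at wherever your index data actually lives on the peer.

```python
import os


def count_buckets(root):
    """Count directories under root whose names start with db_ or rb_.

    db_ = buckets indexed on this host, rb_ = buckets replicated from
    another indexer (per the explanation above).
    """
    counts = {"db_": 0, "rb_": 0}
    for _dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            for prefix in counts:
                if name.startswith(prefix):
                    counts[prefix] += 1
    return counts


# Example call (the path is hypothetical, adjust to your install):
# print(count_buckets("/opt/splunk/var/lib/splunk/my_index"))
```

If the `rb_` count stays at zero on every peer after indexing, the data is probably not being replicated.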

alecshnapir · ‎11-21-2017

Thank you for the prompt response. I will check the db_ and rb_ folders.
However, I have a few follow-ups on the points you've mentioned in your answer:

"Indexer peers are supposed to replicate their own events independently of any action on your part": so does that mean it doesn't matter what I set for the replication and search factors? And if that's the case, does it mean I'm getting as many replicas as there are peers in the cluster?
"Theoretically the bytes written on each indexer could/would/should be nearly identical as the cluster balances how it spreads out the data": this is exactly what I saw in my tests, but the numbers I saw didn't make sense (at least to me). I have 3 peers in the cluster, and each one has a 50GB log file. According to the Splunk documentation, the data, " after it has been compressed and indexed, occupies approximately 50% of its original size". Our storage system compresses it again at ~1.6:1, so in theory, putting replication aside, I'm supposed to see around 16GB written to the storage. The thing is that I see ~23GB, so is this gap replicated events from other peers? If so, isn't it supposed to be more? I mean, I'm replicating the events from two additional peers.

Thank you

lycollicott · ‎11-23-2017

There are default values which will be used even if you don't set them yourself.
In a single site cluster the spread of replicated events will vary over time.
Your storage system's compression will not alter the real world size of your files. If you put a 10GB file on your NAS it will still be a 10GB file to your OS regardless of how your NAS operates. Therefore 23GB does fall in line with the Splunk documentation.
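
The numbers in this exchange can be checked with a short Python sketch. The function names are made up for illustration, the 50% ratio is only the approximate figure quoted from the Splunk documentation, and replication is left out entirely.

```python
def indexed_size_gb(raw_gb, index_ratio=0.5):
    """Approximate on-disk size of indexed data as the OS sees it."""
    return raw_gb * index_ratio


def physical_size_gb(logical_gb, nas_ratio):
    """Approximate space consumed inside the NAS after its own compression."""
    return logical_gb / nas_ratio


raw_gb = 50.0                              # one 50GB log file per peer
logical = indexed_size_gb(raw_gb)          # size reported by the OS
physical = physical_size_gb(logical, 1.6)  # size after ~1.6:1 NAS compression

print(f"OS-visible size: about {logical:.1f} GB")
print(f"Inside the NAS:  about {physical:.1f} GB")
```

The first figure (about 25 GB) is the one to compare against the ~23GB observed, which is why that result is consistent with the documentation. The ~16GB estimate is the second figure, which the OS never sees.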

In your original question you said that SF and RF are both 1, but that means you're negating all the reasons for having a cluster and hampering its effectiveness greatly. Change them to 2 or 3. (https://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/Thereplicationfactor)

alecshnapir · ‎11-26-2017

Found the problem. It seems that replication between peers works when I use the default index, but not when I create a new one. So I have to find a way to distribute the definition of a newly created index, with replication turned on, to all the cluster peers.

lycollicott · ‎11-27-2017

repFactor = auto

Add that line in stanza [default] or to each individual index stanza.
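
To see which value an index would pick up, here is a minimal Python sketch that reads indexes.conf-style text and reports the effective repFactor for one index. The sample stanza `benchmark_idx`, its paths, and the function name `effective_rep_factor` are made up for illustration. It looks at a single file only, while Splunk itself layers several indexes.conf files, so treat it as a sanity check rather than the final word.

```python
import configparser

# Hypothetical indexes.conf content, for illustration only.
SAMPLE_INDEXES_CONF = """
[default]
repFactor = auto

[benchmark_idx]
homePath = $SPLUNK_DB/benchmark_idx/db
coldPath = $SPLUNK_DB/benchmark_idx/colddb
thawedPath = $SPLUNK_DB/benchmark_idx/thaweddb
"""


def effective_rep_factor(conf_text, index_name):
    """Return repFactor from the index stanza, else from [default], else "0"."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # The index's own stanza wins; [default] is the fallback.
    for stanza in (index_name, "default"):
        if parser.has_option(stanza, "repFactor"):
            return parser.get(stanza, "repFactor")
    # Assumed built-in default when nothing is set: no replication.
    return "0"


print(effective_rep_factor(SAMPLE_INDEXES_CONF, "benchmark_idx"))
```

With the sample text this reports `auto`, because the index stanza has no repFactor of its own and falls back to [default].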

alecshnapir · ‎11-26-2017

Still can't make it work ... the data is not replicated between the peers. Is there anything I have to change in the indexes.conf file? I saw that by default repFactor is set to 0, and it says that for clustered indexes I have to set it to "auto".

Is it possible to replicate the data between the cluster`s peers while the peers indexing their own data?
