We are trying to benchmark our NAS storage system as a target for a Splunk Enterprise deployment. I'm pretty new to the Splunk world and trying to understand how it all works 🙂
I've created a cluster of 3 peers (plus one master) and used Splunkit to generate the log files on all 3 indexers/peers and to run the indexing benchmark test. With the replication and search factors set to 2, I expected replication between the peers, so that the overall amount of data written to the storage would be higher than when both replication and search factors are set to 1.
Unfortunately, in both cases the time it took to complete the task and the amount of data eventually written to the storage were exactly the same, meaning that there was probably no replication between the peers during indexing.
So the question is: is this even possible? That is, can I make the peers replicate their own internal data during indexing, or do I have to use data sources external to the peers (forwarders) to make it all work?
Indexer peers are supposed to replicate their own events independently of any action on your part.
Bucket folders that begin with db_ contain events indexed on that host, while folders that begin with rb_ contain events replicated from another indexer. Theoretically the bytes written on each indexer could/would/should be nearly identical, as the cluster balances how it spreads out the data.
If you have both db_ and rb_ folders, then replication is working.
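If you'd rather not eyeball the directory listing, the db_/rb_ check can be scripted. This is only a minimal sketch: the function name `count_buckets` is made up for illustration, and the path you pass in (something like $SPLUNK_HOME/var/lib/splunk/defaultdb/db for the default index) is an assumption that depends on your homePath settings.

```python
import os
import sys


def count_buckets(index_db_dir):
    """Count originating (db_) and replicated (rb_) bucket folders in one directory."""
    counts = {"db": 0, "rb": 0}
    for entry in os.scandir(index_db_dir):
        if not entry.is_dir():
            continue  # only folders can be buckets
        if entry.name.startswith("db_"):
            counts["db"] += 1  # indexed on this host
        elif entry.name.startswith("rb_"):
            counts["rb"] += 1  # replicated from another peer
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_buckets.py /path/to/index/db  (path is an assumption)
    result = count_buckets(sys.argv[1])
    print(f"db_ buckets: {result['db']}, rb_ buckets: {result['rb']}")
```

Run it on each peer against the index your benchmark writes to. An rb count of 0 on every peer would suggest that index is not being replicated.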
Thank you for the prompt response. I will check the db_ and rb_ folders.
However, I have a few follow-ups on the points you mentioned in your answer:
"Indexer peers are supposed to replicate their own events independently of any action on your part": so does that mean it doesn't matter what I set for the replication and search factors? And if that's the case, does it mean I'm getting as many replicas as there are peers in the cluster?
"Theoretically the bytes written on each indexer could/would/should be nearly identical as the cluster balances how it spreads out the data": this is exactly what I saw in my tests, but the numbers I saw didn't make any sense (at least to me). I have 3 peers in the cluster and each one has a 50GB log file. According to the Splunk documentation, the data, "after it has been compressed and indexed, occupies approximately 50% of its original size". Our storage system compresses it again at ~1.6:1, so in theory, putting replication aside, I'm supposed to see something around 16GB written to the storage. The thing is that I see ~23GB, so is this gap replicated events from other peers? If yes, isn't it supposed to be more? I mean, I'm replicating the events from two additional peers.
Thank you
In your original question you said that SF and RF are both 1, but that means you're negating all the reasons for having a cluster and hampering its effectiveness greatly. Change them to 2 or 3. (https://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/Thereplicationfactor)
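As a rough sanity check on the numbers in your follow-up, here is a back-of-the-envelope estimate of what each peer should write for different replication factors. It is a sketch under stated assumptions, not a sizing tool: every peer ingests the same 50GB, the cluster spreads copies evenly, and every copy takes the same space as the original (which only holds when the search factor equals the replication factor, since non-searchable copies are smaller). The function name is made up for illustration.

```python
def expected_gb_per_peer(raw_gb, index_ratio, storage_ratio, replication_factor):
    """Estimate GB physically written per peer.

    Assumes identical ingest on every peer, evenly balanced copies,
    and copies that are all the same size as the original bucket.
    """
    indexed_gb = raw_gb * index_ratio              # size after Splunk indexing
    logical_gb = indexed_gb * replication_factor   # own data plus replicated copies
    return logical_gb / storage_ratio              # after the NAS compresses it again


# Figures taken from the question: 50GB raw, ~50% after indexing, ~1.6:1 on the NAS.
for rf in (1, 2, 3):
    estimate = expected_gb_per_peer(50, 0.5, 1.6, rf)
    print(f"replication factor {rf}: ~{estimate:.1f} GB per peer")
```

If the measured figure stays the same regardless of the replication factor you set, that points to the index not being replicated at all rather than to partial replication.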
Found the problem. It seems that replication between peers works when I use the default index, but not when I create a new one. So I have to find a way to distribute the definition of a newly created index, with replication turned on, to all the cluster peers.
repFactor = auto
Add that line to the [default] stanza or to each individual index stanza in indexes.conf.
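To spot index stanzas that would still fall back to repFactor = 0, something like the following could be run against the text of an indexes.conf. This is a simplified sketch with loud assumptions: it treats the file as plain INI, looks at a single file only (it does not merge Splunk's layered configuration the way the real product does), and the function name is hypothetical.

```python
import configparser


def indexes_without_replication(conf_text):
    """Return index stanza names whose effective repFactor is not 'auto'.

    Simplification: only one indexes.conf is considered, and a stanza
    inherits repFactor from [default] when it does not set its own.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep the case of keys such as repFactor
    parser.read_string(conf_text)

    default_rep = "0"  # per the thread, repFactor defaults to 0
    if parser.has_section("default"):
        default_rep = parser.get("default", "repFactor", fallback="0")

    missing = []
    for stanza in parser.sections():
        if stanza == "default" or stanza.startswith("volume:"):
            continue  # not an index definition
        rep = parser.get(stanza, "repFactor", fallback=default_rep)
        if rep.strip().lower() != "auto":
            missing.append(stanza)
    return missing


if __name__ == "__main__":
    sample = "[bench]\nhomePath = $SPLUNK_DB/bench/db\n\n[replicated]\nrepFactor = auto\n"
    print(indexes_without_replication(sample))
```

Any stanza it reports would need repFactor = auto, and the same indexes.conf has to reach every peer in the cluster.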
Still can't make it work... the data is not replicated between the peers. Is there anything I have to change in the indexes.conf file? I saw that by default repFactor is set to 0, and it says that for clustered indexes I have to set it to "auto".