Is it possible to replicate data between a cluster's peers while the peers are indexing their own data?

alecshnapir
New Member

We are trying to benchmark our NAS storage system as a target for a Splunk Enterprise solution. I'm pretty new to the Splunk world and trying to understand how it all works 🙂

I've created a cluster of 3 peers (and one master) and used Splunkit to generate the log files on all 3 indexers/peers and to run an indexing benchmark test. With the replication and search factors set to 2, I expected that there would be replication between the peers and that the overall amount of data written to the storage would be higher than when both replication and search factors are set to 1.

Unfortunately, in both cases the time it took to complete the task and the amount of data eventually written to the storage were exactly the same, meaning that there was probably no replication between the peers during indexing.

So, the question is: is it even possible? That is, can I make peers replicate their own internal data during indexing, or do I have to use data sources external to the peers (forwarders) to make it all work?

1 Solution

lycollicott
Motivator

Indexer peers are supposed to replicate their own events independently of any action on your part.

Bucket folders that begin with db_ contain events indexed on that host while folders that begin with rb_ contain events replicated from another indexer. Theoretically the bytes written on each indexer could/would/should be nearly identical as the cluster balances how it spreads out the data.

If you have both db_ and rb_ folders then replication is working.
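The folder check described above can be scripted. This is only a minimal sketch: the function names are made up for illustration, and the path in the usage comment assumes a default Linux install where the main index lives under /opt/splunk/var/lib/splunk/defaultdb/db, which may not match your setup.

```python
from pathlib import Path


def count_bucket_dirs(index_db_path):
    """Count originating (db_) and replicated (rb_) bucket folders in one index directory."""
    counts = {"db_": 0, "rb_": 0}
    for entry in Path(index_db_path).iterdir():
        if not entry.is_dir():
            continue  # skip plain files that sit next to the bucket folders
        for prefix in counts:
            if entry.name.startswith(prefix):
                counts[prefix] += 1
    return counts


def replication_seen(index_db_path):
    """True when the directory holds both locally indexed and replicated buckets."""
    counts = count_bucket_dirs(index_db_path)
    return counts["db_"] > 0 and counts["rb_"] > 0


# Hypothetical usage on one peer (the path is an assumption, adjust it to your install):
# print(count_bucket_dirs("/opt/splunk/var/lib/splunk/defaultdb/db"))
```

Run it on each peer against the index you are benchmarking. Folders with other prefixes are simply ignored, so a peer that has only just started indexing may report zeros for both.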


alecshnapir
New Member

Thank you for the prompt response. I will check the db_ and rb_ folders.
However, I have a few follow-ups on the points you mentioned in your answer:

  • "Indexer peers are supposed to replicate their own events independently of any action on your part": so does that mean it doesn't matter what I set for the replication and search factors? And if that's the case, does it mean that I get as many replicas as there are peers in the cluster?

  • "Theoretically the bytes written on each indexer could/would/should be nearly identical as the cluster balances how it spreads out the data": this is exactly what I saw in my tests, but the numbers didn't make any sense (at least to me). I have 3 peers in the cluster and each one has a 50GB log file. According to the Splunk documentation, the data "after it has been compressed and indexed, occupies approximately 50% of its original size". Our storage system compresses it again at ~1.6:1, so in theory, putting replication aside, I'm supposed to see something around 16GB written to the storage per peer. The thing is that I see ~23GB, so is this gap replicated events from other peers? If so, shouldn't it be more? I mean, I'm replicating the events from two additional peers.

Thank you


lycollicott
Motivator
  1. There are default values which will be used even if you don't set them yourself.
  2. In a single site cluster the spread of replicated events will vary over time.
  3. Your storage system's compression will not alter the real world size of your files. If you put a 10GB file on your NAS it will still be a 10GB file to your OS regardless of how your NAS operates. Therefore 23GB does fall in line with the Splunk documentation.
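To make the arithmetic in point 3 concrete, here is a rough sketch using only the figures quoted in this thread (50GB of raw logs per peer, the ~50% ratio from the documentation, the NAS's ~1.6:1 compression). The function name is made up, and the replication line assumes every copy is searchable (SF equal to RF) and therefore the same size as the original, which is a simplification.

```python
def indexed_size_gb(raw_gb, index_ratio=0.5, copies=1):
    """Size the OS should report for indexed data, before any NAS-side compression."""
    return raw_gb * index_ratio * copies


raw_per_peer_gb = 50  # log file size per peer, from the question

# What the OS (and Splunk) sees with no replication: the figure to compare with ~23GB.
logical_gb = indexed_size_gb(raw_per_peer_gb)

# What the NAS might physically store after its own ~1.6:1 compression.
physical_gb = logical_gb / 1.6

# Cluster-wide logical total for 3 peers with replication factor 2
# (assumption: both copies are searchable, so each is the same size).
cluster_total_gb = 3 * indexed_size_gb(raw_per_peer_gb, copies=2)

print(f"logical per peer: {logical_gb:.1f} GB")
print(f"physical per peer after NAS compression: {physical_gb:.1f} GB")
print(f"cluster total with RF=2: {cluster_total_gb:.1f} GB")
```

So ~23GB per peer is close to the ~25GB expected without replication. If replication at a factor of 2 were really happening, the OS-visible total across the three peers should be roughly double the unreplicated total.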

In your original question you said that SF and RF are both 1, but that means you're negating all the reasons for having a cluster and hampering its effectiveness greatly. Change them to 2 or 3. (https://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/Thereplicationfactor)


alecshnapir
New Member

Found the problem. It seems that replication between peers works when I use the default index, but not when I create a new one. So I have to find a way to distribute the definition of the newly created index, with replication enabled, to all the cluster peers.


lycollicott
Motivator

repFactor = auto

Add that line to the [default] stanza of indexes.conf or to each individual index stanza.
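If it helps, here is a small sketch that lists which index stanzas in an indexes.conf would not replicate. It is only an illustration: the function name and the bench_index stanza are made up, it uses Python's generic INI parser rather than anything from Splunk, and it assumes repFactor falls back to 0 when neither the stanza nor [default] sets it.

```python
import configparser


def indexes_without_replication(conf_text):
    """Return index stanza names whose effective repFactor is not 'auto'."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key names such as repFactor case-sensitive
    parser.read_string(conf_text)

    # A [default] stanza supplies the fallback for every index stanza.
    # Assumption: the value is 0 when nothing sets it.
    fallback = "0"
    if parser.has_section("default"):
        fallback = parser.get("default", "repFactor", fallback="0")

    missing = []
    for stanza in parser.sections():
        if stanza == "default" or stanza.startswith("volume:"):
            continue  # not an index definition
        value = parser.get(stanza, "repFactor", fallback=fallback)
        if value.strip().lower() != "auto":
            missing.append(stanza)
    return missing


# Hypothetical indexes.conf content: bench_index has no repFactor line.
sample = """
[main]
repFactor = auto

[bench_index]
homePath = $SPLUNK_DB/bench_index/db
coldPath = $SPLUNK_DB/bench_index/colddb
thawedPath = $SPLUNK_DB/bench_index/thaweddb
"""
print(indexes_without_replication(sample))  # ['bench_index']
```

Every peer needs the same index definition, so as far as I know the usual route is to put indexes.conf in the master's configuration bundle (under etc/master-apps) and push it with splunk apply cluster-bundle, rather than editing each peer by hand.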


alecshnapir
New Member

Still can't make it work: the data is not replicated between the peers. Is there anything else I have to change in the indexes.conf file? I saw that by default repFactor is set to 0, and it says that for clustered indexes I have to set it to "auto".
