We plan to move to a clustered environment soon, so we are starting to work out what we need storage-wise. Based on the Splunk documentation (http://docs.splunk.com/Documentation/Splunk/6.2.0/Indexer/Systemrequirements), we will need about one terabyte of additional storage to accommodate all our hot buckets. I would like to make sure these numbers are correct.
Examples:
• 3 peer nodes, with replication factor = 3; search factor = 2: This requires a total of 115GB across all peer nodes (averaging 38GB/peer), calculated as follows:
o Total rawdata = (15GB * 3) = 45GB.
o Total index files = (35GB * 2) = 70GB.
• 5 peer nodes, with replication factor = 5; search factor = 3: This requires a total of 180GB across all peer nodes (averaging 36GB/peer), calculated as follows:
o Total rawdata = (15GB * 5) = 75GB.
o Total index files = (35GB * 3) = 105GB.
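The arithmetic in the two documentation examples can be reproduced with a short Python sketch. The function name `cluster_storage_gb` and its parameters are made up for illustration; the only inputs taken from the examples are the 15GB rawdata and 35GB index-file figures and the replication/search factors.

```python
def cluster_storage_gb(rawdata_gb, index_gb, replication_factor, search_factor, peers):
    """Return (total_gb, average_gb_per_peer) for an indexer cluster.

    Rawdata is kept once per replicated copy; index files are kept
    only for the searchable copies.
    """
    if search_factor > replication_factor:
        raise ValueError("search factor cannot exceed replication factor")
    total_raw = rawdata_gb * replication_factor   # every copy keeps rawdata
    total_index = index_gb * search_factor        # only searchable copies keep index files
    total = total_raw + total_index
    return total, total / peers


# The two examples quoted above: (replication factor, search factor, peers)
for rf, sf, peers in [(3, 2, 3), (5, 3, 5)]:
    total, per_peer = cluster_storage_gb(15, 35, rf, sf, peers)
    print(f"RF={rf} SF={sf} peers={peers}: total={total}GB, about {per_peer:.0f}GB/peer")
```

Note that the result is a total across all peers; the per-peer figure is that total divided by the number of peers.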
Our planned environment:
Peers: 4
Replication factor: 4
Search factor: 2
We have about 960GB of hot buckets:
~960GB / 2 = 480GB
144GB raw data
336GB associated index files
Raw data (144GB * 4) = 576GB
Index files (336GB * 2) = 672GB
I wanted to confirm that this is correct: will we need an additional 1248GB per indexer?
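As a rough check of the numbers above, here is a minimal Python sketch. It assumes the 960GB figure is the pre-indexing data volume (which is how the arithmetic in this post treats it) and uses the 15% rawdata / 35% index-file split implied by the 144GB and 336GB figures. The function name and parameter names are hypothetical.

```python
def planned_cluster_storage(original_gb, replication_factor, search_factor, peers,
                            rawdata_pct=15, index_pct=35):
    """Estimate cluster-wide and per-peer storage from the original data volume.

    rawdata_pct and index_pct are assumptions taken from the post's own split
    (144GB and 336GB out of 960GB); real ratios depend on the data.
    """
    rawdata_gb = original_gb * rawdata_pct / 100
    index_gb = original_gb * index_pct / 100
    total_gb = rawdata_gb * replication_factor + index_gb * search_factor
    return {
        "rawdata_gb": rawdata_gb,
        "index_gb": index_gb,
        "total_gb": total_gb,                # summed across ALL peers
        "per_peer_gb": total_gb / peers,     # average per peer
    }


plan = planned_cluster_storage(960, replication_factor=4, search_factor=2, peers=4)
print(plan)
```

If the formula is applied the same way as in the documentation examples, the 1248GB result is the total across all four peers, which averages out to 312GB per peer rather than 1248GB on each indexer.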
How robust are your individual systems? A replication factor of 4 is awfully high. That is 4 ENTIRE copies of the data spread across the nodes. I think you may have misinterpreted what replication factor means.
Are you expecting multiple entire individual systems to become unrecoverable? Each unit of replication factor means one entire copy of all indexers' data spread across the entire platform. This also means that for each GB of original data indexed on a single peer, at least another 500MB will be written on EVERY other node in the system for the same data (assuming 2:1 minimum compression). You can potentially run out of available IOPS (for both searching and indexing), depending on your normal ingestion rate.
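To put a number on that write overhead, here is a small Python sketch. It counts only the compressed rawdata at the 2:1 ratio assumed above and ignores index files on searchable copies, so treat it as a lower bound. The function name and the 1000MB-per-GB convention are choices made for this illustration.

```python
def replicated_write_mb(indexed_gb, replication_factor, compression_ratio=2.0):
    """Return (mb_per_copy, mb_written_to_other_peers) for replicated rawdata.

    compression_ratio=2.0 is the 2:1 minimum compression assumed in this answer.
    """
    per_copy_mb = indexed_gb * 1000 / compression_ratio
    extra_copies = replication_factor - 1   # copies beyond the originating peer
    return per_copy_mb, per_copy_mb * extra_copies


per_copy, on_other_peers = replicated_write_mb(1, replication_factor=4)
print(f"{per_copy:.0f}MB per copy, {on_other_peers:.0f}MB written to other peers")
```

With 4 peers and a replication factor of 4, each of the other three peers receives one of those copies.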
Normally you have RAID, which protects individual file systems. But say the entire RAID array fails: then you have the replicated copies to recover from.
With a replication factor of 4, you are expecting that you will have 3 entire systems with unrecoverable data.
I don't know your situation, but it seems unlikely that would be required. You may want to revisit that setting.
Disclaimer: this tool is NOT supported by Splunk. However, it may be useful to you:
Thanks for this link; it did help out in our planning.