How to estimate Splunk storage size?

Explorer

I will be feeding 10 GB per day into 2 Splunk indexers (clustered environment)
Replication Factor = 2
Search Factor = 2

How do I estimate the storage size for index data on each indexer?

Assume the retention policy keeps data searchable for about 1 year.

SplunkTrust

We can estimate to an extent, but it will depend on a variety of factors:

  1. How well your raw data compresses
  2. How big the TSIDX files wind up being for your data
  3. Whether you will have summary indexes, search accelerations, or accelerated data models (all of which take up extra space)

We'll make a conservative estimate, assuming that after compression and TSIDX creation your data will be 75% of its original size - and we'll also assume for the time being you will not have any summary or acceleration data...

10GB * 365 days * 0.75 = about 2.74T (call it 2.8T) of space before replication. With ideal load balancing across the two indexers, each should use 1.4T of space before replication. With RF=2/SF=2 across two indexers, each indexer also holds a full searchable copy of its peer's data, so each indexer needs 2X that storage: 2.8T per indexer.

I would add some headroom for filesystem overhead and for other things like your _internal indexes, and round it up to 3T.
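The arithmetic above can be sketched in a few lines of Python. The function name and parameters are made up for illustration, the 0.75 ratio is the assumption from this answer, and the sketch assumes even load balancing and treats every replicated copy as a full searchable copy (which matches RF=2/SF=2).

```python
def per_indexer_storage_gb(daily_gb, retention_days, size_ratio,
                           replication_factor, indexer_count):
    """Rough per-indexer disk estimate in GB (hypothetical helper).

    Assumes even load balancing and that every copy is a full
    searchable copy, i.e. search factor == replication factor.
    """
    # Indexed size of a single copy of the data over the retention window.
    single_copy_gb = daily_gb * retention_days * size_ratio
    # The cluster as a whole stores replication_factor copies of that data.
    cluster_total_gb = single_copy_gb * replication_factor
    # Spread the total evenly across the indexers.
    return cluster_total_gb / indexer_count


estimate = per_indexer_storage_gb(10, 365, 0.75, 2, 2)
print(f"{estimate:.1f} GB per indexer")  # 2737.5 GB per indexer
```

That result excludes filesystem overhead, internal indexes, and any summary or acceleration data, which is why rounding up is sensible.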

The only assumption here that is really hard to validate is whether your data post-indexing will be 75% of the raw size. For typical IT data, this is a pretty conservative estimate and should leave you some wiggle room. But the only way you'll know for sure is to take, say, a 1GB sample of your logs and see how much space it needs once indexed. Then you can adjust the 75% up or down as needed.
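As a rough sketch of that adjustment: divide the on-disk size of the indexed sample by the raw sample size and use the result in place of 0.75. The 0.6 GB figure below is purely hypothetical, not a measurement, and the helper name is invented for this example.

```python
def measured_size_ratio(raw_sample_gb, indexed_size_gb):
    """Ratio of on-disk indexed size (rawdata + TSIDX) to raw sample size."""
    if raw_sample_gb <= 0:
        raise ValueError("raw sample size must be positive")
    return indexed_size_gb / raw_sample_gb


# Hypothetical numbers: a 1 GB sample that takes 0.6 GB on disk once indexed.
ratio = measured_size_ratio(1.0, 0.6)

# Same estimate as in the answer, with the measured ratio replacing 0.75:
# 10 GB/day, 365 days, 2 copies in the cluster, spread over 2 indexers.
per_indexer_gb = 10 * 365 * ratio * 2 / 2
print(f"ratio={ratio:.2f}, per indexer ~{per_indexer_gb:.0f} GB")
```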

SplunkTrust

GOOD POINT! Each indexer should be getting 5GB/day which is then duplicated 2x to 10GB/day. DERP. I fixed the math. Thanks @martin_mueller!

SplunkTrust

While I agree with most of your calculations, I'd have one difference to ponder: If you have a cluster of 2 with SF of 2, each indexer should be storing 100% of the incoming data - not 200%.

As a result, I'd expect 1.4T per indexer before replication (with forwarders load balancing across both indexers) and 2.8T per indexer after replication.