How to estimate the Splunk storage size

ajaysamantbms · 05-12-2014

I will be feeding 10 GB per day into 2 Splunk indexers (clustered environment):
Replication Factor = 2
Search Factor = 2

How can I estimate the storage size needed for index data on each indexer?

Assume the data retention policy for search will be around 1 year.

dwaddle · 05-12-2014

We can estimate this to an extent, but it will depend on a variety of factors:

How well your raw data compresses
How big the TSIDX files wind up being for your data
Whether you will have summary indexes, search accelerations, or accelerated data models (all of which take up extra space)

We'll make a conservative estimate, assuming that after compression and TSIDX creation your data will be 75% of its original size. We'll also assume for the time being that you will not have any summary or acceleration data.

10GB * 365 days * 0.75 = 2737.5GB, or about 2.74T (call it 2.8T) of space before replication. With ideal load balancing across the indexers, each should use 1.4T of space before clustering. Your RF=2/SF=2 clustering across two indexers means that each indexer will need 2X that storage, so you'll need 2.8T of storage per indexer.
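The same arithmetic written out as a small Python helper, in case you want to plug in other numbers. This is a sketch, not a Splunk tool: the function and parameter names are made up for illustration, and it assumes perfectly even load balancing and that every replicated copy is searchable (SF equal to RF), so each copy takes the full post-indexing size. Non-searchable copies (SF lower than RF) would be smaller, which this ignores.

```python
def estimate_storage_gb(daily_gb, retention_days, index_ratio, num_indexers, replication_factor):
    """Return (one_copy_total, per_indexer_before, per_indexer_after) in GB.

    Hypothetical helper. Assumes even load balancing across indexers and
    that every copy of a bucket is searchable (full size with TSIDX).
    """
    # Size of one full copy of the retained data after compression + TSIDX
    one_copy_total = daily_gb * retention_days * index_ratio
    # Forwarders spread the incoming data evenly across the indexers
    per_indexer_before = one_copy_total / num_indexers
    # Each bucket exists replication_factor times across the cluster
    per_indexer_after = per_indexer_before * replication_factor
    return one_copy_total, per_indexer_before, per_indexer_after


total, before, after = estimate_storage_gb(10, 365, 0.75, num_indexers=2, replication_factor=2)
print(f"one copy: {total:.2f} GB, per indexer before replication: {before:.2f} GB, after: {after:.2f} GB")
# → one copy: 2737.50 GB, per indexer before replication: 1368.75 GB, after: 2737.50 GB
```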

I would add some extra space for filesystem overhead and for other things like your _internal indexes, and round it up to 3T.
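A sketch of that "add headroom, then round up" step. The 5% headroom used here is purely a hypothetical placeholder for filesystem overhead, _internal indexes and the like; the thread does not give a figure, so pick one that fits your environment. It also assumes decimal units (1T = 1000GB); with binary units the example still rounds up to 3.

```python
import math


def provision_tb(per_indexer_gb, headroom_fraction):
    """Round a per-indexer estimate up to whole terabytes after adding headroom.

    headroom_fraction is a hypothetical allowance (e.g. 0.05 for 5%) for
    filesystem overhead, _internal indexes and other extra data.
    """
    with_headroom_gb = per_indexer_gb * (1 + headroom_fraction)
    # Decimal units: 1 TB = 1000 GB
    return math.ceil(with_headroom_gb / 1000)


print(provision_tb(2737.5, headroom_fraction=0.05))  # → 3
```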

The only assumption here that is really hard to validate is whether your data will be 75% of its raw size after indexing. For typical IT data, this is a pretty conservative estimate and should leave you some wiggle room. But the only way to know for sure is to take, say, a 1GB sample of your logs and see how much space it needs once indexed; then you can adjust the 75% up or down as needed.
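One way to do that comparison once the sample is indexed: put the sample into a dedicated, otherwise empty test index, then divide the size of that index's directory on disk by the size of the raw sample. This Python sketch assumes you already know both paths; Splunk keeps index data under $SPLUNK_DB by default, but check indexes.conf for where your test index actually lives. It counts every file in the index directory, so the ratio is only meaningful if the test index holds nothing but the sample.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so nothing is counted twice
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def index_ratio(raw_sample_path, index_dir_path):
    """On-disk index size divided by raw sample size (0.75 means 75%).

    raw_sample_path may be a single log file or a directory of log files.
    """
    if os.path.isdir(raw_sample_path):
        raw = dir_size_bytes(raw_sample_path)
    else:
        raw = os.path.getsize(raw_sample_path)
    indexed = dir_size_bytes(index_dir_path)
    return indexed / raw
```

For example, if `index_ratio("/path/to/sample.log", "/path/to/test_index")` (placeholder paths) returned 0.5, you could use 50% instead of 75% in the estimate.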

dwaddle · 05-12-2014

GOOD POINT! Each indexer should be getting 5GB/day, which is then duplicated 2x to 10GB/day. DERP. I fixed the math. Thanks @martin_mueller!

martin_mueller · 05-12-2014

While I agree with most of your calculations, I'd offer one difference to ponder: if you have a cluster of 2 indexers with RF and SF of 2, each indexer should be storing 100% of the incoming data, not 200%.

As a result, I'd expect 1.4T per indexer before replication (with the forwarders load balancing across both indexers) and 2.8T per indexer after replication.
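The "100%, not 200%" point generalizes: with even load balancing, each indexer receives 1/N of the incoming data and ends up holding RF/N of it once replication puts each bucket's copies on distinct peers. A minimal sketch, assuming exactly that even distribution; the function name is hypothetical. For 2 indexers and RF=2 it reproduces the 5GB/day in, 10GB/day stored figure from the thread.

```python
def per_indexer_share(num_indexers, replication_factor):
    """Fraction of all incoming data each indexer stores.

    Assumes even load balancing and that the replication_factor copies
    of each bucket live on distinct indexers.
    """
    if replication_factor > num_indexers:
        raise ValueError("replication factor cannot exceed the number of indexers")
    return replication_factor / num_indexers


daily_gb = 10
share = per_indexer_share(num_indexers=2, replication_factor=2)
print(f"{share:.0%} of incoming data per indexer, {daily_gb * share:g} GB/day each")
# → 100% of incoming data per indexer, 10 GB/day each
```

With three indexers and RF=2, the same formula gives each indexer two thirds of the incoming data.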
