Deployment Architecture

Estimating size of index

lzhang_soliton
Path Finder

Hi,

I have been looking for how to calculate the range of index file in Splunk. I wrote the math according to the following description. Could anyone check it?

Estimate your storage requirements

The compressed rawdata file is approximately 10% the size of the incoming, pre-indexed raw data.

The associated index files range in size from approximately 10% to 110% of the rawdata file.

Raw data size: 9TB
"rawdata file size": 9TB x 10%
Minimum index size: (9TB x 10%) + ((9TB x 10%) x 10%)
Maximum index size: (9TB x 10%) + ((9TB x 10%) x 110%)

Thank you all in advance.

Tags (3)
0 Karma

ckurtz
Path Finder

For ongoing knowledge of your indexers, I highly recommend Sanford Owing's Fire Brigade Fire Brigade App on Splunkbase. It will give you a ton of information on the size of indexes and individual buckets for existing indexes, which lets you plan for future growth.

(Edited to link to Fire Brigade Version 2 for Splunk 5 & 6)

martin_mueller
SplunkTrust
SplunkTrust

You've got one "x 10%" too many. The index size is based on the raw data, not the compressed raw data.

To get a more accurate reading on your data you could take a 10GB sample and store it in a temporary index, take that size on disk as a baseline.

As a real-life example, JSON data from Twitter is compressible to about 15% and yields indexes about 60% of the raw data size - in total you'd need about 75% of the raw size on disk.

0 Karma

lzhang_soliton
Path Finder

martin_mueller,
Thanks for your post!
To observe a data sample's size is really a good idea, but It's hard to get it right now. So I would like to estimate the maximum size of the index file.

0 Karma
Get Updates on the Splunk Community!

Earn a $35 Gift Card for Answering our Splunk Admins & App Developer Survey

Survey for Splunk Admins and App Developers is open now! | Earn a $35 gift card!      Hello there,  Splunk ...

Continuing Innovation & New Integrations Unlock Full Stack Observability For Your ...

You’ve probably heard the latest about AppDynamics joining the Splunk Observability portfolio, deepening our ...

Monitoring Amazon Elastic Kubernetes Service (EKS)

As we’ve seen, integrating Kubernetes environments with Splunk Observability Cloud is a quick and easy way to ...