Deployment Architecture

Estimating size of index

Path Finder

Hi,

I have been looking for how to calculate the range of index file in Splunk. I wrote the math according to the following description. Could anyone check it?

Estimate your storage requirements

The compressed rawdata file is approximately 10% the size of the incoming, pre-indexed raw data.

The associated index files range in size from approximately 10% to 110% of the rawdata file.

Raw data size: 9TB
"rawdata file size": 9TB x 10%
Minimum index size: (9TB x 10%) + ((9TB x 10%) x 10%)
Maximum index size: (9TB x 10%) + ((9TB x 10%) x 110%)

Thank you all in advance.

Tags (3)
0 Karma

Path Finder

For ongoing knowledge of your indexers, I highly recommend Sanford Owing's Fire Brigade Fire Brigade App on Splunkbase. It will give you a ton of information on the size of indexes and individual buckets for existing indexes, which lets you plan for future growth.

(Edited to link to Fire Brigade Version 2 for Splunk 5 & 6)

0 Karma

SplunkTrust
SplunkTrust

You've got one "x 10%" too many. The index size is based on the raw data, not the compressed raw data.

To get a more accurate reading on your data you could take a 10GB sample and store it in a temporary index, take that size on disk as a baseline.

As a real-life example, JSON data from Twitter is compressible to about 15% and yields indexes about 60% of the raw data size - in total you'd need about 75% of the raw size on disk.

0 Karma

Path Finder

martin_mueller,
Thanks for your post!
To observe a data sample's size is really a good idea, but It's hard to get it right now. So I would like to estimate the maximum size of the index file.

0 Karma
State of Splunk Careers

Access the Splunk Careers Report to see real data that shows how Splunk mastery increases your value and job satisfaction.

Find out what your skills are worth!