Estimating size of index

lzhang_soliton
Path Finder

Hi,

I have been looking for a way to estimate the size range of the index files in Splunk. I worked out the math below from the following description. Could anyone check it?

Estimate your storage requirements

The compressed rawdata file is approximately 10% the size of the incoming, pre-indexed raw data.

The associated index files range in size from approximately 10% to 110% of the rawdata file.

Raw data size: 9TB
"rawdata file size": 9TB x 10%
Minimum index size: (9TB x 10%) + ((9TB x 10%) x 10%)
Maximum index size: (9TB x 10%) + ((9TB x 10%) x 110%)

Thank you all in advance.

ckurtz
Path Finder

For ongoing insight into your indexers, I highly recommend Sanford Owings' Fire Brigade app on Splunkbase. It gives you a lot of information on the size of existing indexes and their individual buckets, which lets you plan for future growth.

(Edited to link to Fire Brigade Version 2 for Splunk 5 & 6)

martin_mueller
SplunkTrust

You've got one "x 10%" too many. The index size is based on the raw data, not the compressed raw data.

To get a more accurate reading on your data, you could take a 10GB sample, store it in a temporary index, and use its size on disk as a baseline.
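Scaling such a sample up to the full volume is simple proportional arithmetic. Here is a minimal Python sketch of it; the function name and the sample figures (a 10 GB sample taking 7.5 GB on disk) are hypothetical and only there to illustrate, so substitute whatever your temporary index actually reports.

```python
def extrapolate_disk_usage(sample_raw_gb, sample_on_disk_gb, total_raw_gb):
    """Scale the on-disk footprint of a sample index up to the full raw volume."""
    if sample_raw_gb <= 0:
        raise ValueError("sample_raw_gb must be positive")
    # Fraction of the raw size that the sample index occupies on disk.
    ratio = sample_on_disk_gb / sample_raw_gb
    return total_raw_gb * ratio


# Hypothetical measurement: 10 GB of raw data occupying 7.5 GB on disk,
# extrapolated to 9 TB (taken here as 9000 GB) of raw data.
estimate_gb = extrapolate_disk_usage(10, 7.5, 9000)
print(f"Estimated disk usage: {estimate_gb:.0f} GB")
```

This assumes the sample is representative of the full data set; a sample with an unusual mix of sourcetypes would skew the ratio.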

As a real-life example, JSON data from Twitter compresses to about 15% of its raw size and yields index files of about 60% of the raw data size, so in total you'd need about 75% of the raw size on disk.
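For the 9TB in the question, the corrected arithmetic can be sketched in Python as below. This follows the reading given in this reply, where both percentages are applied to the raw data size; the function name is made up for illustration, and the ratios are just the ones quoted in this thread, not measured values.

```python
def estimate_disk_tb(raw_tb, rawdata_ratio, index_ratio):
    """Total disk estimate: compressed rawdata plus index files,
    with both ratios applied to the incoming raw data size."""
    return raw_tb * rawdata_ratio + raw_tb * index_ratio


RAW_TB = 9  # raw data size from the question

# Documented range: rawdata about 10%, index files about 10% to 110%.
minimum_tb = estimate_disk_tb(RAW_TB, 0.10, 0.10)
maximum_tb = estimate_disk_tb(RAW_TB, 0.10, 1.10)

# Ratios from the Twitter JSON example: 15% rawdata, 60% index files.
twitter_like_tb = estimate_disk_tb(RAW_TB, 0.15, 0.60)

print(f"Minimum: {minimum_tb:.2f} TB")
print(f"Maximum: {maximum_tb:.2f} TB")
print(f"Twitter-like data: {twitter_like_tb:.2f} TB")
```

Under these assumptions the estimate runs from roughly 1.8TB to 10.8TB, with data resembling the Twitter example landing around 6.75TB.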

lzhang_soliton
Path Finder

martin_mueller,
Thanks for your post!
Measuring the size of a data sample is a really good idea, but it's hard to get one right now, so I would like to estimate the maximum size of the index files.
