Splunk compression rate for archiving data

lohit · ‎08-16-2013

I have to set up an archiving policy and work out storage requirements in Splunk. Estimated logs per day would be 100 GB. So if I go by the documentation, Splunk will index 50 GB (with a compression rate of 50%). Then, as the data gets old, the same 50 GB of data will move from Hot -> Warm -> Cold. At this point I will set up an archival policy to S3 (AWS). I wanted to know whether Splunk will archive the whole 50 GB or 100 GB of data in S3, and what amount of data will be indexed back. Is it going to be 50 GB?

Please help

lukasz92 · ‎06-09-2016

Has anything changed in this topic?

Are these calculations still current (I mean about 15% for data and about 35% for metadata)?

kristian_kolb · ‎08-16-2013

Normally, on average, Splunk will store raw data in about half its original size, or thereabouts. So your original 100GB will become roughly 35GB of index-files and 15GB of compressed raw data, according to a rough estimate.

When data is frozen, which is what I assume you mean by "archival policy", only the compressed raw data is saved, and the index-files are deleted. So only about 15% of the original size of the raw data is archived: 15GB.

When/if you need to restore archived (frozen) data, you will need to rebuild the index-files before you can search it again. Back to 15+35 GB.
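To make the arithmetic above concrete, here is a minimal sketch of the estimate. The function name and the default ratios (15% for compressed raw data, 35% for index-files) are just the rough figures from this post, not values Splunk guarantees; measure your own buckets before sizing storage.

```python
def estimate_sizes(raw_gb_per_day, rawdata_ratio=0.15, index_ratio=0.35):
    """Rough per-day storage estimate from the rule-of-thumb ratios.

    rawdata_ratio and index_ratio are assumptions (the ~15% / ~35%
    figures quoted in this thread); real values vary with the data.
    """
    compressed_raw = raw_gb_per_day * rawdata_ratio   # compressed rawdata
    index_files = raw_gb_per_day * index_ratio        # index-files and metadata
    return {
        # Hot/warm/cold buckets hold both parts.
        "searchable_gb": compressed_raw + index_files,
        # Frozen (archived) buckets keep only the compressed raw data.
        "frozen_gb": compressed_raw,
        # After restoring, the index-files are rebuilt.
        "restored_gb": compressed_raw + index_files,
    }


est = estimate_sizes(100)
for name, value in est.items():
    print(f"{name}: {value:.1f} GB")
```

With 100 GB/day this gives about 50 GB searchable, about 15 GB frozen, and about 50 GB again once restored.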

/K

kristian_kolb · ‎08-16-2013

So the "50%" would be the size of the bucket as a whole, compared to the uncompressed size of the .gz data found in its rawdata directory.

This can vary from bucket to bucket, and will depend on the compressibility of the log data coming in. Over a diverse set of log sources, the figure "50%" is commonly mentioned as an average compression rate.

kristian_kolb · ‎08-16-2013

Check /opt/splunk/var/lib/splunk/defaultdb/db/

That's where the 'main' index (defaultdb) is stored. In this folder you will find the hot and warm buckets as subdirs, e.g. db_1234123412_12341234325_33

Inside a bucket there will be some metadata files and .tsidx-files (indexes for searching the raw data). Finally there will be a directory called 'rawdata' that contains the zipped raw data.
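If you would rather not add up the sizes by hand, a small script can do it. This is only a sketch based on the layout described above: it assumes bucket directories start with "db_" (warm) or "hot_" (hot, an assumption on my part), and that everything outside 'rawdata' counts as index-files and metadata. The helper names are made up for this example, and the path is the default one from this post, so adjust it for your install.

```python
import os


def dir_size(path):
    """Total size in bytes of all regular files under path (0 if missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def bucket_breakdown(bucket_path):
    """Split one bucket's size into rawdata versus everything else."""
    total = dir_size(bucket_path)
    rawdata = dir_size(os.path.join(bucket_path, "rawdata"))
    return {
        "rawdata_bytes": rawdata,
        "index_and_metadata_bytes": total - rawdata,
        "total_bytes": total,
    }


def index_breakdown(db_path, prefixes=("db_", "hot_")):
    """Breakdown for every bucket directory found directly under db_path."""
    result = {}
    if not os.path.isdir(db_path):
        return result
    for entry in sorted(os.listdir(db_path)):
        full = os.path.join(db_path, entry)
        if entry.startswith(prefixes) and os.path.isdir(full):
            result[entry] = bucket_breakdown(full)
    return result


if __name__ == "__main__":
    # Default location of the 'main' index; change as needed.
    db_path = "/opt/splunk/var/lib/splunk/defaultdb/db/"
    for bucket, sizes in index_breakdown(db_path).items():
        print(bucket, sizes)
```

The rawdata figure is what would remain after freezing; comparing it with the total shows how much of each bucket is index-files.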

lohit · ‎08-16-2013

How can i check the compressed data size?
