Splunk Search

Splunk compression rate for archiving data

lohit
Path Finder

I have to set up an archiving policy and storage requirements in Splunk. Estimated log volume is 100 GB per day. Going by the documentation, Splunk will store this as roughly 50 GB on disk (a compression rate of about 50%). As the data ages, that 50 GB will move from hot -> warm -> cold. At that point I will set up an archival policy to S3 (AWS). I want to know whether Splunk will archive the whole 50 GB or the full 100 GB to S3, and how much data will be restored when it is indexed back. Is it going to be 50 GB?

Please help


lukasz92
Communicator

Has anything changed on this topic?

Are these figures still accurate (about 15% for the raw data and about 35% for the metadata)?


kristian_kolb
Ultra Champion

Normally, on average, Splunk compresses data to about half its original size. So, as a rough estimate, your original 100 GB will end up as about 35 GB of index files plus 15 GB of compressed raw data.

When data is frozen - which is what I assume you mean by "archival policy" - only the compressed raw data is kept, and the index files are deleted. So only about 15% of the original raw data size is archived: roughly 15 GB.

When/if you need to restore archived (frozen) data, you will need to rebuild the index files before it becomes searchable again - back to 15 + 35 GB.
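The arithmetic above can be sketched as a quick back-of-the-envelope calculation, assuming the rough averages from this reply (~15% compressed raw data, ~35% index files; real ratios vary by data type):

```shell
# Daily sizing sketch using the assumed ratios above (not exact figures).
RAW_GB=100                              # estimated raw log volume per day
ARCHIVE_GB=$(( RAW_GB * 15 / 100 ))     # frozen archive keeps rawdata only
SEARCHABLE_GB=$(( RAW_GB * 50 / 100 ))  # searchable bucket: rawdata + index files
echo "archived per day: ${ARCHIVE_GB} GB"
echo "searchable per day: ${SEARCHABLE_GB} GB"
```

For 100 GB/day this gives about 15 GB archived to S3 and about 50 GB of searchable disk usage per day.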

/K

kristian_kolb
Ultra Champion

So the "50%" refers to the size of the bucket as a whole, compared to the uncompressed size of the data held in the .gz in its rawdata directory.

This can vary from bucket to bucket, and depends on the compressibility of the incoming log data. Across a diverse set of log sources, "50%" is the commonly cited average compression rate.


kristian_kolb
Ultra Champion

Check /opt/splunk/var/lib/splunk/defaultdb/db/

That's where the 'main' index (defaultdb) is stored. In this folder you will find the hot and warm buckets as subdirs, e.g. db_1234123412_12341234325_33

Inside a bucket there will be some metadata files and .tsidx files (indexes for searching the raw data). Finally, there will be a directory called 'rawdata' that contains the compressed raw data.


lohit
Path Finder

How can I check the compressed data size?
