Splunk Search

Splunk compression rate for archiving data

lohit
Path Finder

I have to set up an archiving policy and storage requirements in Splunk. Estimated logs per day would be 100 GB. If I go by the documentation, Splunk will index this as 50 GB (with a compression rate of 50%). As the data gets old, the same 50 GB will move from Hot->Warm->Cold. At this point I will set up an archival policy to S3 (AWS). I wanted to know whether Splunk will archive the whole 50 GB or 100 GB of data in S3, and what amount of data will be indexed back. Is it going to be 50 GB?

Please help

0 Karma

lukasz92
Communicator

Has anything changed in this topic?

Are these calculations still current (I mean about 15% for data and about 35% for metadata)?

0 Karma

kristian_kolb
Ultra Champion

Normally, on average, Splunk will compress raw data to about half the size, or thereabouts. So your original 100GB will now be 35GB of index-files and 15GB of compressed data, according to a rough estimate.

When data is frozen - which is what I assume you mean by "archival policy" - only the compressed data is saved, and the index-files are deleted. So only about 15% of the original size of the raw data is archived, i.e. roughly 15GB.

When/if you need to restore archived (frozen) data, you will need to rebuild the index-files before you can search it again. That brings the size back to about 15+35 GB.
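To make the arithmetic concrete, here is a minimal Python sketch of the estimate. The 15% and 35% ratios are only the rough averages mentioned in this thread, and the function name and parameters are made up for illustration; real ratios vary with the data.

```python
def estimate_storage(daily_raw_gb, rawdata_ratio=0.15, index_ratio=0.35):
    """Rough per-day storage estimate in GB.

    rawdata_ratio and index_ratio are the rule-of-thumb averages from
    this thread (assumptions, not guarantees); measure your own buckets.
    """
    compressed = daily_raw_gb * rawdata_ratio    # compressed raw data
    index_files = daily_raw_gb * index_ratio     # index-files (tsidx etc.)
    return {
        # hot/warm/cold buckets hold both parts
        "searchable_gb": compressed + index_files,
        # frozen (archived) data keeps only the compressed raw data
        "frozen_gb": compressed,
        # after restoring, the index-files are rebuilt
        "restored_gb": compressed + index_files,
    }


if __name__ == "__main__":
    for name, size in estimate_storage(100).items():
        print(f"{name}: about {size:.0f} GB")
```

With 100 GB/day this gives roughly 50 GB searchable, 15 GB archived, and 50 GB again after a restore, matching the figures above.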

/K

kristian_kolb
Ultra Champion

So the "50%" would be the size of the bucket as a whole, compared to the uncompressed size of the data in the .gz found in its rawdata directory.

This can vary from bucket to bucket, and will depend on the compressibility of the log data coming in. Over a diverse set of log sources, the figure "50%" is commonly mentioned as an average compression rate.

0 Karma

kristian_kolb
Ultra Champion

Check /opt/splunk/var/lib/splunk/defaultdb/db/

That's where the 'main' index (defaultdb) is stored. In this folder you will find the hot and warm buckets as subdirs, e.g. db_1234123412_12341234325_33

Inside a bucket there will be some metadata files and .tsidx-files (indexes for searching the raw data). Finally there will be a directory called 'rawdata' that contains the zipped raw data.
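If you want actual numbers for a bucket, a small script can add up the sizes. This is only a sketch: it assumes the layout described here (a 'rawdata' subdirectory inside the bucket directory), the function names are made up, and it needs read access to the Splunk data directory.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):  # skip broken symlinks and the like
                total += os.path.getsize(full)
    return total


def bucket_breakdown(bucket_path):
    """Split a bucket's size into compressed raw data and everything else."""
    total = dir_size_bytes(bucket_path)
    rawdata = dir_size_bytes(os.path.join(bucket_path, "rawdata"))
    return {
        "total_bytes": total,
        "rawdata_bytes": rawdata,                      # compressed raw data
        "index_and_metadata_bytes": total - rawdata,   # tsidx + metadata files
    }
```

Call `bucket_breakdown()` with the full path of one bucket directory under the db folder above. `rawdata_bytes` is the compressed data size, and roughly what would remain after the bucket is frozen.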

0 Karma

lohit
Path Finder

How can I check the compressed data size?

0 Karma