I have to set up an archiving policy and storage requirements in Splunk. Estimated log volume is 100 GB per day. Going by the documentation, Splunk will index this as roughly 50 GB on disk (about a 50% compression rate). As the data ages, that 50 GB moves from Hot -> Warm -> Cold. At that point I will set up an archival policy to S3 (AWS). I want to know whether Splunk will archive the whole 50 GB or 100 GB to S3, and how much data will be indexed back when it is restored. Is it going to be 50 GB?
Please help
Has anything changed on this topic?
Are these figures still accurate (about 15% for data and about 35% for metadata)?
Normally, on average, Splunk compresses raw data to about half its size, or thereabouts. So your original 100 GB will end up as roughly 35 GB of index files and 15 GB of compressed raw data, as a rough estimate.
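The arithmetic, using these commonly cited averages (the exact split varies with your data), is just:

```python
# Rough storage arithmetic for 100 GB/day of raw logs, using the
# commonly cited averages: ~35% index (.tsidx) files, ~15% compressed
# rawdata. Real ratios vary with the data.
RAW_GB = 100

index_gb = RAW_GB * 35 / 100    # searchable index files
rawdata_gb = RAW_GB * 15 / 100  # compressed rawdata journal

bucket_gb = index_gb + rawdata_gb  # total on disk: the "50%" figure
archived_gb = rawdata_gb           # only rawdata survives freezing

print(bucket_gb, archived_gb)  # 50.0 15.0
```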
When data is frozen (which is what I assume you mean by "archival policy"), only the compressed raw data is kept; the index files are deleted. So only about 15% of the original raw data size is archived: 15 GB.
When/if you need to restore archived (frozen) data, you will need to rebuild the index files before the data can be searched again, bringing you back to 15 + 35 GB.
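For reference, the usual thaw procedure looks roughly like this. All paths and the bucket name are examples from a default install; the source location is wherever your S3 restore lands, and depending on your Splunk version a full restart may not be required:

```shell
# Hypothetical paths and bucket name -- adjust to your environment.
SPLUNK_HOME=/opt/splunk
BUCKET=db_1234123412_1234123432_33

# 1. Copy the frozen bucket (rawdata only) into the index's thaweddb dir:
cp -r /path/to/restored/$BUCKET $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/

# 2. Rebuild the .tsidx index files from the raw data:
$SPLUNK_HOME/bin/splunk rebuild $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/$BUCKET

# 3. Restart Splunk so the thawed bucket becomes searchable:
$SPLUNK_HOME/bin/splunk restart
```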
/K
So the "50%" refers to the size of the bucket as a whole, compared to the uncompressed size of the .gz found in its rawdata directory.
This can vary from bucket to bucket and depends on the compressibility of the incoming log data. Across a diverse set of log sources, "50%" is the commonly cited average compression rate.
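As a small illustration with made-up bucket sizes (the numbers are hypothetical, chosen to match the averages above):

```python
# Hypothetical bucket sizes (MB), illustrating what the "50%"
# compression figure actually compares.
uncompressed_raw_mb = 1000  # original log volume covered by the bucket
tsidx_and_meta_mb = 350     # index files inside the bucket
rawdata_gz_mb = 150         # compressed journal under rawdata/

bucket_mb = tsidx_and_meta_mb + rawdata_gz_mb
ratio = bucket_mb / uncompressed_raw_mb
print(f"bucket is {ratio:.0%} of the raw data size")  # bucket is 50% of the raw data size
```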
Check /opt/splunk/var/lib/splunk/defaultdb/db/
That's where the 'main' index (defaultdb) is stored. In this folder you will find the hot and warm buckets as subdirectories, e.g. db_1234123412_12341234325_33
Inside a bucket there are some metadata files and .tsidx files (the indexes used for searching the raw data). Finally, there is a directory called 'rawdata' that contains the compressed raw data.
How can I check the compressed data size?
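One way is to sum file sizes under each bucket's rawdata directory and compare that with the rest of the bucket. A minimal sketch, assuming the default install path for the 'main' index (adjust INDEX_DB for your deployment):

```python
import os

# Default location of the 'main' index's hot/warm buckets; adjust
# this path for your own deployment.
INDEX_DB = "/opt/splunk/var/lib/splunk/defaultdb/db"

def dir_size(path):
    """Total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

if os.path.isdir(INDEX_DB):
    rawdata_bytes = index_bytes = 0
    for bucket in os.listdir(INDEX_DB):
        bucket_path = os.path.join(INDEX_DB, bucket)
        if not os.path.isdir(bucket_path):
            continue
        raw = dir_size(os.path.join(bucket_path, "rawdata"))
        rawdata_bytes += raw
        index_bytes += dir_size(bucket_path) - raw
    print(f"compressed rawdata: {rawdata_bytes / 1e9:.2f} GB")
    print(f"index files:        {index_bytes / 1e9:.2f} GB")
```

You can get the same per-bucket view quickly with `du -sh rawdata/` from inside a bucket directory.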