What is the compression ratio of raw data in Splunk?

lal37
Explorer

Hi Team,

Can anyone please tell me what the compression ratio of raw data in Splunk is?
I heard that it is a 10:1 ratio, which would mean the stored raw data is about 10% of the original raw logs and the index files are also about 10%.
Could someone please explain what compression ratio Splunk achieves when it stores data on an indexer?
Example:
If I have 100 GB of logs, how much indexer space will they take? Is it 10 GB?

Regards,
lal

mhassan
Path Finder

The docs say that 100 GB of incoming data breaks down to roughly 15% for raw data (the journal.gz file) and 35% for metadata (the tsidx files), so your 100 GB will occupy about 50 GB of disk. Be aware that this is an average: different ASCII files compress at different ratios, depending on how many repeated patterns they contain.
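To make that arithmetic concrete, here is a minimal sketch. The function name and the default percentages are only illustrative: they are the averages quoted above, not guarantees, and your own data may compress better or worse.

```python
def estimate_index_size_gb(ingested_gb, rawdata_pct=0.15, tsidx_pct=0.35):
    """Estimate indexer disk usage for a given volume of ingested data.

    rawdata_pct and tsidx_pct are assumed averages (15% and 35%);
    measure your own data before using this for real capacity planning.
    """
    rawdata_gb = ingested_gb * rawdata_pct  # compressed raw data (journal.gz)
    tsidx_gb = ingested_gb * tsidx_pct      # index metadata (tsidx files)
    return {
        "rawdata_gb": rawdata_gb,
        "tsidx_gb": tsidx_gb,
        "total_gb": rawdata_gb + tsidx_gb,
    }


estimate = estimate_index_size_gb(100)
print(estimate)
```

Passing different percentages lets you plug in ratios you have measured on your own indexes instead of the averages.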

santiagoaloi
Path Finder

This site will save you many headaches:

https://splunk-sizing.appspot.com

gfuente
Motivator

Hello

It's usually about half of the original size, so 100 GB of logs would need about 50 GB on the indexer: around 10 GB for the original logs compressed, and around 40 GB for the indexes.

Regards

gfuente
Motivator

Yes, that's right.

Those figures are approximate and depend on the data itself, but as a rule of thumb you can estimate 10% for the compressed raw data plus 40% for the indexes.

Regards

lal37
Explorer

Hi gfuente,
Thanks for the prompt response.
Following up on your answer, I have one small query: the original log data takes only 10 GB out of the 100 GB original size, so that means a 10:1 compression ratio, right?
Also, can you please confirm whether the indexes take 40 GB of space for 100 GB of logs?
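For reference, the ratio arithmetic behind this question can be sketched as below. The helper name is hypothetical, and the 10 GB and 40 GB inputs are just the rule-of-thumb figures from this thread, not measured values.

```python
def compression_ratio(original_gb, stored_gb):
    """Return N for an N:1 ratio (original size divided by stored size)."""
    if stored_gb <= 0:
        raise ValueError("stored_gb must be positive")
    return original_gb / stored_gb


# Compressed raw data only: 100 GB of logs stored as about 10 GB
raw_ratio = compression_ratio(100, 10)

# Total indexer footprint: 10 GB raw data plus 40 GB of index files
overall_ratio = compression_ratio(100, 10 + 40)

print(f"raw data: {raw_ratio:.0f}:1, overall: {overall_ratio:.0f}:1")
# prints "raw data: 10:1, overall: 2:1"
```

So 10:1 describes only the compressed raw data; once the index files are counted, the footprint on the indexer is closer to 2:1 under these assumptions.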
