Getting Data In

Compression rate for indexes: hot / warm / cold / frozen?

clyde772
Communicator

I have a few easy questions about Splunk's data compression rate.

  1. What is the typical compression rate for English, ASCII-based data?

  2. Is the compression rate different for hot / warm / cold / frozen buckets?

  3. Do hot buckets also get compressed?

Easy, huh? Thanks for your answer!

1 Solution

yannK
Splunk Employee

Easy:

1 - Roughly between 1 and infinity minus one.
Seriously, it depends on your data; here is a method to calculate it.
I usually see about 40% to 50% compression, i.e. the data on disk takes 40% to 50% of its raw size.

In this example we look at index=_internal; replace it with your own index.


| dbinspect index=_internal
| fields state,id,rawSize,sizeOnDiskMB
| stats sum(rawSize) AS rawTotal, sum(sizeOnDiskMB) AS diskTotalinMB
| eval rawTotalinMB=(rawTotal / 1024 / 1024) | fields - rawTotal
| eval compression=tostring(round(diskTotalinMB / rawTotalinMB * 100, 2)) + "%"
| table rawTotalinMB, diskTotalinMB, compression
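For anyone who exports the dbinspect results and wants to redo the arithmetic outside Splunk, here is a minimal Python sketch of the same calculation. It assumes each bucket row has rawSize in bytes and sizeOnDiskMB in megabytes, as in the search above; the function name compression_pct and the sample bucket sizes are made up for illustration. A lower percentage means better compression.

```python
def compression_pct(buckets):
    """Disk size as a percentage of raw size, like the SPL search above.

    Each bucket is a dict with 'rawSize' (bytes) and 'sizeOnDiskMB' (MB),
    assumed to come from an export of `| dbinspect`.
    """
    raw_total_bytes = sum(b["rawSize"] for b in buckets)
    disk_total_mb = sum(b["sizeOnDiskMB"] for b in buckets)
    raw_total_mb = raw_total_bytes / 1024 / 1024  # bytes -> MB, as in the eval
    if raw_total_mb == 0:
        raise ValueError("no raw data in the given buckets")
    return round(disk_total_mb / raw_total_mb * 100, 2)


# Made-up sample: 300 MB of raw data stored in 135 MB on disk.
buckets = [
    {"rawSize": 200 * 1024 * 1024, "sizeOnDiskMB": 90.0},
    {"rawSize": 100 * 1024 * 1024, "sizeOnDiskMB": 45.0},
]
print(f"{compression_pct(buckets)}%")  # 45.0%
```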

2 - The compression rate of the raw data is identical for hot, warm, cold and frozen buckets.
However, when a bucket is frozen, the index (tsidx) files and some metadata files are removed, which saves space; they can be rebuilt when the bucket is thawed.

3 - Hot buckets are written already compressed.

splunkreal
Motivator

Hello guys,
it looks like frozen data takes around 50% less space than hot/cold data; is this correct?
Thanks.

sakthiganesht
New Member

With the above search, for most of the indexers I see 200+% compression, e.g. rawTotal=42726 and diskTotalinMB=102921.

According to the documentation, compression should be around 50%, meaning diskTotalinMB should be half of rawTotal. But in my case it is about 2.4 times rawTotal. Any pointers on why it consumes more disk space than the raw data?
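Plugging the two totals from this post into the same disk/raw formula gives a quick check. This Python snippet assumes both figures are in MB, which the reported 200+% result suggests:

```python
raw_total_mb = 42726      # rawTotal as reported in the post (assumed MB)
disk_total_mb = 102921    # diskTotalinMB as reported in the post

ratio_pct = round(disk_total_mb / raw_total_mb * 100, 2)
expected_disk_mb = raw_total_mb * 0.5  # what ~50% compression would give

print(f"disk/raw = {ratio_pct}%")                          # disk/raw = 240.89%
print(f"~50% would be {expected_disk_mb:.0f} MB on disk")  # ~50% would be 21363 MB on disk
```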

hunderliggur
Path Finder

It all depends on your data. If you are using indexed extractions on JSON data, you will get virtually no reduction in total disk size, since the tsidx files will be huge compared to those for typical syslog data (on which the documentation's figures are based).
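A rough Python sketch of that point: a bucket's footprint is approximately the compressed raw journal plus the tsidx files, so large tsidx files alone can push the total above the raw size. The function name and both fractions below are hypothetical inputs chosen for illustration, not values published by Splunk.

```python
def estimated_disk_mb(raw_mb, rawdata_fraction, tsidx_fraction):
    """Approximate bucket size: compressed raw journal plus tsidx files.

    Both fractions are relative to the raw (uncompressed) size and are
    hypothetical inputs, not values published by Splunk.
    """
    return raw_mb * (rawdata_fraction + tsidx_fraction)


raw_mb = 1000.0
# Hypothetical syslog-like data: small index files.
syslog_mb = estimated_disk_mb(raw_mb, rawdata_fraction=0.15, tsidx_fraction=0.35)
# Hypothetical JSON with indexed extractions: index files dominate.
json_mb = estimated_disk_mb(raw_mb, rawdata_fraction=0.15, tsidx_fraction=2.0)

for name, size in (("syslog", syslog_mb), ("json", json_mb)):
    print(f"{name}: {size:.0f} MB on disk, {size / raw_mb:.0%} of raw")
```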

lguinn2
Legend

Always a good idea to add new data to a test index and check for

compression
line-breaking
time-stamping

before creating the input in a production environment.

yannK
Splunk Employee

No.
But you can run a test: send each source/sourcetype to a different index, index a significant sample, then run the search above against each index and compare the results.
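If the dbinspect output for the test indexes is exported, the comparison can be grouped per index. A minimal Python sketch, assuming each exported row carries the index, rawSize (bytes) and sizeOnDiskMB fields; the function name, index names and sizes are hypothetical:

```python
from collections import defaultdict


def compression_by_index(buckets):
    """Disk/raw percentage per index from exported dbinspect rows.

    Each row is assumed to have 'index', 'rawSize' (bytes) and
    'sizeOnDiskMB' (MB). Indexes with no raw data are skipped.
    """
    raw_bytes = defaultdict(int)
    disk_mb = defaultdict(float)
    for b in buckets:
        raw_bytes[b["index"]] += b["rawSize"]
        disk_mb[b["index"]] += b["sizeOnDiskMB"]

    result = {}
    for index, raw in raw_bytes.items():
        raw_mb = raw / 1024 / 1024
        if raw_mb > 0:
            result[index] = round(disk_mb[index] / raw_mb * 100, 2)
    return result


# Hypothetical test indexes, one per sourcetype, with made-up sizes.
rows = [
    {"index": "test_syslog", "rawSize": 100 * 1024 * 1024, "sizeOnDiskMB": 48.0},
    {"index": "test_json", "rawSize": 60 * 1024 * 1024, "sizeOnDiskMB": 90.0},
    {"index": "test_json", "rawSize": 40 * 1024 * 1024, "sizeOnDiskMB": 60.0},
]
print(compression_by_index(rows))
```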

cphair
Builder

@yannk, is there a straightforward way to calculate compression ratios for different sources or sourcetypes within an index?

yannK
Splunk Employee

Extra question:

  • Is the license volume counted on the stored compressed data or on the uncompressed data?
  • Answer -> on the uncompressed data.
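Since the license counts the uncompressed volume, the percentage from the dbinspect search can turn a daily license figure into a rough disk-growth estimate. A minimal Python sketch; it assumes the ratio stays stable over time and ignores index replication, and the function names and numbers are hypothetical:

```python
def daily_disk_gb(license_gb_per_day, disk_to_raw_pct):
    """Disk growth per day: licensed (uncompressed) volume times the disk/raw ratio."""
    return license_gb_per_day * disk_to_raw_pct / 100


def retention_disk_gb(license_gb_per_day, disk_to_raw_pct, retention_days):
    """Disk needed to keep `retention_days` of data (single copy, no replication)."""
    return daily_disk_gb(license_gb_per_day, disk_to_raw_pct) * retention_days


# Hypothetical: 10 GB/day licensed, 45% disk/raw, 90 days retained.
print(daily_disk_gb(10, 45))          # 4.5
print(retention_disk_gb(10, 45, 90))  # 405.0
```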

yannK
Splunk Employee

Featuring "Sanford" for the search.
