Splunk Enterprise

What is the best way to estimate frozen storage sizing needs?

vanderaj2
Path Finder

Hello All,

I'm trying to assess offline storage needs for archiving old Splunk data. I'm planning to adjust my retention policy to 90 days for hot-warm-cold (i.e. "online", searchable data) and then have anything older than 90 days sent to a NAS as "frozen" data, to be stored there for 1 year.

My storage guy is asking how much storage I need on the NAS to cover 1 year of frozen data. My understanding is that compressed raw events are what would be sent to frozen storage if you specify a frozen path or script.

How does one go about estimating the size of the raw, compressed events?

I have an indexer cluster comprising 2 indexers. Should I plan to double whatever the storage estimate is to account for frozen data coming from 2 indexers?

Thank you in advance!

vanderaj2
Path Finder

Thank you both for weighing in! I also have a follow-on question to the Splunk community:

Does anyone know whether during the compression of the raw data, Splunk does any data deduplication to reduce storage overhead? Just curious.....

s2_splunk
Splunk Employee

We don't deduplicate anything. The rawdata file (journal.gz) is a gzip archive of compressed 128KB data slices.

vanderaj2
Path Finder

Thank you! Appreciate the info on this....

s2_splunk
Splunk Employee

To get started, plug your numbers in here and it will give you your estimated storage needs based on "normal" compression assumptions (journal.gz = 15% of raw).
Note that if you are in a cluster, every indexer will freeze its own buckets, so you will end up with RF copies of the compressed rawdata on your archive volume. You can create a script that identifies replicated bucket archives and deletes all but one copy to minimize your storage need; a rough sketch of such a script is below.
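
For illustration only, here is a minimal sketch of such a cleanup script in Python. It assumes the frozen bucket directories follow the clustered naming convention (db_<newest>_<oldest>_<localid>_<guid> for originals, rb_... for replicas) and the NAS path is hypothetical; verify both against your environment, and test on a copy, before deleting anything.

    # Hypothetical sketch: keep one archived copy of each bucket, delete the rest.
    import os
    import shutil

    ARCHIVE_ROOT = "/mnt/nas/splunk_frozen/myindex"   # hypothetical NAS archive path

    seen = set()
    for name in sorted(os.listdir(ARCHIVE_ROOT)):     # db_* sorts before rb_*, so originals are kept
        parts = name.split("_")
        if len(parts) < 4 or parts[0] not in ("db", "rb"):
            continue                                  # not a frozen bucket directory
        bucket_id = "_".join(parts[1:])               # bucket identity without the db/rb prefix
        if bucket_id in seen:
            shutil.rmtree(os.path.join(ARCHIVE_ROOT, name))   # delete the duplicate copy
        else:
            seen.add(bucket_id)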

jkat54
SplunkTrust

By default, Splunk archives both the replicated buckets and the searchable copies when coldToFrozenDir is specified. Therefore you can assume the following equation:

 (daily ingestion volume * 0.35 * search factor) + (daily ingestion volume * 0.15 * replication factor) = total storage needed

Total storage needed / number of peers = storage per peer.

s2_splunk
Splunk Employee

Not quite. Index and metadata files are not frozen; only the rawdata (journal.gz) is.
So (daily ingestion volume * 0.15 * replication factor) = total storage needed is the best approximation.
This can be reduced to just ingestion * 0.15 if replicated buckets are deleted after freezing via a customer-provided script.
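
As a rough worked example of that formula, here is a small Python sketch; the 100 GB/day ingestion rate, RF of 2, and 1-year frozen window are hypothetical placeholders for your own numbers.

    # Illustrative frozen-storage estimate, assuming journal.gz is ~15% of raw ingest.
    daily_ingest_gb = 100        # hypothetical daily ingestion volume, in GB
    raw_compression = 0.15       # compressed rawdata as a fraction of raw ingest
    replication_factor = 2       # copies frozen if replicated bucket archives are kept
    frozen_days = 365            # how long frozen data stays on the NAS

    keep_all_copies = daily_ingest_gb * raw_compression * replication_factor * frozen_days
    one_copy_only = daily_ingest_gb * raw_compression * frozen_days

    print(f"Frozen storage, all copies kept:  {keep_all_copies:,.0f} GB")
    print(f"Frozen storage, replicas pruned:  {one_copy_only:,.0f} GB")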

jkat54
SplunkTrust
SplunkTrust

Oh, my mistake. I was thinking of the rb and db buckets, which contain the raw data as you said...

So in a cluster, both the replicated copies and the original copies get copied to the frozen archive.

lfedak_splunk
Splunk Employee

Hey @vanderaj2, here's some documentation on planning your capacity: http://docs.splunk.com/Documentation/Splunk/6.6.3/Capacity/Estimateyourstoragerequirements. It says that "typically, the compressed rawdata file is 10% the size of the incoming, pre-indexed raw data. The associated index files range in size from approximately 10% to 110% of the rawdata file. The number of unique terms in the data affect this value."
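
To put those documented percentages side by side with the 15% rule of thumb above, here is a short illustrative calculation; the 100 GB/day figure is again hypothetical.

    # Illustrative only: per the docs, rawdata is ~10% of raw ingest and
    # the associated index files are roughly 10%-110% of the rawdata file.
    daily_ingest_gb = 100                     # hypothetical daily ingestion volume
    rawdata_gb = daily_ingest_gb * 0.10       # compressed journal.gz per day
    index_low = rawdata_gb * 0.10             # low end of associated index files
    index_high = rawdata_gb * 1.10            # high end of associated index files

    print(f"rawdata per day:     {rawdata_gb:.1f} GB (only this part gets frozen)")
    print(f"index files per day: {index_low:.1f} - {index_high:.1f} GB (searchable buckets only)")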
