
5 year data retention bucket setup

Hiattech
Explorer

We have a requirement for 5-year data retention. Unfortunately, we discovered that Splunk has not been configured for this and we've been losing data. We would like to set up buckets for data retention: we need to be able to search at least 1 year back, perhaps a little more, and then archive the rest for auditing purposes.

500GB seems to be about 10-11 months of data. I'm a bit new to setting up buckets, so I'm wondering about the best way to set this up. Do we roll every month or so to warm buckets, and from there to cold or frozen? What would a configuration like that look like? Our indexes.conf has nothing but the max frozen time, but nothing is being frozen yet, so that doesn't really help.

1 Solution

livehybrid
SplunkTrust

Hi @Hiattech 

If you want to retain the data as searchable for 12 months then you will need to set frozenTimePeriodInSecs to 31536000 (365*24*60*60) for the relevant indexes in your indexes.conf; how you deploy this may depend on your setup. There are also other settings to think about, such as maxTotalDataSizeMB, which defaults to 500GB and may be a little small for 12 months of data. It also depends on other factors, such as whether you are using an indexer cluster and the specifics of your search/replication factor.
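As a minimal sketch (the index name and size cap here are hypothetical; size the cap from your own daily ingest), the stanza could look like:

[web_logs]
# Freeze buckets whose newest event is older than 12 months
frozenTimePeriodInSecs = 31536000
# Raise the size cap above the 500GB default so size doesn't trigger freezing before age does
maxTotalDataSizeMB = 750000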

In terms of what happens to the data after 12 months: it will be "frozen" (archived), and this is your opportunity to move it elsewhere for compliance/audit. This is where the coldToFrozenScript or coldToFrozenDir setting comes in handy; either lets you move the frozen data to a dedicated location. From the indexes.conf spec:

coldToFrozenScript = <path to script interpreter> <path to script>
* Specifies a script to run when data is to leave the splunk index system.
* Essentially, this implements any archival tasks before the data is
deleted out of its default location.
* Add "$DIR" (including quotes) to this setting on Windows (see below
for details).

coldToFrozenDir = <path to frozen archive>
* An alternative to a 'coldToFrozen' script - this setting lets you
specify a destination path for the frozen archive.
* Splunk software automatically puts frozen buckets in this directory
* For information on how buckets created by different versions are
handled, see "Freezing and Thawing" below.
* If both 'coldToFrozenDir' and 'coldToFrozenScript' are specified,
'coldToFrozenDir' takes precedence
* You must restart splunkd after changing this setting. Reloading the
configuration does not suffice.
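For illustration, a stanza using each option might look like this (the paths and script name are hypothetical; pick one option, and on Windows append the quoted "$DIR" argument to the script line as the spec note above describes):

[my_index]
# Option 1: Splunk copies frozen buckets to this directory instead of deleting them
coldToFrozenDir = /mnt/archive/my_index/frozendb
# Option 2 (alternative to the above; coldToFrozenDir wins if both are set):
# coldToFrozenScript = "$SPLUNK_HOME/bin/python" "$SPLUNK_HOME/bin/myColdToFrozen.py"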

For more info see https://help.splunk.com/en/splunk-enterprise/administer/admin-manual/9.4/configuration-file-referenc...


Hiattech
Explorer

While looking into the various settings, I discovered that the default maximum index size (maxTotalDataSizeMB) is 500GB, and because we're hitting 500GB within a year, Splunk is just deleting the data. I thought it would move the data to warm or even cold buckets and then the data would fall off after 5 years. Is this not the case? Is it simply limited to 500GB per index, after which the data is deleted?


gcusello
SplunkTrust

Hi @Hiattech ,

as I said, 500 GB is the default maximum size of an index. If you set maxTotalDataSizeMB to the size you need (based on your capacity planning), you can index all the logs you want in a year (obviously within the limits of your license and the capacity of your storage!).

As I said, you have to work out the maximum size your index will reach in 5 years and set both maxTotalDataSizeMB and frozenTimePeriodInSecs so that all your data is kept for 5 years.

Ciao.

Giuseppe

Hiattech
Explorer

Got it. Sorry, I guess that didn't click at first. I'll update those settings for the 2 indexes in question. The other indexes seem to be well within the size/age limits.


gcusello
SplunkTrust

Hi @Hiattech ,

good for you, see you next time!

let us know if we can help you more, or please accept one answer for the benefit of other Community members.

Ciao and happy splunking

Giuseppe


PickleRick
SplunkTrust

As others already mentioned, the rolling itself hot->warm->cold(->frozen) happens automatically based on configured parameters.

Hot->warm and warm->cold rolling happens mostly for performance reasons. You might want to tune parameters there, but they don't affect the overall retention for the index. It's the cold->frozen rolling that matters for retention, and it is mostly controlled by

maxTotalDataSizeMB

and

frozenTimePeriodInSecs

Splunk rolls data to frozen (by default it simply deletes the data if no freezing process is defined) when either the index exceeds the defined size or a bucket's latest event is older than the retention period, whichever happens first.

There are additional constraints if you have your data organized into volumes, and SmartStore adds further complexity, but I suppose you have neither.
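For completeness, a volume-based layout adds a per-volume cap on top of the per-index settings. A minimal sketch with hypothetical paths and sizes (note that thawedPath cannot reference a volume):

[volume:cold_volume]
path = /mnt/slowstorage/splunk
# Cap the total cold storage shared by all indexes on this volume
maxVolumeDataSizeMB = 2000000

[my_index]
homePath = $SPLUNK_DB/my_index/db
coldPath = volume:cold_volume/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb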

gcusello
SplunkTrust

Hi @Hiattech ,

by default, Splunk indexes have a retention of about 6 years and a maximum size of 500 GB.

If you need a retention of 5 years, you should do a capacity plan to understand how many logs you index daily in each index; then, multiplying by 365 and by 5, you will have the required size of your index.

Having this, you should configure two parameters for each index:

  • maxTotalDataSizeMB (to set the maximum size of the index),
  • frozenTimePeriodInSecs (to set the retention period in seconds)

So, if you have an average daily ingestion of 100 GB, want a 10% margin, and want a retention of 5 years (100 GB/day × 365 × 5 = 182,500 GB ≈ 182,500,000 MB, plus the margin ≈ 200,000,000 MB; 5 × 365 × 24 × 3600 = 157,680,000 seconds), you should set:

  • maxTotalDataSizeMB = 200000000
  • frozenTimePeriodInSecs = 157680000

Then you should decide whether you need all 5 years of logs to be searchable, or whether a shorter searchable window is acceptable, with the older logs saved somewhere they can be manually restored if required.

In that case, you could set a shorter period in frozenTimePeriodInSecs and define a script (or a coldToFrozenDir) to save the buckets for the frozen period, as in the sketch below.
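A hedged sketch of that split (the index name and archive path are hypothetical): keep one year searchable and copy frozen buckets aside for the remaining ~4 years, restoring them via thawedPath if auditors ever need them:

[audit_logs]
homePath = $SPLUNK_DB/audit_logs/db
coldPath = $SPLUNK_DB/audit_logs/colddb
thawedPath = $SPLUNK_DB/audit_logs/thaweddb
# Searchable window: 1 year
frozenTimePeriodInSecs = 31536000
# Raw frozen buckets accumulate here for the audit archive; Splunk does not
# prune this directory, so manage its size yourself
coldToFrozenDir = /mnt/archive/audit_logs/frozendb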

Ciao.

Giuseppe

 

PrewinThomas
Motivator

@Hiattech 
You don't manually "roll" buckets each month; bucket movement (hot → warm → cold → frozen) is automatic, based on size, age, and retention settings in indexes.conf.

Sample config you can adapt:

[my_index]
homePath = $SPLUNK_DB/my_index/db
coldPath = /mnt/slowstorage/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb

# Keep data searchable for ~1 year
frozenTimePeriodInSecs = 31536000

# Archive frozen buckets instead of deleting
coldToFrozenDir = /mnt/archive/my_index/frozendb
# OR use a script:
# coldToFrozenScript = $SPLUNK_HOME/bin/scripts/archive_to_s3.sh
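To sanity-check the effective settings after deploying a change like this (the index name is an example), you can use btool from the shell and dbinspect from the search bar:

# From the shell on the indexer: show the resolved settings for the index
$SPLUNK_HOME/bin/splunk btool indexes list my_index --debug

# From the search bar: count buckets by state (hot/warm/cold) to watch data roll
| dbinspect index=my_index | stats count by state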


Regards,
Prewin

