Getting Data In

Log retention with archiving in Splunk for a specific index

BRFZ
Communicator

Hello,

I'm looking to set up a log retention policy for a specific index, for example index=test.

Here's what I'd like to configure:

- Total retention time = 24 hours

- First 12 hours in hot+warm, then

- Next 12 hours cold.

- After that, the data should be archived (not deleted).

How exactly should I configure this please? Also does the number of buckets need to be adjusted to support this setup properly on such a short timeframe?

Thanks in advance for your help.

 


isoutamo
SplunkTrust

Hi

This question is asked quite often, and you can find many explanations in the community quite easily.

I'll add here some posts which you should read to better understand the problems behind your needs.

But in short, here is what those mean when looking at your request.

There are many attributes you need to use to achieve your target, but I'm quite sure you cannot combine them so that you get 100% of what you are requesting.

@livehybrid has already answered you with one example as a starting point.

The 1st issue is that you cannot force the warm -> cold transition by time. The only options are the number of warm buckets and the size of homePath; if you are using volumes, the total volume size is also used, but usually you have other indexes on the same volume too. None of those depend on time, only on the number of buckets and the size of the hot+warm data.
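For reference, here is a minimal sketch of the attributes that actually drive warm -> cold rolling (the values below are placeholders, not recommendations for the 12h case):

[test]
# warm -> cold is triggered by bucket count or size, never by age
maxWarmDBCount         = 4        # roll the oldest warm bucket to cold once 4 warm buckets exist
homePath.maxDataSizeMB = 10240    # ...or once hot+warm data in homePath exceeds ~10 GB
# if homePath sits on a volume, maxVolumeDataSizeMB on the [volume:...] stanza
# also forces rolling, and that limit is shared with every index on that volume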

The 2nd issue is that, depending on data volumes and the number of indexers, it is even harder to control the number of buckets. All these settings apply to one indexer; there is no relation to the other indexers or to the indexes they hold. Actually it's not even per indexer, it depends on the number of indexing pipelines. So if you have e.g. 10 indexers, all those parameters which @livehybrid presents must be multiplied by 10, and if you have e.g. 2 ingestion pipelines per indexer, you must multiply the previous result by 2. And as each indexer/pipeline normally has 3 open hot buckets, you must multiply the previous result by 3 again (or by some other value if you have changed that bucket count).

This means that when you are estimating how many warm buckets are needed to keep data roughly 12h in hot+warm, you must divide your data volume by (3 * # pipelines * # indexers) to get an estimate of what maxWarmDBCount you should use.
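To illustrate that arithmetic with made-up numbers (none of these figures come from the thread, adjust them to your environment):

# 10 indexers x 2 ingest pipelines, default 3 open hot buckets per pipeline (all assumptions)
# ~60 GB ingested in 12h / 750 MB per bucket (maxDataSize = auto)  ->  ~80 buckets to hold 12h of data
# 80 buckets / (3 * 2 pipelines * 10 indexers)                     ->  maxWarmDBCount of roughly 1-2 per indexer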

And for this to work correctly, your source systems' events must be spread evenly across all your indexers. Of course this also assumes that your data volume is flat over time; if your data volume follows e.g. a sine curve, it's quite obvious that this cannot work.

One more thing: if your events are not continuous in time (e.g. occasionally there are some old logs, or some events with future timestamps), those trigger the creation of a new bucket and close the old hot bucket even if it's not full.

I suppose the above are not even all the aspects one must take care of to achieve what you are asking.

You can try to achieve your objective, but don't be surprised if you cannot get it to work exactly.

r. Ismo


livehybrid
SplunkTrust

Hi @BRFZ 

Configure the index in indexes.conf as follows to enforce your requirements:

  • Set frozenTimePeriodInSecs to 86400 (24 hours).
  • Set maxWarmDBCount to a low value and maxHotSpanSecs to 43200 (12 hours) so that buckets roll to warm quickly.
  • Set maxWarmDBCount, maxDataSize, or other thresholds to force buckets to cold after 12 hours.
  • Configure a coldToFrozenDir to archive (not delete) after cold.
 

Try this as an example indexes.conf:

[test]
homePath         = $SPLUNK_DB/test/db
coldPath         = $SPLUNK_DB/test/colddb
thawedPath       = $SPLUNK_DB/test/thaweddb
# set bucket max age to 12h (hot→warm)
maxHotSpanSecs   = 43200

# default size, can reduce for faster bucket rolling #
maxDataSize      = auto    

# keep small number of warm buckets, moves oldest to cold #
maxWarmDBCount   = 1

# total retention 24h 
frozenTimePeriodInSecs = 86400   

# archive to this path, not delete
coldToFrozenDir = /archive/test

With this setup, data will move from hot→warm after 12h (due to maxHotSpanSecs), and the oldest warm buckets will be rolled to cold (enforced by the low maxWarmDBCount). Data will be kept for 24h in total before being archived. Note that frozenTimePeriodInSecs is evaluated per bucket against its newest event, so a bucket is only archived once all of its events are older than 24h.

 

The number of buckets (maxWarmDBCount, etc.) should be kept low to ensure data moves through the states quickly for such a short retention. Splunk is optimised for longer retention; very short retention and frequent bucket transitions can increase management overhead. It's generally advised not to have small buckets for this reason, but with such a short retention period you shouldn't end up with too many buckets here.

Other things to remember:
  • If you use coldToFrozenDir, ensure permissions and disk space are sufficient at the archive destination.
  • Test carefully, as a low maxWarmDBCount and a short maxHotSpanSecs may result in more buckets than usual and can impact performance.
  • If you want to restore archived data, it must be manually thawed (see the sketch below).
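For completeness, restoring data archived via coldToFrozenDir is a manual process along these lines (a sketch only; the bucket directory name below is hypothetical, and the archive path is the /archive/test from the example above):

# copy the archived bucket back into the index's thawed path
cp -r /archive/test/db_1724000000_1723956800_42 $SPLUNK_DB/test/thaweddb/
# rebuild the index and metadata files (only the rawdata journal is kept when a bucket is frozen)
splunk rebuild $SPLUNK_DB/test/thaweddb/db_1724000000_1723956800_42
# restart the indexer so the thawed bucket becomes searchable
splunk restart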

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing
