We have configured our Splunk 4.2 to remove log data as follows:
The disk and disk-space limits are sized so that the time-based rule should always apply first; they are meant only as a failsafe mechanism.
How can we monitor whether data is really deleted because it is old, and not because the disk is full? We would like to verify that the disk-space numbers we calculated are correct. Is there a way to see why certain buckets were rolled, so that we can search for something like "rolled because disk full"?
Hi all,
I found the solution. The events I am looking for are:
09-20-2012 15:38:08.583 +0200 INFO VolumeManager - The size of volume 'cold' exceeds the limit, will have to acquiesce it (size=2055243225109, max_size=2055208960000, path='/opt/splunk_cold/data/indexer/cold/indexer')
09-20-2012 15:38:08.583 +0200 INFO VolumeManager - Getting a list of candidate buckets for moving (chilling or freezing)
09-20-2012 15:38:08.704 +0200 INFO VolumeManager - Will move bucket with latest=1332892827, path='/opt/splunk_cold/data/indexer/cold/indexer/_internaldb/colddb/db_1332892827_1332892827_31'
09-20-2012 15:38:08.705 +0200 INFO VolumeManager - Bucket moved successfully (current size=2055243221817, max=2055208960000)
09-20-2012 15:38:08.705 +0200 INFO VolumeManager - Will move bucket with latest=1332894456, path='/opt/splunk_cold/data/indexer/cold/indexer/main/colddb/db_1332894456_1332808057_143'
09-20-2012 15:38:08.705 +0200 INFO VolumeManager - Bucket moved successfully (current size=2054983523562, max=2055208960000)
I was able to query the relevant data (Will move bucket...) and calculate the actual retention as follows:
index=_internal sourcetype="splunkd" component="VolumeManager" move /opt/splunk_cold | eval eff_retention_days=((_time-latest)/3600/24) | timechart span=1d min(eff_retention_days) as "actual retention (d)"
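A variant of this search could also serve as an alert (just a sketch; the 90-day threshold and the /opt/splunk_cold path are placeholders for our configured retention and cold volume, adjust them to your setup). It returns results whenever the youngest bucket moved by the VolumeManager is younger than the intended retention, i.e. buckets were rolled for space rather than age:
index=_internal sourcetype="splunkd" component="VolumeManager" "Will move bucket" /opt/splunk_cold | eval eff_retention_days=((_time-latest)/3600/24) | stats min(eff_retention_days) as min_retention_days | where min_retention_days < 90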
Thank you all for your help!
Hello Clemens 😉
I tested it with a dummy index. First I checked index=os from the Unix app for totalEventCount and the configured frozenTimePeriodInSecs:
see http://www.tinyuploads.com/images/R2WExL.png
Then I changed frozenTimePeriodInSecs to 60 and restarted Splunk. After the restart the events were gone:
see http://www.tinyuploads.com/images/Oh0gBw.png
Okay, now I had some buckets moved into coldToFrozenDir. I then checked index=_internal source=*splunkd.log BucketMover for those buckets and bingo: the messages you are looking for are the following:
09-20-2012 14:05:46.183 +0200 INFO BucketMover - will attempt to freeze: /opt/splunkbeta/var/lib/splunk/os/db/db_1348142667_1319115834_28 because frozenTimePeriodInSecs=60 exceeds difference between now=1348142746 and latest=1348142667
09-20-2012 14:05:46.239 +0200 INFO BucketMover - AsyncFreezer freeze succeeded for /opt/splunkbeta/var/lib/splunk/os/db/db_1348142667_1319115834_28
Hope this helps in setting up an alert.
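For instance, something like this could be a starting point (only a sketch: it assumes that freezes which are not age-based do not contain the frozenTimePeriodInSecs reason shown above, so verify the exact message text in your own splunkd.log first):
index=_internal source=*splunkd.log component=BucketMover "will attempt to freeze" NOT frozenTimePeriodInSecs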
Greetings to Zurich
MuS
Oops, true, you were looking for the message when maxDataSize is reached and not when frozenTimePeriodInSecs is reached ... but maybe this will be handy for you as well, someday 🙂
Cwacha - you may find this useful - it's something we have been faced with and it features in another thread - so credits to rgonzale6, melonman, adamw.
You need CLI access to edit indexes.conf (a quick way to sanity-check the resulting bucket ages is sketched after the settings):
maxDataSize = 1024 (bucket size of 1 GB, or big enough to hold one day's indexing volume / your expected volumes or license limit 😉)
maxHotIdleSecs = 86400 (1 day, for the hot to warm roll, or call the roll-hot-bucket script)
maxWarmDBCount = 30 (30 buckets = 30 days, for warm to cold)
frozenTimePeriodInSecs = 7776000 (90 days in seconds, cold to frozen)
coldToFrozenDir = /archive/myindex (after 90 days, the index data goes here)
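As a sanity check that these settings behave as intended, something like this lists each bucket with its state and age in days (just a sketch; myindex is a placeholder, and the dbinspect command and its field names such as state and endEpoch can differ between Splunk versions, so check what your version returns):
| dbinspect index=myindex | eval age_days=(now()-endEpoch)/3600/24 | table path, state, age_days | sort - age_days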
You might want to check the splunkd logs with:
index=_internal sourcetype="splunkd" databasePartitionPolicy "Moving db with id"
This shows when buckets are moved and why.
Thanks, but this query does not tell me when buckets get rolled from cold to frozen (which is what I want to see).
Can you tell me what I have to look for?