Automatically remove events older than one year

Builder

I have a requirement to have data older than one year removed from Splunk. By "older than one year", I mean the event's own timestamp has to be more than one year old, regardless of when it was indexed.

In my indexes.conf file, I set:

[main]
frozenTimePeriodInSecs = 31536000

31536000 seconds should be one year.

And yet the earliest events (185,000 of them) are dated July 18, 2010 (today is August 15, 2011). I expected the earliest event to be from August 15, 2010, tomorrow's earliest event to be from August 16, 2010, and so on.

How can I instruct Splunk to automatically purge events older than one year?

Thanks!

SplunkTrust

Splunk removes (freezes) data a whole bucket at a time. It can't freeze a bucket until the newest event within that bucket is older than frozenTimePeriodInSecs. You can use the dbinspect search command ( http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Dbinspect ) to examine your buckets and see how large a time range each bucket covers. That will at least give you an idea of how long past a year you can expect the OLDEST event in a bucket to stick around.
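To make the rule concrete, here is a rough Python sketch of it. This is only a simplified model, not Splunk's actual code: the function names are made up for illustration, and the bucket's newest-event date (September 1, 2010) is a hypothetical value chosen to match the dates in the question.

```python
from datetime import datetime

FROZEN_TIME_PERIOD_SECS = 31536000  # 365 days, the value from the question


def bucket_is_freezable(newest_event, now, frozen_secs=FROZEN_TIME_PERIOD_SECS):
    """Simplified rule: a bucket qualifies only once its NEWEST event is past retention."""
    return (now - newest_event).total_seconds() > frozen_secs


def days_until_freezable(newest_event, now, frozen_secs=FROZEN_TIME_PERIOD_SECS):
    """Days left before the whole bucket (oldest events included) can be frozen."""
    remaining = frozen_secs - (now - newest_event).total_seconds()
    return max(0.0, remaining / 86400)


now = datetime(2011, 8, 15)
oldest = datetime(2010, 7, 18)  # earliest event seen in the question
newest = datetime(2010, 9, 1)   # hypothetical newest event in the same bucket

print((now - oldest).days)                # age of the oldest event, in days
print(bucket_is_freezable(newest, now))   # the bucket as a whole is still too new
print(days_until_freezable(newest, now))  # days the old events will linger
```

In this made-up case the July 18 events are already past one year, but they stay searchable until the September 1 event also ages out.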

By default, buckets are limited by a time range (maxHotSpanSecs) and a bucket data size (maxDataSize). If either limit is exceeded, Splunk rolls the bucket from hot to warm.

You could set maxHotSpanSecs to the shortest interval at which you might consider archiving, say 1 day (86,400 seconds). You still will not get exact archiving, but you minimize how long "archivable" data stays around simply because it sits in a bucket that also holds much newer data.
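As a back-of-the-envelope check on why a shorter bucket span helps, here is a small Python sketch. It assumes the simplified model described above (a bucket is frozen as soon as its newest event passes retention) and ignores how often Splunk actually checks; the helper name and the 30-day comparison value are hypothetical.

```python
ONE_DAY_SECS = 86400
ONE_YEAR_SECS = 31536000  # frozenTimePeriodInSecs from the question


def worst_case_event_age_secs(frozen_secs, bucket_span_secs):
    """Oldest possible event age at freeze time: retention plus the bucket's time span."""
    return frozen_secs + bucket_span_secs


# Compare a 1-day span with a hypothetical 30-day span.
for label, span_secs in [("1-day buckets", ONE_DAY_SECS), ("30-day buckets", 30 * ONE_DAY_SECS)]:
    overshoot_secs = worst_case_event_age_secs(ONE_YEAR_SECS, span_secs) - ONE_YEAR_SECS
    print(f"{label}: oldest event may linger up to {overshoot_secs / ONE_DAY_SECS:.0f} day(s) past retention")
```

Under these assumptions the overshoot is bounded by the bucket's span, which is why capping the span at one day gets you to roughly day-level precision.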

If you need a more precise archiving capability -- say, something that would stand up to lawyer scrutiny -- then I would suggest filing an enhancement request.

The whole notion of buckets is understandably difficult to explain to less technical people. A good analogy is paper banker's boxes. Each banker's box has a range of dates written on it, and you can't discard individual documents without going through the whole box. So you have to keep some things in the box a little longer than you might have wanted, just because they're in the same box as something a few days newer.

SplunkTrust

see update, lemme know if it helps or not

Builder

Got dbinspect to work.... honestly, I'm not quite sure what to do with the information there.
It seems like there has to be an easier way to do this.

SplunkTrust

Precise to the minute, no. If you can plan your bucket boundaries well, then you can get pretty close -- like rounded to the day. For dbinspect, run this search over all time: "| dbinspect index=main"

Builder

So are you saying there's no real way to do it? I was hoping for precise "1 year" cut-off.

I'm playing around with dbinspect like you suggested, but it only outputs "no events found"; not sure what I'm supposed to get out of it.
