
Why is log data still present in an Index well after the frozenTimePeriodInSecs retention period?

Communicator

I have an index for which "frozenTimePeriodInSecs = 7776000" (90 days) is set.
Normally an index does not hold data older than its configured retention.

We have four Indexers in our environment.
Looking at the index details on each indexer, the earliest event differs widely from one indexer to the next.
For this Index, the Earliest event is 5 January on one Indexer. On another it is Dec 2014, and on the third it is Nov 2014. On the fourth Indexer, the Earliest event for the index is August 2013 - well over 90 days!

Is there anything that can account for this? I discovered this when a user asked me about retention and why there were a lot of results from recent weeks and then sporadic amounts of events from last year.


Re: Why is log data still present in an Index well after the frozenTimePeriodInSecs retention period?

SplunkTrust

Run this search over all time on an indexer:

| dbinspect index=foo | table bucketId eventCount startEpoch endEpoch | fieldformat startEpoch = strftime(startEpoch, "%+") | fieldformat endEpoch = strftime(endEpoch, "%+")

I predict you'll see buckets that have startEpoch values way before your retention period... but endEpoch times within the retention period. Splunk will only freeze a bucket after its youngest event has crossed over the icy divide.
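The freeze rule described above can be sketched in Python. This only illustrates the comparison and is not Splunk's actual implementation; the function name `is_freezable`, the `now` value, and both sample buckets are made up for the example.

```python
DAY = 86400
RETENTION_SECS = 7776000  # frozenTimePeriodInSecs from the question (90 days)


def is_freezable(end_epoch: int, now_epoch: int, retention_secs: int) -> bool:
    """Return True only when the bucket's newest event is older than the retention period."""
    return now_epoch - end_epoch > retention_secs


now = 1_425_000_000  # hypothetical "current" time

# Hypothetical bucket: oldest event 500 days old, newest event 10 days old.
mixed_bucket = {"startEpoch": now - 500 * DAY, "endEpoch": now - 10 * DAY}
# Hypothetical bucket: newest event 120 days old.
expired_bucket = {"startEpoch": now - 400 * DAY, "endEpoch": now - 120 * DAY}

# startEpoch plays no part in the decision; only endEpoch does.
print(is_freezable(mixed_bucket["endEpoch"], now, RETENTION_SECS))
print(is_freezable(expired_bucket["endEpoch"], now, RETENTION_SECS))
```

The first bucket stays searchable even though most of its time span lies far outside retention, which matches the pattern the dbinspect search is meant to reveal.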


Re: Why is log data still present in an Index well after the frozenTimePeriodInSecs retention period?

Communicator

Yes, it looks like that is what has happened. Is there a good explanation for why a bucket created a year and a half ago would still be written to? I assumed a bucket was written to until it filled up, and then a new bucket was created. I could maybe understand data a couple of weeks or months beyond retention, depending on how much data comes in for the index, but a bucket still being written to 20 months later?!

startEpoch                      endEpoch
Thu Aug 8 09:56:40 CDT 2013     Fri Feb 27 10:35:57 CST 2015
Thu Nov 6 17:36:01 CST 2014     Fri Feb 13 00:10:41 CST 2015
Wed Dec 24 09:47:11 CST 2014    Tue Jan 20 15:40:03 CST 2015


Re: Why is log data still present in an Index well after the frozenTimePeriodInSecs retention period?

SplunkTrust

Usually this means you're getting out-of-sequence data; sometimes it means your timestamp recognition is failing.

The bucket isn't being written to 20 months later. While it was open and receiving current data, it also received events with 20-month-old timestamps.


Re: Why is log data still present in an Index well after the frozenTimePeriodInSecs retention period?

Path Finder

I've actually found data that isn't even explained by that.

My frozenTimePeriodInSecs is set to 15,778,800 (~6 months), yet I have buckets whose start and end times BOTH pre-date that. For example:

  • e:\Splunk\prod\colddb\db_1413324467_1336100141_780 (2012-05-02, 2014-10-14)
  • e:\Splunk\prod\db\hot_quar_v1_2196 (2011-06-26, 2014-12-10)
  • e:\Splunk\prod\colddb\db_1414118669_1336451067_802 (2012-05-08, 2014-10-23)
  • ...

Am I misinterpreting data, or do I really have buckets that are entirely older than 6 months?

And, FWIW, I changed my frozen time over a week ago. I would hope that's ample time for Splunk to freeze what it needs to freeze.
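One way to double-check a bucket list like the one above is to read the time span straight from the directory name. The sketch below assumes the usual warm/cold layout `db_<newest epoch>_<oldest epoch>_<local id>`; the helper name `bucket_span` is hypothetical, and the sample name is the first bucket from the list with its underscores assumed to follow that layout.

```python
import re
from datetime import datetime, timezone

# Assumed warm/cold bucket directory layout: db_<newest>_<oldest>_<local id>
BUCKET_NAME = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


def bucket_span(dir_name: str):
    """Return (oldest, newest, local_id) for a warm/cold bucket name, or None if it does not match."""
    match = BUCKET_NAME.match(dir_name)
    if match is None:
        # Hot and quarantine buckets (for example hot_quar_v1_2196) use other names.
        return None
    newest = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    oldest = datetime.fromtimestamp(int(match.group(2)), tz=timezone.utc)
    return oldest, newest, int(match.group(3))


span = bucket_span("db_1413324467_1336100141_780")
if span is not None:
    oldest, newest, local_id = span
    print(local_id, oldest.isoformat(), newest.isoformat())
```

The times are printed in UTC, so they can differ by up to a day from dates shown in local time.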


Re: Why is log data still present in an Index well after the frozenTimePeriodInSecs retention period?

Contributor

You might also want to look at _indextime for the events contained in those buckets. It is a very good way to discover out-of-sequence data, though it is rather time-consuming.
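As a rough illustration of that check, the Python sketch below flags events whose index time is far from their event time. It assumes the events have already been exported with `_time` and `_indextime` as epoch seconds; the function name, the one-day threshold, and the sample events are all made up for the example.

```python
DAY = 86400


def find_out_of_sequence(events, max_lag_secs=DAY):
    """Return (event, lag) pairs where index time and event time differ by more than max_lag_secs."""
    flagged = []
    for event in events:
        lag = event["_indextime"] - event["_time"]
        if abs(lag) > max_lag_secs:
            flagged.append((event, lag))
    return flagged


# Hypothetical exported events: one indexed promptly, one carrying a very old timestamp.
sample = [
    {"_time": 1425000000, "_indextime": 1425000005, "source": "app.log"},
    {"_time": 1351260209, "_indextime": 1425000010, "source": "old.log"},
]

for event, lag in find_out_of_sequence(sample):
    print(event["source"], "indexed about", lag // DAY, "days after its timestamp")
```

Using the absolute lag also catches events stamped in the future, which can point to the same timestamp-recognition problems.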


Re: Why is log data still present in an Index well after the frozenTimePeriodInSecs retention period?

SplunkTrust

Bucket 2196 appears to be a hot quarantined bucket - hot buckets can only be frozen after being changed to warm.

For a detailed analysis of the other buckets, please post a new question and include the dbinspect output. Also search _internal for those bucket IDs to look for errors during freezing, and check your indexes configuration using btool.


Re: Why is log data still present in an Index well after the frozenTimePeriodInSecs retention period?

Path Finder

Indeed, I found a sort of smoking gun for Bucket 2196. The _internal log shows Splunk creating the bucket yesterday around 9:03 am because it got an event supposedly timestamped 1351260209 (Oct 26, 2012). A few hours later it rolled the bucket to warm, but I have too many warm buckets, so it moved it to cold.

I've tried, but I can't find the event it thinks it found. I did find other events that were indexed drastically far away from their apparent timestamp (not something I should see in my environment), which could explain this. But even in those events, I don't see anything that should make Splunk think the timestamp is THAT old!
