Getting Data In

Why is some data still present in the index long after exceeding the frozen time period?

andrewfoglesong
Explorer

I currently have two indexes, each with frozenTimePeriodInSecs=432000 and its own frozen-archive directory outside the Splunk directory tree. The main index's maxDataSize is auto-high-volume; the "Systems" index's maxDataSize is auto (not defined in its stanza, so it uses the global setting).
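For reference, the setup described above might look roughly like this in indexes.conf (the stanza name and archive path are hypothetical):

```ini
# indexes.conf -- hypothetical sketch of the setup described above
[systems]
homePath   = $SPLUNK_DB/systems/db
coldPath   = $SPLUNK_DB/systems/colddb
thawedPath = $SPLUNK_DB/systems/thaweddb
# archive location outside the Splunk directory tree (hypothetical path)
coldToFrozenDir = /archive/splunk/systems
# freeze buckets once all of their events are older than 5 days
frozenTimePeriodInSecs = 432000
# maxDataSize not set here, so the global default (auto) applies
```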

With this configuration the cold buckets are unused (as intended), so buckets go from warm directly to frozen (to the best of my understanding). Data is being successfully frozen, and I've thawed some of it to confirm this. However, even after emptying my thawed directories, I still see data from months ago in the index.
I can see large gaps where all data was frozen, but other time periods still show up to 4,000 events per day. The only pattern I've noticed is that all the "persistent" entries come from log files with .gz extensions, but I haven't read about any issues relating to that. Any ideas about what is causing this?

1 Solution

yannK
Splunk Employee

Check the actual timespan and state of the events in your buckets with the search command:
| dbinspect index=myindex

For a bucket to be frozen for time reasons:

  • it must no longer be hot
  • all events in the bucket must be older than the retention period.
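As a sketch, the dbinspect output can be summarized to spot wide-timespan buckets, the usual culprit when old data refuses to freeze (the index name is taken from the answer above; adjust it to your setup):

```
| dbinspect index=myindex
| eval spanDays = round((endEpoch - startEpoch) / 86400, 1)
| eval newestAgeDays = round((now() - endEpoch) / 86400, 1)
| table bucketId state spanDays newestAgeDays
| sort - spanDays
```

A bucket whose spanDays covers months will not freeze until its newest event also exceeds the retention period.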


yannK
Splunk Employee

You are definitely indexing historical logs into buckets that are still hot.

If this is a timespan issue, one way to make buckets roll faster is to define a shorter time-range limit for the events in a hot bucket.

see maxHotSpanSecs
http://docs.splunk.com/Documentation/Splunk/latest/admin/Indexesconf
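As a sketch, combining that setting with the 432000-second retention from the question (stanza name hypothetical; adjust values to your policy):

```ini
# indexes.conf -- hypothetical sketch; adjust values to your retention policy
[systems]
# freeze buckets once every event in them is older than 5 days (432000 s)
frozenTimePeriodInSecs = 432000
# cap the timespan of events a hot bucket may hold to 1 day, so a burst of
# historical data cannot create a wide-timespan bucket that never freezes
maxHotSpanSecs = 86400
```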


andrewfoglesong
Explorer

That appears to be it: some of my buckets have earliest times from many months ago but latest times of today. I had assumed that when a lump of data arrives (i.e., from a new monitor), buckets would be created based on modification time. So if I understand correctly, once I finalize an archival process that keeps only 90 days of data, I'll have to wait a full bucket cycle for the changes to fully take effect? Would there be any complications if I found an extraneous, noisy log file, pointed it at the index, and gave it a sourcetype of "unimportant" to expedite the bucket cycle?
