Why is some data still present in an index long after exceeding the frozen time period?

andrewfoglesong
Explorer

I currently have two indexes, each with frozenTimePeriodInSecs=432000 (5 days) and its own frozen directory outside the Splunk directory tree. The main index uses maxDataSize=auto_high_volume; the "Systems" index uses maxDataSize=auto (undefined in its stanza, so it falls back to the global setting).

With this configuration, the cold buckets are unused (as intended), so, to the best of my understanding, buckets go straight from warm to frozen. Data is being frozen successfully, and I've thawed some of it to confirm this. However, even after emptying my thawed directories, I still see some data from months ago.
I can see huge gaps where all the data was frozen, but other time periods still have up to 4,000 events per day. The only possible reason I can see is that all the "persistent" entries come from log files with .gz extensions, but I haven't read about any issues relating to that. Any ideas what is causing this?

yannK
Splunk Employee

Check the actual time span of the events in each bucket, and each bucket's state, with the search command:
| dbinspect index=myindex

For a bucket to be frozen for time reasons, it needs to meet both of these conditions:

  • it must not be hot anymore
  • all events in the bucket have to be older than the retention period.
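
The two conditions above can be sketched as a small check. This is only an illustration of the rule as stated here, not Splunk's actual implementation: the `Bucket` record and `can_freeze` helper are hypothetical, with field names loosely modeled on what dbinspect reports, and the retention value is the one from the question.

```python
from dataclasses import dataclass

FROZEN_TIME_PERIOD_SECS = 432000  # value from the question (5 days)


@dataclass
class Bucket:
    # Hypothetical record; names loosely follow dbinspect output fields.
    state: str        # "hot", "warm", or "cold"
    start_epoch: int  # timestamp of the earliest event in the bucket
    end_epoch: int    # timestamp of the latest event in the bucket


def can_freeze(bucket: Bucket, now: int,
               retention: int = FROZEN_TIME_PERIOD_SECS) -> bool:
    """Apply the two conditions: not hot, and newest event past retention."""
    if bucket.state == "hot":
        return False
    # If the newest event is older than the retention period, every event is.
    return now - bucket.end_epoch > retention


DAY = 86400
now = 1_600_000_000  # arbitrary reference time for the example

old_only = Bucket("warm", now - 120 * DAY, now - 100 * DAY)
mixed = Bucket("warm", now - 120 * DAY, now - 1 * DAY)
still_hot = Bucket("hot", now - 120 * DAY, now - 100 * DAY)

print(can_freeze(old_only, now))   # True
print(can_freeze(mixed, now))      # False: one recent event keeps it around
print(can_freeze(still_hot, now))  # False: still hot
```

The `mixed` case is the interesting one: a bucket whose earliest event is months old stays searchable as long as its latest event is still inside the retention period.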

yannK
Splunk Employee

You are definitely indexing historical logs into buckets that are still hot.

If this is a time span issue, one way to make the buckets roll faster is to define a shorter time range limit for the events in a hot bucket.

See maxHotSpanSecs:
http://docs.splunk.com/Documentation/Splunk/latest/admin/Indexesconf
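
To see why a shorter span limit helps, here is a toy model in Python. It is not Splunk's real bucket-assignment logic (which is more involved); `assign_to_buckets` and `count_expired` are hypothetical helpers that only capture the idea that a span limit keeps old and new events from sharing a bucket.

```python
DAY = 86400


def assign_to_buckets(event_times, max_span_secs):
    """Toy model: put each event in the first bucket it fits without
    stretching that bucket's span past max_span_secs; else open a new one."""
    buckets = []  # each bucket is an [earliest, latest] pair
    for t in event_times:
        for b in buckets:
            if max(b[1], t) - min(b[0], t) <= max_span_secs:
                b[0], b[1] = min(b[0], t), max(b[1], t)
                break
        else:
            buckets.append([t, t])
    return buckets


def count_expired(buckets, now, retention_secs):
    """Count buckets whose newest event is older than the retention period."""
    return sum(1 for _, latest in buckets if now - latest > retention_secs)


now = 1_600_000_000  # arbitrary reference time
retention = 432000   # frozenTimePeriodInSecs from the question

# Historical events from 60..51 days ago arrive interleaved with fresh ones.
events = []
for d in range(60, 50, -1):
    events.append(now - d * DAY)
    events.append(now - 3600)

wide = assign_to_buckets(events, 90 * DAY)
narrow = assign_to_buckets(events, 1 * DAY)

print(len(wide), count_expired(wide, now, retention))      # 1 0
print(len(narrow), count_expired(narrow, now, retention))  # 6 5
```

With the wide limit, everything lands in one bucket whose latest event is an hour old, so nothing is past retention. With the one-day limit, the historical events end up in their own buckets, and five of the six are entirely older than the retention period.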

andrewfoglesong
Explorer

That appears to be it: some of my buckets have earliest times many months ago but latest times of today. I assumed that upon the introduction of a lump of data (i.e., a new monitor), the buckets would be created with respect to modification time. So, if I understand correctly, once I finalize an archival process that keeps only 90 days of data, I'll have to wait a full bucket cycle for the changes to fully take effect? Would there be any complications if I found an extraneous, noisy log file, pointed it at the index, and sourcetyped it as "unimportant" to expedite the bucket cycle?
