Getting Data In

Why is some data still present in the index long after exceeding the frozen time period?

andrewfoglesong
Explorer

I currently have two indexes, each with frozenTimePeriodInSecs=432000 and its own frozen-archive directory outside the Splunk directory tree. The main index's maxDataSize is auto-high-volume; the "Systems" index's maxDataSize is auto (not defined in its stanza, so it uses the global setting).
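For reference, the setup described above might look roughly like this in indexes.conf (the stanza name and archive path are hypothetical):

```ini
# indexes.conf -- hypothetical sketch of the setup described above
[systems]
homePath   = $SPLUNK_DB/systems/db
coldPath   = $SPLUNK_DB/systems/colddb
thawedPath = $SPLUNK_DB/systems/thaweddb
# archive location outside the Splunk directory tree (hypothetical path)
coldToFrozenDir = /archive/splunk/systems
# freeze buckets once all of their events are older than 5 days
frozenTimePeriodInSecs = 432000
# maxDataSize not set here, so the global default (auto) applies
```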

With this configuration the cold buckets are unused (as intended), so buckets go from warm directly to frozen (to the best of my understanding). Data is being successfully frozen, and I've thawed some of it to confirm this. However, even after emptying my thawed directories, I still see data from months ago in the index.
I can see large gaps where all data was frozen, but other time periods still show up to 4,000 events per day. The only pattern I've noticed is that all the "persistent" entries come from log files with .gz extensions, but I haven't read about any issues relating to that. Any ideas about what is causing this?

1 Solution

yannK
Splunk Employee

Check the actual timespan and state of the events in your buckets with the search command:
| dbinspect index=myindex

For a bucket to be frozen for time reasons:

  • it must no longer be hot
  • all events in the bucket must be older than the retention period.
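As a sketch, the dbinspect output can be summarized to spot wide-timespan buckets, the usual culprit when old data refuses to freeze (the index name is taken from the answer above; adjust it to your setup):

```
| dbinspect index=myindex
| eval spanDays = round((endEpoch - startEpoch) / 86400, 1)
| eval newestAgeDays = round((now() - endEpoch) / 86400, 1)
| table bucketId state spanDays newestAgeDays
| sort - spanDays
```

A bucket whose spanDays covers months will not freeze until its newest event also exceeds the retention period.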


yannK
Splunk Employee

You are definitely indexing historical logs into buckets that are still hot.

If this is a timespan issue, one way to make buckets roll faster is to define a shorter time-range limit for the events in a hot bucket.

see maxHotSpanSecs
http://docs.splunk.com/Documentation/Splunk/latest/admin/Indexesconf
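As a sketch, combining that setting with the 432000-second retention from the question (stanza name hypothetical; adjust values to your policy):

```ini
# indexes.conf -- hypothetical sketch; adjust values to your retention policy
[systems]
# freeze buckets once every event in them is older than 5 days (432000 s)
frozenTimePeriodInSecs = 432000
# cap the timespan of events a hot bucket may hold to 1 day, so a burst of
# historical data cannot create a wide-timespan bucket that never freezes
maxHotSpanSecs = 86400
```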


andrewfoglesong
Explorer

That appears to be it: some of my buckets have earliest times from many months ago but latest times of today. I had assumed that when a lump of data arrives (i.e., from a new monitor), buckets would be created based on modification time. So if I understand correctly, once I finalize an archival process that keeps only 90 days of data, I'll have to wait a full bucket cycle for the changes to fully take effect? Would there be any complications if I found an extraneous, noisy log file, pointed it at the index, and gave it a sourcetype of "unimportant" to expedite the bucket cycle?
