Getting Data In

Logs in an index are getting rolled from cold to frozen before size or time limits are reached

Contributor
repFactor = auto
homePath = volume:home/indexname/db
coldPath = volume:SAN/indexname/colddb
thawedPath = $SPLUNK_THAW_VOL/indexname/thaweddb
# the max settings are copied from main's default max settings
maxMemMB = 20
maxConcurrentOptimizes = 6
maxHotIdleSecs = 86400
maxHotBuckets = 10
maxDataSize = auto_high_volume
homePath.maxDataSizeMB = 409600
coldPath.maxDataSizeMB = 1536000
maxTotalDataSizeMB = 1945600
# maxTotalDataSizeMB = ?
# keep logs for 90 days
frozenTimePeriodInSecs = 7776000
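As a quick arithmetic check of the comment above, 7776000 seconds works out to exactly 90 days:

```python
# Sanity check of the retention comment: 7776000 seconds should be exactly 90 days.
SECONDS_PER_DAY = 86400
frozen_time_period_in_secs = 7776000

days = frozen_time_period_in_secs / SECONDS_PER_DAY
print(days)  # 90.0
```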

The logs seem to be rolling from cold to frozen at around 60 days for all but one or two source types (so when I search back between 60 and 90 days, I only see one or two sourcetypes when there should be over 20).

The coldPath limit isn't even close to being hit on this index. I implemented this index configuration at the beginning of the year, so it should be keeping the data for 90-day periods, yet it's throwing data out before then. Are there other settings that can trigger a roll from cold to frozen? We have plenty of space on the drives as well.


SplunkTrust

In regard to your settings:

 homePath = volume:home/indexname/db
 coldPath = volume:SAN/indexname/colddb
 thawedPath = $SPLUNK_THAW_VOL/indexname/thaweddb

From the indexes.conf documentation:

thawedPath must be specified, and cannot use volume: syntax
choose a location convenient for reconstitution-from-archive goals
For many sites, this may never be used.

From your settings:

 # the max settings are copied from main's default max settings
 maxMemMB = 20

From the indexes.conf documentation (this defaults to 5 in the newest version):

IMPORTANT: Calculate this number carefully. splunkd will crash if you set this number higher than the amount of memory available. The default is recommended for all environments.

Finally:

 homePath.maxDataSizeMB = 409600
 coldPath.maxDataSizeMB = 1536000
 maxTotalDataSizeMB = 1945600
 frozenTimePeriodInSecs = 7776000

All of these can affect the cold-to-frozen decision. Once the homePath size limit is reached, or maxWarmDBCount is reached, or you hit the hot/warm volume limit (in your example, volume:home), buckets roll to cold.
From there, reaching 1536000 MB, the frozen time period in seconds, or the cold volume limit (volume:SAN in your example) can result in buckets rolling to frozen.
Note this all applies per indexer.
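The decision chain above can be sketched as a small function. This is a hypothetical simplification: the parameter names just mirror the indexes.conf settings and are not Splunk internals.

```python
# Hypothetical simplification of the cold-to-frozen decision described above.
# Names mirror indexes.conf settings; this is not Splunk's actual implementation.
def freeze_reasons(cold_path_mb, cold_path_max_mb,
                   total_mb, max_total_mb,
                   now, bucket_latest, frozen_time_period_secs):
    """Return the reasons a cold bucket would be a freeze candidate."""
    reasons = []
    if cold_path_mb > cold_path_max_mb:
        reasons.append("coldPath.maxDataSizeMB exceeded")
    if total_mb > max_total_mb:
        reasons.append("maxTotalDataSizeMB exceeded")
    if now - bucket_latest > frozen_time_period_secs:
        reasons.append("frozenTimePeriodInSecs exceeded")
    return reasons

# A bucket whose newest event is 100 days old, with plenty of disk headroom,
# still freezes on time alone:
result = freeze_reasons(500000, 1536000, 900000, 1945600,
                        now=1700000000,
                        bucket_latest=1700000000 - 100 * 86400,
                        frozen_time_period_secs=7776000)
print(result)  # ['frozenTimePeriodInSecs exceeded']
```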

solarboyz1 provided some example searches for this. Also, I personally wouldn't use maxHotIdleSecs or tweak your maxHotBuckets settings unless you know what you are doing.
Finally, FYI, auto_high_volume is designed for higher-volume indexes.

FYI, within the Alerts for Splunk Admins app I have two alerts that relate to this scenario:
IndexerLevel - Buckets are being frozen due to index sizing, which is effectively:

index=_internal `indexerhosts` sourcetype=splunkd "BucketMover - will attempt to freeze" NOT "because frozenTimePeriodInSecs=" 

And IndexerLevel - Cold data location approaching size limits, which I will not paste here.
The application is here, or on github if you just want the searches.


Builder

There are multiple reasons why a bucket rolls.
Run the following search:

index=_*  component=BucketMover "will attempt to freeze"

You should see events similar to:

07-24-2014 01:30:51.609 +0200 INFO BucketMover - will attempt to freeze: candidate='/opt/SP/apps/splunk/splunk-6.0.1/var/lib/splunk/rest/db/db_1392823223_1392819715_1' **because frozenTimePeriodInSecs=2419200 exceeds difference between now=1406158251 and latest=1392823223**

These events show the reason the buckets were rolled, which should help pinpoint the root cause.
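For checking these outside Splunk, the three fields in that message can be pulled out with a quick regex. Note that despite the message's wording, the numbers show the bucket freezes because now - latest is greater than frozenTimePeriodInSecs:

```python
import re

# The BucketMover freeze message shown above, verbatim (joined onto one line).
msg = ("07-24-2014 01:30:51.609 +0200 INFO BucketMover - will attempt to freeze: "
       "candidate='/opt/SP/apps/splunk/splunk-6.0.1/var/lib/splunk/rest/db/"
       "db_1392823223_1392819715_1' because frozenTimePeriodInSecs=2419200 "
       "exceeds difference between now=1406158251 and latest=1392823223")

m = re.search(r"frozenTimePeriodInSecs=(\d+).+?now=(\d+) and latest=(\d+)", msg)
period, now, latest = map(int, m.groups())

age = now - latest
print(age)           # 13335028 seconds since the bucket's newest event
print(age > period)  # True: the bucket aged past the retention period
```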

Contributor

Thank you, this is definitely helpful! One thing still isn't adding up, though: if I search just for the index in question, the only reason it ever gives is that frozenTimePeriodInSecs=7776000 exceeds the difference between now=number and latest=number.

I don't suppose there is any other reason it might be getting evicted?


Builder

The other common reasons I am aware of are hitting the max index size or the max volume size.


Builder

maxTotalDataSizeMB = 1945600

This sets the maximum total size of the index. If you are only seeing the issue on a few indexes, I would verify these settings.


Contributor

So here is something I just thought about: it constantly says it is evicting due to the frozenTimePeriod, yet none of the data ever seems to come close to that period. However, I did find a sourcetype that every now and then throws in super old data. Could just that one bit of old data cause the entire bucket to get evicted, even if the majority of the data in it is not past the frozen time period?


SplunkTrust

@briancronrath yes, but only if you are rolling based on an index/volume size limit rather than the time-based limit (frozenTimePeriodInSecs).

Size-based rolling freezes the oldest bucket first, which means the oldest piece of data within a bucket determines when it rolls.

frozenTimePeriodInSecs would ensure all data in the bucket (even the newest events) is past the required date before rolling the entire bucket.
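To make the distinction concrete, here is an illustrative sketch in plain Python (not Splunk internals) of how one stray old event affects the two freeze paths differently:

```python
# Illustrative only: how a single stray old event affects the two freeze paths.
DAY = 86400
now = 1_700_000_000

# A bucket containing mostly fresh data plus one very old stray event.
bucket = {"earliest": now - 400 * DAY,   # the stray old event
          "latest":   now - 10 * DAY}    # the newest event

frozen_time_period_secs = 90 * DAY

# Time-based freezing keys off the *newest* event, so this bucket survives:
time_frozen = (now - bucket["latest"]) > frozen_time_period_secs
print(time_frozen)  # False

# Size-based freezing picks the oldest bucket first; the stray old event can
# make this bucket the first candidate even though most of its data is fresh.
other = {"earliest": now - 60 * DAY, "latest": now - 5 * DAY}
oldest_first = sorted([bucket, other], key=lambda b: b["earliest"])
print(oldest_first[0] is bucket)  # True
```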


Builder

So, in the frozen bucket events:

 07-24-2014 01:30:51.609 +0200 INFO BucketMover - will attempt to freeze: candidate='/opt/SP/apps/splunk/splunk-6.0.1/var/lib/splunk/rest/db/db_1392823223_1392819715_1' because frozenTimePeriodInSecs=2419200 exceeds difference between now=1406158251 and latest=1392823223

The latest time is the timestamp of the newest event in the bucket. Are you receiving freeze events where now - latest < 7776000?
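To check that offline, the two epoch timestamps in the bucket directory name (Splunk's db_&lt;newestTime&gt;_&lt;oldestTime&gt;_&lt;id&gt; naming scheme, matching the candidate above) can be decoded directly:

```python
from datetime import datetime, timezone

# Bucket directory names encode db_<newestTime>_<oldestTime>_<id> as epoch seconds.
name = "db_1392823223_1392819715_1"
_, newest, oldest, _ = name.split("_")

for label, epoch in (("newest", newest), ("oldest", oldest)):
    ts = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    print(label, ts.isoformat())
```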
