Archive

Why did my cold buckets roll to frozen?

Explorer

I just updated my indexes.conf to account for additional solid-state storage on my indexers, as follows.

[volume:cluster1]
path = /appl/splunk/logs1
maxVolumeDataSizeMB = 5976883

With the extra storage added to the volume, I removed the maxTotalDataSizeMB setting from one of my index stanzas, again as follows.

[idxmybigindex]
repFactor=auto
homePath = volume:cluster1/idxmybigindex/db
coldPath = volume:cluster2/idxmybigindex/colddb
thawedPath = /appl/splunk/idxmybigindex/thaweddb
...
maxTotalDataSizeMB = 3400000 <== deleted this parameter
...

When I applied the change across my index cluster, this index had 8 months of cold buckets roll to frozen. I thought I understood from the Managing Indexers and Clusters of Indexers documentation that maximum index size can be established per volume OR per index (or by global default, which I don't use). Imagine my surprise! I also found a "best practices" page that said this is a preferred method, i.e. volume specification over index specification.

Anyway, I'm doing the forensics now to explain what happened. I'm still under the impression that this is a good configuration. Can anyone help me understand what went south?

0 Karma
1 Solution

SplunkTrust

What happened is that by removing maxTotalDataSizeMB, the index fell back to the default (which is 500,000 MB and obviously less than your old 3,400,000 MB).
maxVolumeDataSizeMB only ensures that ALL indexes in the same volume together don't exceed a certain size, so they don't fill your drive to 100%.
You still need to tell each index what size it is allowed to grow to before freezing (deleting) data.

What you could do is set maxTotalDataSizeMB in [default] to the value of maxVolumeDataSizeMB, so every index is allowed to take up the whole volume, effectively letting all indexes grow until, in sum, they are as big as the volume limit.
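For example, a minimal sketch of that idea, using the volume size from the question (everything else here is assumed):

[default]
# Assumption: let every index grow up to the full volume size;
# the volume cap below still limits what they can use in total.
maxTotalDataSizeMB = 5976883

[volume:cluster1]
path = /appl/splunk/logs1
maxVolumeDataSizeMB = 5976883

With this in place, no single index freezes data for size reasons before the volume itself fills; once the volume limit is reached, the oldest cold buckets across the volume are frozen.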

View solution in original post

0 Karma


Explorer

Thank you again for taking the time to respond. Your suggestion was very close to what I got from support. Here is an excerpt of their findings. So without an index-specific size, it seems, the volume size is distributed equally across declared indexes. How exactly I'm going to manage that, I'm not sure.

The reason that your data is rolling to frozen due to size limitations on the index is because of the maxVolumeDataSizeMB, like we thought.
Your cold index db folders are honoring the 6300000 MB size limit.
The issue is that ALL indexes that are set by this Volume are sharing 5976883 MB on the disk.
It is not that each Index gets 6300000 MB; it's that all of the Indexes that are using that Volume to get settings are sharing 6300000 MB.
Any Index that goes over the amount that Splunk has split equally for them is getting data moved to frozen.
...
I will be putting a request in to get the Splunk Docs updated to reflect this because it is not mentioned in the Index Configuration docs at all

...

0 Karma

Explorer

I just got off the phone with support. You are absolutely correct: maxTotalDataSizeMB is the definitive parameter for total index size. If it is not specified anywhere, buckets are frozen at 500,000 MB. No exceptions, apparently. Knowing this in 20/20 hindsight, a reread of the Managing Indexers and Clusters of Indexers documentation still didn't get me to this conclusion. I'm left to believe that the document is incomplete. In any case, I think I have enough information to put in the safety valves to keep something like this from happening again. Thanks once again for your assistance.

0 Karma

SplunkTrust

Put your problem with that document into the feedback form at the bottom. The docs team is really good and will get back to you to get that page properly updated.

0 Karma

SplunkTrust
splunk btool indexes list --debug

might also have helped you here, and as per xpac's comments, the documentation team welcomes feedback. However, it is worth noting that some documentation pages (mainly the spec files) come from the developer team, and updates can take a long time...

0 Karma

SplunkTrust

Thanks for updating this with your response.

So, the question would be: if you just set every index's maxTotalDataSizeMB to the value of your volume's maxVolumeDataSizeMB, everything should be fine. All indexes could grow to the max volume size, but together they can't grow above it.
Did you try that?

0 Karma

Explorer

Thank you for your response. I'm still perplexed by the Managing Indexers and Clusters of Indexers documentation. It read as though (and maybe it is just me) I could choose a global default, OR a volume default, OR individual index sizes as you suggest, OR a combination of these three to achieve more granular index sizing. Plus, I read a "best practices" document here "www.aditumpartners.com/3-splunk-best-practices-i-learned-the-hard-way" that seemed to concur with my understanding. Maybe I need to post my own "one-best-practice-I-learned-the-hard-way" document!

0 Karma

SplunkTrust

You're welcome. 🙂
If my answer helped, I'd be happy if you'd upvote/accept it.
I just read that document, and while the beginning feels very clear and says what I said, the end is somewhere between confusing and plain wrong. It states: "The result of using volumes is that the volume's maxVolumeDataSizeMB setting overrides the indexes' maxTotalDataSizeMB setting." That is not true; it is simply a second limit, as in "drop stuff if over the index limit OR the volume limit", so I see where that misunderstanding came from. 😉

0 Karma

It's possible to have both maxTotalDataSizeMB and frozenTimePeriodInSecs settings in place. Whichever limit is hit first takes effect, so even after removing the size limit you could still trigger the time-based one. Any chance you have that set for this index?

https://docs.splunk.com/Documentation/Splunk/7.0.3/Indexer/Setaretirementandarchivingpolicy
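For illustration, a sketch of an index stanza with both limits in place (the values are borrowed from elsewhere in this thread; the combination itself is assumed):

[idxmybigindex]
# Size limit: oldest buckets roll to frozen once the index exceeds this.
maxTotalDataSizeMB = 3400000
# Age limit: buckets roll to frozen once all their events are older than 1 year.
frozenTimePeriodInSecs = 31536000

Whichever limit a bucket hits first is the one that freezes it.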

0 Karma

Explorer

Makes sense. I'm relying on [default] to provide the frozen time interval, as follows. Again, this is multiplied across 12 index peers, I think. Operation in a distributed cluster can be confusing to me sometimes.

[default]
maxRunningProcessGroups = 12
maxConcurrentOptimizes = 12
frozenTimePeriodInSecs = 31536000

[volume:cluster1]
path = /appl/splunk/logs1
maxVolumeDataSizeMB = 5976883

[volume:cluster2]
path = /appl/splunk/cold_storage/logs2
maxVolumeDataSizeMB = 6300000

[idxmybigindex]
repFactor=auto
homePath = volume:cluster1/idxmybigindex/db
coldPath = volume:cluster2/idxmybigindex/colddb
thawedPath = /appl/splunk/idxmybigindex/thaweddb
maxHotBuckets = 3
maxDataSize = autohighvolume
maxWarmDBCount = 275
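Given the rest of the thread, the missing safety valve in the stanza above would be an explicit size cap; a sketch (the 3400000 value is the one deleted in the original post):

[idxmybigindex]
repFactor = auto
homePath = volume:cluster1/idxmybigindex/db
coldPath = volume:cluster2/idxmybigindex/colddb
thawedPath = /appl/splunk/idxmybigindex/thaweddb
maxHotBuckets = 3
maxDataSize = autohighvolume
maxWarmDBCount = 275
# Restored explicit cap; without it the 500,000 MB default applies.
maxTotalDataSizeMB = 3400000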

0 Karma

SplunkTrust

What's the maxVolumeDataSizeMB for volume:cluster2 ?

0 Karma

Explorer

[volume:cluster2]
path = /appl/splunk/cold_storage/logs2
maxVolumeDataSizeMB = 6300000

These volumes are spread across 12 index peers.

0 Karma