Deployment Architecture

Why would an index have several hot buckets open at the same time?

Path Finder

In reviewing the Splunk documentation on how indexers store indexes, I see that it says "An index can have several hot buckets open at a time". If hot buckets store newly indexed data and have a predefined expiration or rollover date, why would there ever be more than one hot bucket for a specific index? Even in a large deployment, when you get to >200GB, if a single index is being utilized, why would we see multiple hot buckets?

1 Solution

Path Finder

There are two reasons. First, data whose timestamps fall outside the window set by quarantinePastSecs and quarantineFutureSecs ends up in a quarantine bucket, which is a hot bucket. Second, if you have multiple ingestion pipelines, each pipeline gets its own set of hot buckets.
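As a rough illustration of the quarantine rule, here is a minimal Python sketch. It assumes an event is quarantined when its timestamp is more than quarantinePastSecs older, or more than quarantineFutureSecs newer, than the indexer's current time. The function name `is_quarantined` is hypothetical, and the two constants are the default values reported by btool later in this thread.

```python
# Defaults as shown in the btool output later in this thread.
QUARANTINE_PAST_SECS = 77_760_000    # 900 days
QUARANTINE_FUTURE_SECS = 2_592_000   # 30 days


def is_quarantined(event_time, now,
                   past_secs=QUARANTINE_PAST_SECS,
                   future_secs=QUARANTINE_FUTURE_SECS):
    """Hypothetical helper: True if the event's epoch timestamp falls
    outside the window [now - past_secs, now + future_secs]."""
    return event_time < now - past_secs or event_time > now + future_secs


if __name__ == "__main__":
    now = 1_700_000_000  # arbitrary "current" epoch time for the example
    for label, ts in [("an hour old", now - 3600),
                      ("three years old", now - 3 * 365 * 86400),
                      ("two months ahead", now + 60 * 86400)]:
        print(label, "->", "quarantine" if is_quarantined(ts, now) else "normal")
```

This only models the time-window check; how the indexer then chooses or creates the quarantine bucket is not covered.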


Ultra Champion

The admin study guide says that when an index will receive events that are not in time-sequence order, the number of available hot buckets should be higher than the default of 3. For high-volume indexes it recommends having up to 10 hot buckets.

There is a cheerful discussion of the subject in "Can I optimise search by increasing hot buckets?"

@sowings says -

-- Hot buckets are not faster, they're merely the ones which are being written to. Increasing the number of them can help search performance, but in a subtle way: see below.

Sometimes, when you're indexing a lot of data from different sources, the subtle time differences between machines means that events arriving at the indexer are slightly offset from one another in time. Splunk likes to keep the timeline relatively smooth within a given bucket, so it might write event #1 to one bucket, but event #2 in another, to align with the time of events already in those buckets.

So now a new event arrives, and it's got a time stamp that belongs in neither bucket #1 nor bucket #2. Splunk creates a new bucket. But if I now have more hot buckets than the maximum allowed, it's time to rotate one to warm. Let's say we selected bucket #2 to go to warm. Now it's closed up, its files are no longer being written to, and it enters the warm state. But bucket #2 was only 100M when it was rolled. That's pretty small for a bucket, especially when you're indexing 100G / day.

The search performance part of this discussion is here: if you're rolling buckets too fast and ending up with a lot of small buckets, search performance will suffer, because to find events we have to open more and more buckets.
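The thrashing described above can be illustrated with a toy model in Python. This is only a sketch under loud assumptions, not Splunk's actual algorithm: an event "fits" a hot bucket if its timestamp is within `max_gap` seconds of that bucket's time range, and when a new bucket is needed while the limit is reached, the least recently written bucket is rolled to warm. The names `Bucket`, `simulate`, `max_hot` and `max_gap` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    earliest: int
    latest: int
    events: int = 0
    last_write: int = 0  # sequence number of the most recent write


def simulate(timestamps, max_hot, max_gap):
    """Toy model: return the event counts of the buckets rolled to warm."""
    hot, rolled = [], []
    for seq, ts in enumerate(timestamps):
        # Look for a hot bucket whose time range is close to this event.
        target = next((b for b in hot
                       if b.earliest - max_gap <= ts <= b.latest + max_gap),
                      None)
        if target is None:
            if len(hot) == max_hot:
                # Assumed policy: roll the least recently written bucket.
                victim = min(hot, key=lambda b: b.last_write)
                hot.remove(victim)
                rolled.append(victim.events)
            target = Bucket(earliest=ts, latest=ts)
            hot.append(target)
        target.earliest = min(target.earliest, ts)
        target.latest = max(target.latest, ts)
        target.events += 1
        target.last_write = seq
    return rolled


if __name__ == "__main__":
    # Four sources whose clocks are an hour apart, 50 interleaved events each.
    stamps = [offset + i for i in range(50)
              for offset in (0, 3600, 7200, 10800)]
    for limit in (3, 4):
        sizes = simulate(stamps, max_hot=limit, max_gap=60)
        print(f"max_hot={limit}: {len(sizes)} buckets rolled early")
```

In this contrived input, four time-separated streams compete for three hot buckets, so nearly every event forces a roll and the warm buckets hold a single event each. With a fourth hot bucket nothing rolls early.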

You can see why buckets are being rolled with a search like this one:

index=_internal source=*splunkd.log databasePartitionPolicy moving

You'll get events from Splunk which indicate why the bucket went from hot to warm. If it's for reasons like "exceeded maxHotBuckets", then you might not have enough. The "main" index has defaults set up for indexing a lot of data. It uses ten (10) max hot buckets, and uses the "auto_high_volume" value of maxDataSize for a size limit (10G on 64-bit systems). If you're indexing at a high volume to an index other than main, it might benefit you to mimic some of the config of the main index.

Finally, have a look here about ways to evaluate search performance, and optimize your searches.


SplunkTrust

I've supplied the hyperlink for the last line of the post above..."Finally have a look here about ways to evaluate search performance, and optimize your searches."


Path Finder

Thanks, this was very helpful. I did not receive an email notification that a response came in, hence the delay.


Splunk Employee

Yep to the accepted solution, and the default setting for maxHotBuckets in indexes.conf is 3 hot buckets.

[splunker@n00bserver bin]$ ./splunk btool indexes list --debug
...
/home/splunker/splunk/etc/system/default/indexes.conf maxHotBuckets = 3
/home/splunker/splunk/etc/system/default/indexes.conf quarantineFutureSecs = 2592000
/home/splunker/splunk/etc/system/default/indexes.conf quarantinePastSecs = 77760000

[splunker@n00bserver bin]$ ./splunk btool server list --debug
...
/home/splunker/splunk/etc/apps/n00blab_all_indexer_base/local/server.conf parallelIngestionPipelines = 4
...

| dbinspect is a great command for taking a look around, or from the CLI you can check $SPLUNK_HOME/var/lib/splunk/yourIndex/db. If the reason is quarantine, you will be able to tell by the bucket naming convention.
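If you go the CLI route, a small Python sketch like the one below can sort the directory names for you. It assumes the naming convention as I understand it: hot buckets are named hot_v1_<id>, quarantine buckets hot_quar_v1_<id>, and warm buckets db_<newest>_<oldest>_<id> (with an extra GUID suffix on clustered indexers). Check those patterns against your own db directory before relying on them. The function name `classify` is hypothetical.

```python
import re

# Assumed naming patterns; verify against your own index directory.
PATTERNS = [
    ("quarantine hot", re.compile(r"^hot_quar_v1_\d+$")),
    ("hot", re.compile(r"^hot_v1_\d+$")),
    ("warm", re.compile(r"^db_\d+_\d+_\d+(_[0-9A-Fa-f-]+)?$")),
]


def classify(dirname):
    """Return a label for a bucket directory name, or 'other'."""
    for label, pattern in PATTERNS:
        if pattern.match(dirname):
            return label
    return "other"


if __name__ == "__main__":
    # Example names only; in practice pass os.listdir() of the db directory.
    for name in ["hot_v1_7", "hot_quar_v1_8",
                 "db_1389230491_1389230488_5", "GlobalMetaData"]:
        print(f"{name}: {classify(name)}")
```

Counting how many names come back as "hot" versus "quarantine hot" gives a quick view of why an index has several hot buckets open.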
