I need someone to translate this from the admin manual:
attribute: maxHotBuckets
what it configures: The maximum number of hot buckets.
default: 1, for new, custom indexes. However, if you create a new index, you should set this value to at least 2, to deal with any archival data. The main default index, for example, has this value set to 10.
So I have to ask: under what circumstances is it better to have more than one hot bucket? How will having more than one bucket help with archival data? Does it, for instance, interact with maxHotSpanSecs, creating additional hot buckets for the old data coming in? If my data arrives in close to real time, would I actually have more than one hot bucket under normal circumstances?
It is beneficial to have more than one hot bucket:
So, the only time you can be absolutely sure that none of these conditions occur is if you set DATETIME_CONFIG=CURRENT in props.conf for ALL your data sources.
Otherwise, additional hot buckets are a safety net that is almost always worthwhile: they absorb outlying events whose timestamps would otherwise hurt the performance of searches against the primary buckets.
We learned this from supporting Splunk 2.x and 3.x, which did not have this capability. Some new data with poorly configured timestamp recognition would wreck some otherwise perfectly good buckets.
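For reference, the DATETIME_CONFIG=CURRENT setting mentioned above might look like the following in props.conf. This is only a sketch: the stanza name my_sourcetype is a hypothetical placeholder, and you would need an equivalent stanza for every data source before relying on a single hot bucket.

```
# props.conf
# "my_sourcetype" is a made-up name for illustration; use your own sourcetype.
[my_sourcetype]
# Stamp each event with the indexer's current time instead of
# extracting a timestamp from the event text.
DATETIME_CONFIG = CURRENT
```

Note that this discards whatever timestamp the event itself carries, so it only makes sense for data where arrival time is an acceptable substitute.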
The way buckets work in 4.0 and later is that real-time data that belongs close together is put in one bucket. If you then add a data input that is old, it is put in a different bucket. This keeps buckets from covering huge time spans, which makes searching a lot more efficient. With maxHotBuckets = 1, you are basically placing all data, historic or real time, into one bucket, which could cause your Splunk instance to waste time in searches.
If you set it to, say, 5, then the real-time data will be in one bucket and, depending on the time of the events, historical data will be placed in different buckets.
If the data is all real time and no historical data is coming in, I would assume that you will only see one hot bucket at a time. However, if, for example, you have an event where the time doesn't get extracted correctly, you might end up reading a wrong or different time, and as such you could have a second bucket pop up.
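As a rough illustration of the advice above, a custom index that allows several hot buckets could be defined in indexes.conf along these lines. The index name my_custom_index and the value 3 are assumptions chosen for the example, not recommendations from the manual; the manual excerpt only says to use at least 2.

```
# indexes.conf
# "my_custom_index" is a made-up name for illustration.
[my_custom_index]
homePath   = $SPLUNK_DB/my_custom_index/db
coldPath   = $SPLUNK_DB/my_custom_index/colddb
thawedPath = $SPLUNK_DB/my_custom_index/thaweddb
# Allow more than one hot bucket so that old or mis-timestamped events
# can go to a separate bucket instead of widening the time span of the
# bucket that holds the real-time data.
maxHotBuckets = 3
```

With a setting like this, a steady real-time feed would normally still use just one hot bucket; the extra ones only come into play when events arrive with timestamps far from the rest.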