Solved: Why set maxHotBuckets > 1 in indexes.conf?

jeff · ‎12-28-2010

I need someone to translate this from the admin manual:

attribute: maxHotBuckets
what it configures: The maximum number of hot buckets.
default: 1, for new, custom indexes. However, if you create a new index, you should set this value to at least 2, to deal with any archival data. The main default index, for example, has this value set to 10.

So I ask: under what circumstances is it better to have more than one hot bucket? How will having more than one hot bucket help with archival data? Does it, for instance, interact with maxHotSpanSecs, creating additional hot buckets for old data coming in? If my data comes in close to real time, would I actually have more than one hot bucket under normal circumstances?

araitz · ‎12-28-2010

It is beneficial to have more than one hot bucket in any of these cases:

- Your data is not presented to your indexers in a fairly time-synchronous order
- You are indexing archived or batched data at the same time you are indexing real-time data
- Your hosts may experience drift or clock skew
- Splunk may otherwise receive and interpret (correctly or not) an event with a timestamp at least a few days before or after the other events that are being indexed

So, the only way to be absolutely sure that none of these conditions occurs is to set DATETIME_CONFIG=CURRENT in props.conf for ALL your data sources, so that every event is stamped with the time at which it is indexed.
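As a rough illustration of that setting, a props.conf stanza might look like the following. The sourcetype name is hypothetical, and the stanza would have to be repeated (or applied more broadly) to cover every data source.

```ini
# props.conf
# "my_sourcetype" is a hypothetical name used only for illustration.
[my_sourcetype]
# Ignore any timestamp in the event text and stamp each event with the
# indexer's current system time when it is indexed.
DATETIME_CONFIG = CURRENT
```

The trade-off is that the original event time is no longer used, so this is unlikely to suit archived or batched data.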

Otherwise, the safety net of additional hot buckets is almost always worthwhile: events with outlying timestamps go into their own bucket instead of degrading the performance of searches against the primary buckets.

We learned this from supporting Splunk 2.x and 3.x, which did not have this capability. Some new data with poorly configured timestamp recognition would wreck some otherwise perfectly good buckets.

Genti · ‎12-28-2010

The way buckets work in 4.0 and later is that real-time data that belongs close together in time is put in one bucket. If you then add a data input that is old, that data is put in a different bucket. This keeps buckets from covering huge time spans, so that searching is a lot more efficient. With maxHotBuckets = 1, you are basically placing all data, historical or real time, into one bucket, which could cause your Splunk instance to waste time in searches.

If you set it to, say, 5, then the real-time data will be in one bucket and, depending on the time of the events, historical data will be placed in different buckets.
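A minimal indexes.conf sketch of that setting, assuming a custom index with the hypothetical name my_custom_index. The paths follow the usual $SPLUNK_DB layout, and the value 5 is just the example figure above, not a recommendation.

```ini
# indexes.conf
# "my_custom_index" is a hypothetical index name used only for illustration.
[my_custom_index]
homePath   = $SPLUNK_DB/my_custom_index/db
coldPath   = $SPLUNK_DB/my_custom_index/colddb
thawedPath = $SPLUNK_DB/my_custom_index/thaweddb
# Allow up to 5 hot buckets to be open at once, so events with old or
# outlying timestamps can land in a bucket separate from the real-time data.
maxHotBuckets = 5
```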

If the data is all real time and no historical data is coming in, I would assume that you will only see one hot bucket at a time. However, if for example you have an event where the time doesn't get extracted correctly, Splunk might read a wrong or different time, and a second hot bucket could pop up.
