Solved: Re: Can I optimise search by increasing hot bucket...

BobM · 06-04-2013

Three questions in one.

Are hot buckets faster than warm buckets for search?
If so, is it because they are in memory, or because the file is already open?
Is it a good idea to have 30+ hot buckets to speed up data access?

For background, we are indexing 100 GB/day, and searches over a few hours seem slow, so we are looking for ways to optimise.

sowings · 07-12-2013

Hot buckets are not faster; they're merely the ones being written to. Increasing the number of them can help search performance, but in a subtle way: see below.

Sometimes, when you're indexing a lot of data from different sources, small clock differences between machines mean that events arriving at the indexer are slightly offset from one another in time. Splunk likes to keep the timeline relatively smooth within a given bucket, so it might write event #1 to one bucket but event #2 to another, to align with the times of the events already in those buckets.

So now a new event arrives, and it has a timestamp that belongs in neither bucket #1 nor bucket #2. Splunk creates a new hot bucket. But if that leaves more hot buckets than the maximum allowed, it's time to roll one to warm. Let's say bucket #2 is selected to go to warm. It's closed up, its files are no longer being written to, and it enters the warm state. But bucket #2 was only 100 MB when it was rolled. That's pretty small for a bucket, especially when you're indexing 100 GB/day.

Here is the search performance part of this discussion: if you're rolling buckets too fast and ending up with a lot of small buckets, search performance suffers, because Splunk has to open more and more buckets to find the events for a given time range.

You can see why buckets are being rolled with a search like this one:

index=_internal source=*splunkd.log databasePartitionPolicy moving

You'll get events from Splunk that indicate why each bucket went from hot to warm. If the reasons look like "exceeded maxHotBuckets", then you might not have enough hot buckets configured. The "main" index has defaults set up for indexing a lot of data: it allows ten (10) hot buckets, and uses the "auto_high_volume" setting for the bucket size limit (10 GB on 64-bit systems). If you're indexing at a high volume to an index other than main, it might benefit you to mimic some of the config of the main index.
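
As a rough sketch of what mimicking main could look like, here is an indexes.conf fragment. The index name "my_high_volume_index" is made up for illustration, and the stanza assumes the index is already defined elsewhere with its usual paths; only the two settings discussed above are shown. Check the indexes.conf spec for your Splunk version before applying it.

```
# indexes.conf
# "my_high_volume_index" is a hypothetical name; substitute your own index.
[my_high_volume_index]
# Allow up to ten hot buckets, matching what is described above for main.
maxHotBuckets = 10
# Let each bucket grow to the high-volume size limit before it rolls to warm.
maxDataSize = auto_high_volume
```

After a change like this, re-running the databasePartitionPolicy search above is one way to see whether buckets are still being rolled for "exceeded maxHotBuckets".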

Finally, have a look here about ways to evaluate search performance, and optimize your searches.
