I would like to keep all data in a single bucket. There is a potential performance impact when Splunk rotates data from "hot" to "warm" to "cold", because of the underlying storage and the way it manages data with its own tiering solution. My two possible solutions are:
1) Turn off Splunk rotation so that all data resides in the "hot" bucket. There would be plenty of underlying storage to handle this.
2) Quickly rotate the buckets so that they sit in "cold". Once in cold, the data would then be deleted after a fairly short period of time, going from cold to frozen.
Option 1 would be preferred since it involves the least data movement. The underlying product already does its own hot/warm/cold tiering, so each bucket move would have a large impact.
There is a good, similar discussion at Bucket rotation and warm, cold...
The recommendation there is: do not mess with anything other than frozenTimePeriodInSecs.
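Following that recommendation, retention is controlled by a single setting. The fragment below is a minimal sketch; the index name my_index and the 90-day value are hypothetical. Unless coldToFrozenDir or coldToFrozenScript is configured, buckets that roll to frozen are deleted.

```ini
# Hypothetical index stanza; adjust the name to your own index.
[my_index]
# Buckets roll to frozen once their newest event is older than this.
# 90 days * 86400 seconds/day = 7776000 seconds (illustrative value).
frozenTimePeriodInSecs = 7776000
```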
I asked a similar question not too long ago, but haven't been able to verify the answer, yet. Perhaps you can. See https://answers.splunk.com/answers/389658/what-will-break-if-i-set-coldpath-to-devnull.html
Hot and warm buckets stay on the same path (specified by the homePath attribute in indexes.conf). So for option 1, do the following:
1) Increase the number of hot buckets by setting maxHotBuckets in indexes.conf.
2) Increase the number of warm buckets by setting maxWarmDBCount in indexes.conf.
3) Increase the bucket size by setting maxDataSize to auto_high_volume or any large number in indexes.conf.
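The three steps above might look roughly like this in indexes.conf. This is a sketch under assumptions: the index name my_index, the paths, and the numeric values are hypothetical, and it assumes no homePath.maxDataSizeMB limit is set, since that setting can also force warm buckets to roll to cold.

```ini
# Hypothetical index stanza; paths and values are illustrative only.
[my_index]
# Hot and warm buckets both live under homePath; cold buckets live under coldPath.
homePath = $SPLUNK_DB/my_index/db
coldPath = $SPLUNK_DB/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb
# 1) Allow more hot buckets per index.
maxHotBuckets = 10
# 2) Allow many more warm buckets before the oldest rolls to cold (default is 300).
maxWarmDBCount = 10000
# 3) Use larger buckets so they roll less often (auto_high_volume is 10 GB on 64-bit systems).
maxDataSize = auto_high_volume
```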
Not sure if I understood the second option.
I am looking to NOT move data at all. The underlying storage for Splunk has its own tiering that also moves data based on usage. Given the large amount of data per day, only a limited amount will stay in fast storage and the remainder will move offsite. The concern is that every time Splunk moves data from hot to warm to cold, it will trigger an event to pull the data from the remote site to local (fast) storage just to move it from bucket to bucket. I am looking to avoid Splunk bucket rotation, since it may have an adverse effect on the physical data that is being managed by another storage management product.
Option 2 describes getting all the data into the cold bucket as soon as possible. Once there, the storage management product would decide between fast and slow storage based on read/write activity.
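One way to approximate option 2 is to keep no warm buckets at all. According to the indexes.conf specification, setting maxWarmDBCount to zero makes the indexer roll buckets to cold as soon as it can. The sketch below assumes a hypothetical index my_index; hot buckets still roll to warm according to settings such as maxDataSize, and each bucket is then physically moved once, from homePath to coldPath.

```ini
# Hypothetical index stanza; only the relevant setting is shown.
[my_index]
# Keep zero warm buckets: roll each bucket to cold as soon as possible.
maxWarmDBCount = 0
```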
Going from a hot bucket to a warm bucket is just a rename of the bucket directory, not a move, so it shouldn't affect performance.