I apologize in advance if this ends up being a pretty dumb question, but I don't have a lot of experience with the internals of Splunk.
I think I understand pretty well why the hot->warm->cold->frozen path for data exists and how it's useful in a traditional SAN storage environment, where you might have lots of different types of disk and the data can be cleanly moved from one tier to the other. This is less clear to me in a clustered environment. If you're throwing a lot of indexing servers at the problem, each with its own disk, I don't quite understand the need to have cold buckets and suffer the copy from warm->cold.
As a hypothetical example, if you have 10 servers with 1,000 IOPS each, I don't see the benefit of carving out, say, 3,000 IOPS for cold and 7,000 IOPS for hot when it's all the same pool of disk anyway. I'd rather just have all my data spread evenly across the physical disks. Even if I put the hot and cold buckets on the same physical disk, when buckets roll to cold I have to pay a big I/O penalty for the copy, when I'd rather be spending those IOPS on indexing new data or serving search requests.
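Just to make the split concrete, the kind of carve-out I'm describing would look roughly like this in indexes.conf (the paths and sizes here are made up for illustration):

    # two volumes carved out of the same underlying pool of disk
    [volume:hot]
    path = /fast_disk/splunk
    maxVolumeDataSizeMB = 500000

    [volume:cold]
    path = /slow_disk/splunk
    maxVolumeDataSizeMB = 2000000

    [main]
    # hot/warm buckets live on one volume, cold buckets on the other,
    # so rolling warm->cold means copying data between them
    homePath = volume:hot/main/db
    coldPath = volume:cold/main/colddb
    thawedPath = $SPLUNK_DB/main/thaweddb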
So I guess I have three questions:
1. Is this even a question that makes sense? Do I have a misunderstanding of some fundamental concept?
2. If it does make sense, is it possible to skip cold buckets entirely and move from hot to warm to frozen/deleted?
3. If that's not possible, what can be done to reduce the impact of the copy from warm to cold as much as possible? (See the sketch below for what I'm imagining.)
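For questions 2 and 3, what I'm imagining is something like the following, with coldPath pointing at the same volume as homePath so the warm->cold roll is just a directory move on one filesystem rather than a copy across devices (the specific values are guesses on my part):

    [main]
    # homePath and coldPath on the same volume, so rolling to cold
    # should be a rename rather than a physical copy
    homePath = volume:hot/main/db
    coldPath = volume:hot/main/colddb
    thawedPath = $SPLUNK_DB/main/thaweddb
    # keep many more buckets warm than the default of 300, so few
    # buckets ever roll to cold in the first place
    maxWarmDBCount = 10000
    # freeze (here: delete, since no coldToFrozenDir is set) after ~90 days
    frozenTimePeriodInSecs = 7776000

My understanding is that if homePath and coldPath resolve to the same filesystem, the roll is effectively free, which would make the cold tier harmless to keep around, but I'd appreciate confirmation that this is how it actually behaves.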