We are in the process of increasing our daily ingest rate to 2 TB, and I have a question about our storage retention policy design. Hot, warm, and cold buckets are all searchable from Splunk, so what is the ideal retention for cold storage? My contractor designed cold with the same period as hot/warm, which confuses me a little. Thank you.
Hot/Warm: 90 days
Cold: 90 days
Archive: 3 years
We use size constraints on our hot/warm buckets and let the data tell us how long we can keep buckets accessible locally. Once a size constraint is exceeded, the bucket transitions to cold, which for us is SmartStore on S3. The only place we apply a time setting is the final transition to frozen. That setting (frozenTimePeriodInSecs) is set per index based on the availability policy for that data. For our analysts, most indexes must be searchable for 90 days, some longer. When that transition happens, the bucket on S3 is moved to Glacier storage, and it has to be thawed (a real PITA process) to make it searchable again. Thawing data starts with finding it for a given index and timeframe, then moving it back to local storage and rebuilding the bucket (at least the metadata).
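For anyone setting this per index, here is a minimal sketch of the arithmetic behind frozenTimePeriodInSecs, which is expressed in seconds. The helper names and the index name `example_index` are hypothetical, chosen only for illustration; the setting name itself is the one mentioned above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def days_to_frozen_secs(days: int) -> int:
    """Convert a searchable-retention requirement in days to seconds."""
    return days * SECONDS_PER_DAY


def render_stanza(index_name: str, searchable_days: int) -> str:
    """Render a per-index stanza fragment (index_name is a placeholder)."""
    secs = days_to_frozen_secs(searchable_days)
    return f"[{index_name}]\nfrozenTimePeriodInSecs = {secs}\n"


if __name__ == "__main__":
    # 90 days of searchability, as in the policy described above.
    print(render_stanza("example_index", 90))
```

Keep in mind that this value reflects the total age of the data, not the time it has spent in cold, so a 90-day requirement covers the hot, warm, and cold phases combined.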
So what is the reason for a cold tier if some customers do not use it at all? Is it because archived data has to be thawed before it can be brought back to hot/warm or cold? I think maybe that is the use case for cold storage, as an alternative to bringing data back to hot/warm?
Cold buckets are still searchable. The cold phase lets admins move data that is less likely to be searched to cheaper (i.e. slower) storage devices. This allows them to balance storage cost against accessibility.
The cold tier is for data that rarely appears in search results. The idea is one can put cold data on slower, cheaper storage to save operating costs.
There is no single ideal retention for cold data. It depends on your requirements and the storage devices available. Typically, cold data is stored on the slowest devices. However, some customers do not use cold at all; their data goes from warm directly to frozen.
The more important consideration, IMO, is the overall retention of data no matter where it is stored.
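To put rough numbers on overall retention, here is a back-of-the-envelope sketch using the figures from the original question (2 TB/day, 90 days hot/warm, 90 days cold, 3 years archive). It assumes each period is the time data spends in that tier, and it uses a hypothetical `on_disk_ratio` of 0.5 for how much disk the ingested data occupies. That ratio is an assumption, not a measured value, and the sketch ignores replication, so substitute figures from your own environment.

```python
def tier_storage_tb(daily_ingest_tb: float, days_in_tier: int,
                    on_disk_ratio: float = 0.5) -> float:
    """Estimate storage for one tier: ingest/day x days x on-disk ratio."""
    return daily_ingest_tb * days_in_tier * on_disk_ratio


def plan(daily_ingest_tb: float, tiers: dict[str, int],
         on_disk_ratio: float = 0.5) -> dict[str, float]:
    """Estimate storage for every tier in a retention plan."""
    return {
        name: tier_storage_tb(daily_ingest_tb, days, on_disk_ratio)
        for name, days in tiers.items()
    }


if __name__ == "__main__":
    # Periods from the question; on_disk_ratio=0.5 is an assumed placeholder.
    tiers = {"hot/warm": 90, "cold": 90, "archive": 3 * 365}
    for name, tb in plan(2.0, tiers).items():
        print(f"{name}: {tb:.0f} TB")
```

Changing the number of days in one tier only shifts where the data sits; the total across tiers is driven by the overall retention period.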