We are wondering about using SmartStore. Does it make sense to use it for all data except hot and warm data? Even if we end up with all data in SmartStore, it seems like a good start to use it first for the older data.
And there is a note in the "Archive indexed data" documentation topic saying:
-- Although SmartStore indexes do not usually contain cold buckets, you still use the attributes described here (coldToFrozenDir and coldToFrozenScript) to archive SmartStore buckets as they roll directly from warm to frozen. See Configure data retention for SmartStore indexes.
What does it mean?
As said, duplicate copies based on SF/RF apply only to hot buckets. As soon as a bucket rolls to warm, a copy is uploaded to S3, and it is evicted from the local cache only when it meets the eviction policy criteria.
And when we use SmartStore, there is no duplication of warm buckets in the local cache, so we have enough storage to hold more data.
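For reference, a minimal sketch of what that setup might look like, assuming an S3-backed remote volume; the bucket name, endpoint, index name, and cache sizes below are placeholders, not values from this thread:

    # indexes.conf -- hypothetical SmartStore volume and index
    [volume:remote_store]
    storageType = remote
    path = s3://my-smartstore-bucket/indexes
    remote.s3.endpoint = https://s3.us-east-1.amazonaws.com

    [my_index]
    # Warm buckets are uploaded to this volume after rolling from hot
    remotePath = volume:remote_store/$_index_name
    homePath = $SPLUNK_DB/my_index/db
    # coldPath is still required, though cold is effectively unused with SmartStore
    coldPath = $SPLUNK_DB/my_index/colddb
    thawedPath = $SPLUNK_DB/my_index/thaweddb

    # server.conf -- the cache manager decides when locally cached copies are evicted
    [cachemanager]
    # Local cache budget in MB (0 means no limit), default LRU eviction policy,
    # and extra free disk space in MB factored into eviction decisions
    max_cache_size = 512000
    eviction_policy = lru
    eviction_padding = 5120

Because only one cached copy of a warm bucket is kept locally, the RF/SF storage multiplier applies only while buckets are hot.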
With respect to moving data to frozen, please see the link below.
The cache manager downloads the data from S3, writes it to the frozen directory, and then removes the bucket from both the local cache and S3.
https://answers.splunk.com/answers/777620/splunk-smartstore-do-warm-buckets-need-to-roll-to-1.html
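As a hedged illustration of the note quoted above from the docs, archiving a SmartStore index to a local directory instead of simply deleting frozen buckets could look roughly like this; the index name, path, and script name are placeholders:

    # indexes.conf -- hypothetical archive settings for a SmartStore index
    [my_index]
    remotePath = volume:remote_store/$_index_name
    # Buckets roll directly from warm to frozen; the cache manager fetches each one
    # from S3, copies it here, then removes it from the local cache and from S3
    coldToFrozenDir = /opt/splunk/frozen/my_index
    # Alternatively, run a custom archiving script instead of a plain copy
    # (only one of coldToFrozenDir / coldToFrozenScript may be set):
    # coldToFrozenScript = "$SPLUNK_HOME/bin/python" "$SPLUNK_HOME/bin/my_archive_script.py"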
The bucket life cycle changes completely with SmartStore. Probably the best topic online anywhere about those nuances is here (read both answers and all of the comments):
https://answers.splunk.com/answers/739051/smartstore-behaviors.html
So, @davidpaper says -
-- RF/SF only apply to Hot buckets. Once a bucket is rolled, it is uploaded to S3 and any bucket replicates are marked for eviction.
The way I read it, within the SmartStore paradigm, hot buckets are created by Splunk in the conventional way and from that point they go to S3, where there are no warm/cold/frozen boundaries. Do I make sense?
Yep. I changed the way I think of the bucket life cycle with SmartStore.
Hot is read/write and replicated just like non-SmartStore. Once they roll to read only, they aren't "warm" or "cold" to me anymore, they are just read only as they are copied to the remote object store. Once a bucket exists on the remote object store, we only download bits and pieces, not necessarily the whole bucket, when it is time to search it. Once in the remote object store, there are no warm, cold, or thawed boundaries anymore. Freezing of buckets still exists, which deletes them (by default).
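To tie that back to configuration: freezing for a SmartStore index is still driven by the usual retention settings, plus a SmartStore-specific global size limit. A hedged sketch, with placeholder index name and values:

    # indexes.conf -- hypothetical retention settings for a SmartStore index
    [my_index]
    remotePath = volume:remote_store/$_index_name
    # Time-based retention: freeze (delete, by default) buckets older than ~90 days
    frozenTimePeriodInSecs = 7776000
    # SmartStore size-based retention: freeze the oldest buckets once the index
    # exceeds this many MB across the cluster
    maxGlobalDataSizeMB = 500000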
Pretty amazing @dpaper!
That is how I see it: hot buckets do not change at all, warm buckets change to SmartStore, and cold is only for very tiny metadata and perhaps for temporary local cache (not sure where that actually lives) but otherwise is a completely dead concept.
@woodcock We recently moved to S2 and our initial retention was set to 6 months. A month after the migration we decided to reduce the retention to 3 months but did not see any reduction in storage in S3.
Support found out that versioning was enabled in AWS by the PS engineer during the migration, and that caused this issue. Now the new data that rolls over is deleted, but we still have old cruft remaining in S3 which is costing us heavily. Support wants us to delete the data manually by running commands from the CLI.
Isn't there a better way of doing this? Do AWS lifecycle rules work only for old data which is still lying there? What are the ways to get rid of this old data apart from removing it manually?
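For reference, the kind of lifecycle rule being asked about, one that expires noncurrent object versions left behind by versioning, might look roughly like this; the bucket name and retention period are placeholders, and it is worth confirming with AWS/Splunk support whether such a rule is appropriate for a SmartStore bucket before applying it:

    # Hypothetical lifecycle.json: expire noncurrent versions and clean up delete markers
    cat > lifecycle.json <<'EOF'
    {
      "Rules": [
        {
          "ID": "expire-noncurrent-versions",
          "Status": "Enabled",
          "Filter": { "Prefix": "" },
          "NoncurrentVersionExpiration": { "NoncurrentDays": 1 },
          "Expiration": { "ExpiredObjectDeleteMarkers": true }
        }
      ]
    }
    EOF

    # Apply it to the bucket (bucket name is a placeholder)
    aws s3api put-bucket-lifecycle-configuration \
        --bucket my-smartstore-bucket \
        --lifecycle-configuration file://lifecycle.json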