Getting Data In

Does it make sense to use SmartStore for all data except hot and warm?

danielbb
Motivator

We wonder about using SmartStore. Does it make sense to use it for all data except hot and warm data? Even if we end up with all data in SmartStore, it seems to be a good start to use it first for the older data.

There is also a note in the "Archive indexed data" documentation that says:

-- Although SmartStore indexes do not usually contain cold buckets, you still use the attributes described here (coldToFrozenDir and coldToFrozenScript) to archive SmartStore buckets as they roll directly from warm to frozen. See Configure data retention for SmartStore indexes.

What does that mean?

saravanan90
Contributor

As mentioned, the duplicate copies dictated by SF/RF apply only to hot buckets. As soon as a bucket rolls to warm, a copy is uploaded to S3, and the local copy is evicted from the cache only when it meets the eviction policy criteria.

With SmartStore there are also no duplicate copies of warm buckets in the local cache, so we have enough storage to hold more data.

For moving data to frozen, see the link below.

The cache manager downloads the bucket from S3, copies it to the frozen directory, and then removes the bucket from both the local cache and S3.
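
To make the retention side of this concrete, here is a minimal Python sketch that converts a retention period in days into a value for `frozenTimePeriodInSecs` and renders a partial `indexes.conf` stanza. The setting names (`remotePath`, `frozenTimePeriodInSecs`, `coldToFrozenDir`) are real Splunk settings, but the index name, volume name, and archive path are hypothetical, and the stanza is deliberately incomplete (a real index still needs `homePath`, `coldPath`, and `thawedPath`). Treat it as an illustration, not a recommended configuration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def retention_seconds(days: int) -> int:
    """Convert a retention period in days to seconds."""
    if days <= 0:
        raise ValueError("retention must be a positive number of days")
    return days * SECONDS_PER_DAY


def render_index_stanza(index_name, retention_days, remote_volume, frozen_dir=None):
    """Render a partial indexes.conf stanza for a SmartStore index.

    All argument values are supplied by the caller; nothing here is
    read from or written to a real Splunk instance.
    """
    lines = [
        f"[{index_name}]",
        # $_index_name is expanded by Splunk, not by Python.
        f"remotePath = volume:{remote_volume}/$_index_name",
        f"frozenTimePeriodInSecs = {retention_seconds(retention_days)}",
    ]
    if frozen_dir is not None:
        # Optional: archive buckets here when they freeze instead of deleting them.
        lines.append(f"coldToFrozenDir = {frozen_dir}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical names and path, for illustration only.
    print(render_index_stanza("example_index", 90, "remote_store",
                              "/opt/archive/example_index"))
```

Leaving `frozen_dir` unset omits `coldToFrozenDir`, which corresponds to the default behavior of deleting buckets when they freeze.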

https://answers.splunk.com/answers/777620/splunk-smartstore-do-warm-buckets-need-to-roll-to-1.html

woodcock
Esteemed Legend

The bucket life-cycle changes completely with SmartStore. Probably the best topic online anywhere about those nuances is here (read both answers and all of the comments):
https://answers.splunk.com/answers/739051/smartstore-behaviors.html

danielbb
Motivator

So, @davidpaper says -

-- RF/SF only apply to Hot buckets. Once a bucket is rolled, it is uploaded to S3 and any bucket replicates are marked for eviction.

The way I read it, within the SmartStore paradigm, hot buckets are created by Splunk in the conventional way, and from that point they go to S3, where there are no warm/cold/frozen boundaries. Does that make sense?

dpaper_splunk
Splunk Employee

Yep. I changed the way I think of the bucket life cycle with SmartStore.

Hot is read/write and replicated just like non-SmartStore. Once they roll to read only, they aren't "warm" or "cold" to me anymore, they are just read only as they are copied to the remote object store. Once a bucket exists on the remote object store, we only download bits and pieces, not necessarily the whole bucket, when it is time to search it. Once in the remote object store, there are no warm, cold, or thawed boundaries anymore. Freezing of buckets still exists, which deletes them (by default).
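
The life cycle described above can be sketched as a tiny state model. This is only a mental-model illustration in Python: the state names and helper functions are made up for this example and do not correspond to anything in Splunk's code or configuration.

```python
from enum import Enum


class BucketState(Enum):
    HOT = "hot (read/write, local)"
    READ_ONLY_REMOTE = "read-only (copied to remote object store)"
    FROZEN = "frozen (deleted by default)"


# Allowed transitions in this simplified model: there is no separate
# warm -> cold step once a bucket lives on the remote object store.
ALLOWED = {
    BucketState.HOT: {BucketState.READ_ONLY_REMOTE},
    BucketState.READ_ONLY_REMOTE: {BucketState.FROZEN},
    BucketState.FROZEN: set(),
}


def roll(state: BucketState, target: BucketState) -> BucketState:
    """Move a bucket to the next state, rejecting transitions the model lacks."""
    if target not in ALLOWED[state]:
        raise ValueError(f"cannot go from {state.name} to {target.name}")
    return target


def replicated_by_rf_sf(state: BucketState) -> bool:
    """In this model, RF/SF replication applies to hot buckets only."""
    return state is BucketState.HOT


if __name__ == "__main__":
    state = BucketState.HOT
    state = roll(state, BucketState.READ_ONLY_REMOTE)
    state = roll(state, BucketState.FROZEN)
    print(state.value)
```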

danielbb
Motivator

Pretty amazing, @dpaper!

woodcock
Esteemed Legend

That is how I see it: hot buckets do not change at all, warm buckets move to SmartStore, and cold is used only for very tiny metadata and perhaps for the temporary local cache (I am not sure where that actually lives), but is otherwise a completely dead concept.

athorat
Communicator

@woodcock We recently moved to S2, and our initial retention was set to 6 months. A month after the migration we decided to reduce the retention to 3 months, but we did not see any reduction in storage in S3.

Support found out that versioning was enabled in AWS by the PS engineer during the migration, and that caused this issue. Newly rolled data is now deleted as expected, but we still have old cruft remaining in S3, which is costing us heavily. Support wants us to delete the data manually by running commands from the CLI.

Isn't there a better way of doing this? Do AWS lifecycle rules also work for old data that is already sitting there? What are the ways to get rid of this old data apart from removing it manually?
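
For reference, S3 lifecycle rules are evaluated against objects already in the bucket as well as new ones, and on a versioned bucket a rule can target noncurrent versions specifically. The Python sketch below only builds such a rule as a plain dictionary and prints it as JSON; it does not call AWS. The rule ID, the prefix, and the 7-day window are assumptions chosen for illustration, and whether such a rule is appropriate for a SmartStore bucket should be confirmed with Splunk Support before applying anything.

```python
import json


def noncurrent_cleanup_config(noncurrent_days: int, prefix: str = "") -> dict:
    """Build an S3 lifecycle configuration that expires noncurrent versions.

    The rule leaves current object versions alone, so retention of live
    data stays under Splunk's control.
    """
    if noncurrent_days < 1:
        raise ValueError("noncurrent_days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix = whole bucket
                # Permanently remove versions N days after they become noncurrent.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
                # Clean up delete markers that no longer have any versions behind them.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(noncurrent_cleanup_config(7), indent=2))
```

The resulting structure has the shape expected by `aws s3api put-bucket-lifecycle-configuration --lifecycle-configuration file://...` and by boto3's `put_bucket_lifecycle_configuration`. Lifecycle actions run asynchronously, so the storage reduction may take some time to show up.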
