Deployment Architecture

How to size the local drive for the SmartStore cache?

Glasses
Builder

Hi, 

I am looking for advice or documentation that specifically addresses SmartStore and local disk sizing for the locally cached, searchable S3 data.

For instance, how much disk space do you need for long historical searches while still leaving room for daily saved searches?  Is there some sort of calculator for this, or is it trial and error?

Looking for anyone with experience to share what worked and what did not when sizing local disks for SmartStore S3 caching.

Thank you.


richgalloway
SplunkTrust
SplunkTrust

Use the Monitoring Console.  Go to Indexing->SmartStore->SmartStore Cache Performance: Instance

---
If this reply helps you, Karma would be appreciated.

Glasses
Builder

Thank you for pointing this out.

However, as I am still learning about SmartStore, what do you suggest I watch for on the cache performance dashboard?


richgalloway
SplunkTrust
SplunkTrust
I'd keep an eye on the bottom two panels: Cache Hits/Misses and Cache Thrash by Index.
---
If this reply helps you, Karma would be appreciated.

richgalloway
SplunkTrust
SplunkTrust
I believe the trial-and-error approach will be necessary.
A reliable calculator would be difficult to produce because the required S2 cache size depends on the searches each customer runs. The search time window is a major consideration, of course, but the nature of the searches is a significant factor that is harder to put into a formula: searches that can be resolved using bloom filters and tsidx files need less cache space than those that need the raw data.
---
If this reply helps you, Karma would be appreciated.

Glasses
Builder

RE: "Searches that can be resolved using bloom filters and tsidx files need less cache space than those that need the raw data."

Please correct me if I am misunderstanding: when using SmartStore, Splunk writes everything to S3 (SmartStore), and when you want to search the data, SmartStore retrieves the buckets from S3 (using only index and time to locate the data in S3) and puts them on a local drive, for example an EBS volume attached to an EC2 instance.

So the only time a bloom filter will be read is when it is on the local drive in a cache directory?

Should we keep a couple of weeks of data on local storage and then write to SmartStore after a set time interval?
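
For context, this is roughly how I picture the indexes.conf side of it (a minimal sketch with placeholder bucket and index names, not our actual config):

    # indexes.conf -- placeholder values, only to illustrate my understanding
    [volume:remote_store]
    storageType = remote
    path = s3://my-bucket/smartstore            # the remote object store

    [my_index]
    homePath   = $SPLUNK_DB/my_index/db         # local cache lives here (e.g. the attached EBS volume)
    coldPath   = $SPLUNK_DB/my_index/colddb
    thawedPath = $SPLUNK_DB/my_index/thaweddb
    remotePath = volume:remote_store/$_index_name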


richgalloway
SplunkTrust
SplunkTrust

Yes, when a bucket rolls from hot to warm it is written to S3. It is kept in the S2 (SmartStore) cache on the premise that it is likely to be searched soon. Eventually, however, it will be flushed from the cache to make room for other buckets, although the bloom filter is sometimes retained even after a bucket is evicted from the cache.
SmartStore is able to download partial buckets from S3, too. See https://docs.splunk.com/Documentation/Splunk/8.0.5/Indexer/SmartStorecachemanager#How_the_cache_mana...

It's a good idea to size your cache to hold at least 30 days of data.  Increase that number if you routinely search data older than 30 days.
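
If you want to see where that eviction behavior is controlled, the knobs live in the [cachemanager] stanza of server.conf. A minimal sketch with illustrative values only (check the defaults for your version):

    # server.conf -- illustrative values, not a recommendation
    [cachemanager]
    eviction_policy = lru
    hotlist_recency_secs = 86400               # favor recently indexed buckets in the cache
    hotlist_bloom_filter_recency_hours = 360   # keep bloom filters/metadata after the rest
                                               # of a bucket has been evicted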

---
If this reply helps you, Karma would be appreciated.

mufthmu
Path Finder

"It's a good idea to size your cache to hold at least 30 days of data.  Increase that number if you routinely search data older than 30 days."

Hi @richgalloway, which data retention settings are responsible for this, and how can we configure them so our cache manager holds data until 30 days have passed?

Thanks in advance.


richgalloway
SplunkTrust
SplunkTrust
See https://docs.splunk.com/Documentation/Splunk/8.0.5/Indexer/ConfigureSmartStorecachemanager#Set_cache...
You probably should focus on making the cache large enough to hold 30 days of data rather than tinkering with recency settings.
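Concretely, the cache ceiling is governed by max_cache_size in server.conf (per cache partition, in MB). A sketch with a made-up value, assuming you want roughly 30 days of buckets to fit:

    # server.conf -- sketch only; size the value to cover ~30 days of cached buckets
    [cachemanager]
    max_cache_size = 300000     # MB per partition; 0 (the default) means no explicit limit
    eviction_padding = 5120     # MB of free disk space the cache manager tries to preserve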
---
If this reply helps you, Karma would be appreciated.

mufthmu
Path Finder

@richgalloway Hi Rich,

I have mounted a 1000 GB EBS volume on the EC2 instance that hosts my standalone Splunk. Our indexing rate is around 10 GB/day. How do I check the size of my cache manager's cache? My goal is to make it large enough to hold up to 30 days of data. Thanks.
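
For reference, here is the back-of-the-envelope math I am working from, plus the setting I believe controls the cache ceiling (the ~50% on-disk compression figure is my assumption and may not match our data):

    # 10 GB/day raw x 30 days              = ~300 GB of raw data
    # x ~0.5 (compressed rawdata + tsidx)  = ~150 GB of bucket data to keep cached
    # => the 1000 GB EBS volume should cover 30 days with plenty of headroom
    #
    # server.conf -- my assumption of the relevant stanza, not a recommendation
    [cachemanager]
    max_cache_size = 0          # MB per partition; 0 (the default) sets no explicit cap,
                                # so the cache can grow toward the full volume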
