Hi,
I am looking for advice or documentation that specifically addresses SmartStore and local disk sizing for the searchable S3-cached data.
For instance, how much disk space do you need for long historical searches while still maintaining disk space for daily saved searches? Is there some sort of calculator for this, or is it trial and error?
Looking for anyone with experience to share what worked and what did not when sizing local disks for SmartStore S3 caching.
Thank you.
Use the Monitoring Console. Go to Indexing -> SmartStore -> SmartStore Cache Performance: Instance.
Thank you for pointing this out.
However, as I am still learning about SmartStore, what do you suggest I watch for in the cache performance dashboard?
RE:
Searches that can be resolved using bloom filters and tsidx files need less space in cache than those which need the raw data.
Please correct me if I am misunderstanding: when using SmartStore, Splunk writes everything to S3 (the remote store), and when you want to search the data, SmartStore retrieves the buckets from S3 (using only index and time to find the data in S3) and places them on a local drive, for example an EBS volume attached to an EC2 instance.
So the only time a bloom filter will be read is when it is on the local drive in a cache directory?
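For context, a minimal SmartStore setup in indexes.conf looks roughly like this (the volume name, S3 bucket, region, and credentials are placeholders, not real values):

# indexes.conf (placeholder values)
[volume:remote_store]
storageType = remote
path = s3://my-smartstore-bucket/indexes
remote.s3.access_key = <access key>
remote.s3.secret_key = <secret key>
remote.s3.endpoint = https://s3.us-east-1.amazonaws.com

[my_index]
homePath = $SPLUNK_DB/my_index/db
coldPath = $SPLUNK_DB/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb
remotePath = volume:remote_store/$_index_name

My understanding is that hot buckets always stay on the local drive; once a bucket rolls to warm it is uploaded to the remote store, and from then on searches are served from whatever copy the cache manager keeps (or re-downloads) locally.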
Should we keep a couple weeks on local storage and then write to SmartStore after a set time interval?
Yes, when a bucket rolls from hot to warm it is written to S3. It is kept in the S2 (SmartStore) cache on the premise that it is likely to be searched soon. Eventually, however, it will be flushed from the cache to make room for other buckets, although the bloom filter is sometimes retained even after the rest of the bucket is evicted.
SmartStore is able to download partial buckets from S3, too. See https://docs.splunk.com/Documentation/Splunk/8.0.5/Indexer/SmartStorecachemanager#How_the_cache_mana...
It's a good idea to size your cache to hold at least 30 days of data. Increase that number if you routinely search data older than 30 days.
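The size of the cache and how long recent data is kept in it are controlled by the [cachemanager] stanza in server.conf on each indexer. A rough sketch (the numbers here are illustrative examples, not a recommendation for any particular environment):

# server.conf (example values only)
[cachemanager]
max_cache_size = 300000                    # maximum cache size per partition, in MB
eviction_policy = lru                      # default: evict least-recently-used buckets first
hotlist_recency_secs = 2592000             # try to keep buckets newer than ~30 days in the cache
hotlist_bloom_filter_recency_hours = 720   # keep bloom filters and metadata cached longer still

I believe hotlist_recency_secs can also be set per index in indexes.conf if only some indexes need the longer window.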
"It's a good idea to size your cache to hold at least 30 days of data. Increase that number if you routinely search data older than 30 days."
Hi @richgalloway , which data retention settings are responsible for this, and how can we configure them so our Cache Manager holds data until 30 days have passed?
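From what I have gathered so far (please correct me if this is wrong), cache residency and data retention are controlled separately: the cache manager hotlist settings decide how long buckets stay on local disk, while the usual per-index settings decide how long the data exists at all in S3. Something along these lines, with example values:

# indexes.conf (example values only)
[my_index]
frozenTimePeriodInSecs = 31536000    # retention: freeze/remove data older than ~1 year
maxGlobalDataSizeMB = 500000         # retention: cap on the index's total bucket size (hot + cached + remote)
hotlist_recency_secs = 2592000       # cache: ask the cache manager to keep the last ~30 days local

Is that the right way to think about it?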
Thanks in advance.
@richgalloway Hi Rich,
I have mounted a 1000 GB EBS volume on the EC2 instance that hosts my standalone Splunk instance. Our indexing rate is around 10 GB/day. However, how do I check the size of the cache manager's cache? My goal is to make it large enough to hold data for up to 30 days. Thanks
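My back-of-the-envelope math so far (please sanity-check it): at roughly 10 GB/day of raw ingest, and assuming the usual compression once it is indexed, 30 days of searchable buckets should land somewhere in the 150-300 GB range, which fits comfortably on the 1000 GB volume. To see what the cache is currently configured to, I have been using btool to dump the [cachemanager] settings from server.conf:

$SPLUNK_HOME/bin/splunk btool server list cachemanager --debug

If max_cache_size comes back as 0 (the default), my understanding is that the cache is limited only by available disk space and the eviction padding, so I would set max_cache_size explicitly (in MB, e.g. 300000 for about 300 GB) to reserve room for 30 days.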