Hi All,
While testing SmartStore, I came up with a few questions.
1. What does cache size mean? As I understand it, it is the amount of storage that hot and warm buckets can occupy. Is that correct?
2. Let's say I set max_cache_size=1TB and there is only one index. When will a warm bucket be evicted? Is it evicted when max_cache_size is exceeded, when the bucket gets older than hotlist_recency_secs, or both?
3. With max_cache_size=1TB, let's say I run an "All time" search, which fetches warm buckets from remote storage. The fetched buckets would push the cache past max_cache_size. What happens when the search finishes? Are all the warm buckets evicted?
I have read the docs many times and tested this myself, but the bucket behavior is still not clear. I would appreciate it if anybody could answer the questions above. Thanks.
Cache size is the amount of local (on the indexer) storage used for warm buckets. Hot buckets are not in the cache.
A bucket will be evicted from the cache when storage is needed to download a different bucket from S2.
The max_cache_size will not be exceeded. An "All Time" search will cause many buckets to be evicted from the cache so other buckets can be transferred from SmartStore.
Hi @richgalloway Thank you for your answer.
1. Could you please point me to any reference docs for your answer to question #1? In my testing, it seems that the cache size is the amount of local storage for both warm and hot buckets: I set max_cache_size=5G, and a warm bucket was evicted exactly when the combined size of the warm and hot buckets exceeded 5G.
3. I meant the situation where I set max_cache_size=1TB and there is 1TB of data in the cache and 2TB of data in remote storage. The "All time" search needs all of the data, both in the cache and in remote storage. In this situation, how does the search execute? Does it search the local data first, then evict it and fetch the rest from S2?
Please help me understand the behavior of S2. Thanks!
1. Thanks for asking about that as it forced me to research it and realize I was incorrect. Hot buckets *are* included in the S2 cache. See https://docs.splunk.com/Documentation/Splunk/8.1.1/Indexer/SmartStorearchitecture#Buckets_and_SmartS...
3. It is expected that the S2 remote store will be far larger than the local cache. When satisfying a search, buckets will be evicted from the cache as often as necessary to make room for more buckets to be downloaded from the remote store.
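To make the 1TB-cache / 2TB-remote scenario concrete, here is a rough toy model in Python. It is only a sketch and not Splunk's actual cache manager: the `ToyCache` class, the fixed 10 GB bucket size, the pure least-recently-used eviction order, and the newest-to-oldest scan are all assumptions made up for illustration.

```python
from collections import OrderedDict


class ToyCache:
    """Toy size-capped bucket cache with LRU eviction (hypothetical model)."""

    def __init__(self, max_size_gb):
        self.max_size_gb = max_size_gb
        self.warm = OrderedDict()  # bucket_id -> size_gb, least recent first
        self.downloads = 0
        self.evictions = 0
        self.peak_gb = 0

    def used_gb(self):
        return sum(self.warm.values())

    def open_bucket(self, bucket_id, size_gb):
        """Make one warm bucket available locally for a search."""
        if bucket_id in self.warm:
            # Cache hit: just mark the bucket as recently used.
            self.warm.move_to_end(bucket_id)
            return
        # Cache miss: evict least-recently-used buckets until the new one fits.
        while self.warm and self.used_gb() + size_gb > self.max_size_gb:
            self.warm.popitem(last=False)
            self.evictions += 1
        self.warm[bucket_id] = size_gb
        self.downloads += 1
        self.peak_gb = max(self.peak_gb, self.used_gb())


def all_time_search(cache, bucket_ids, size_gb):
    """Touch every bucket in turn, as an 'All time' search would."""
    for bucket_id in bucket_ids:
        cache.open_bucket(bucket_id, size_gb)


# Assumed scenario: 3000 GB of data in 300 buckets of 10 GB each.
# The newest 100 buckets (ids 200..299) already sit in a 1000 GB cache.
cache = ToyCache(max_size_gb=1000)
for bucket_id in range(200, 300):
    cache.warm[bucket_id] = 10

# Scan from the newest bucket (299) down to the oldest (0).
all_time_search(cache, range(299, -1, -1), size_gb=10)
print(cache.downloads, cache.evictions, cache.peak_gb)
```

In this model the search reads the 100 local buckets without any downloads, then fetches the 200 remote buckets one at a time, evicting one local bucket for each. The cache peaks at exactly 1000 GB and never goes over the cap.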
Thank you very much! May I ask one more question? Is it true that the cache manager evicts warm data only when the cache storage is full?
If that's true, it means that the cache has been full since a certain point in time. Then, when a search executes, the cache will continuously evict data from local storage and fetch warm data from S2. Is my understanding correct? It sounds inefficient.
I'd appreciate it if you could help me understand. Thanks!
Buckets are evicted when room is needed for a bucket to be copied from S2. The cache does not need to be totally full for evictions to happen.
Keep in mind that the cache is designed to hold the most-recently searched data under the premise that the same data will be searched again soon. Ideally, the cache should be sized to hold enough data to satisfy your most common searches so that cache misses don't happen often.
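As a back-of-the-envelope illustration of that sizing idea, the arithmetic might look like the sketch below. The function name, the 0.5 on-disk ratio, and the 20% headroom are assumptions for the example, not values taken from the Splunk docs, so plug in numbers measured in your own environment.

```python
def estimate_cache_gb(daily_ingest_gb, common_search_days,
                      on_disk_ratio=0.5, headroom=0.2):
    """Rough cache size needed to keep the commonly searched window local.

    on_disk_ratio: assumed ratio of on-disk bucket size to raw ingested data.
    headroom: assumed extra fraction for hot buckets and occasional older fetches.
    """
    working_set_gb = daily_ingest_gb * common_search_days * on_disk_ratio
    return working_set_gb * (1 + headroom)


# Hypothetical example: 100 GB/day ingested, most searches cover the last 30 days.
needed_gb = estimate_cache_gb(daily_ingest_gb=100, common_search_days=30)
print(f"Estimated cache size: {needed_gb:.0f} GB")
```

If the estimate comes out much larger than the local disk you can give the indexers, that is a hint that searches will regularly miss the cache.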
If you have indexes that regularly are searched over long time periods then those indexes may not be good candidates for SmartStore.