We are currently looking into using the SmartStore feature. However, I am having difficulty finding documentation on how to calculate the amount of storage we would need, both locally for the cache and on our S3 solution. The only detail I can find is:
The amount of local storage available on each indexer for cached data must be in proportion to the expected working set. For best results, provision enough local storage to accommodate the equivalent of 30 days' worth of indexed data. For example, if the indexer is adding approximately 100GB/day of indexed data, the recommended size reserved for cached data is 3000GB
Using this example, does it mean that each of our indexers would need 3TB of local storage for the cache, or would it be the same as the traditional storage method, where this would be divided by the number of indexers?
Remote Object Store sizing = Daily Ingest Rate x Compression Ratio x Retention period
The compression ratio is generally 50% (15% from the compressed rawdata and 35% from the tsidx metadata files), but this depends entirely on the type of data. Higher-cardinality data compresses less well, which increases the storage sizing requirement.
Global Cache sizing = Daily Ingest Rate x Compression Ratio x (RF x Hot Days + (Cached Days - Hot Days))
Cache sizing per indexer = Global Cache sizing / Number of indexers
Cached Days = Splunk recommends 30 days for Splunk Enterprise and 90 days for Enterprise Security
Hot Days = Number of days before hot buckets roll over to warm buckets. Ideally this will be between 1 and 7, but configure it based on how hot buckets roll in your environment.
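As a rough sketch, the formulas above can be written out in Python. The function names are hypothetical, and the input values (100 GB/day ingest, 50% compression ratio, 365-day retention, RF of 3, 1 hot day, 30 cached days, 4 indexers) are made-up figures for illustration only, not recommendations.

```python
def remote_store_gb(daily_ingest_gb, compression_ratio, retention_days):
    # Remote Object Store sizing = Daily Ingest Rate x Compression Ratio x Retention period
    return daily_ingest_gb * compression_ratio * retention_days


def global_cache_gb(daily_ingest_gb, compression_ratio, rf, hot_days, cached_days):
    # Global Cache sizing = Daily Ingest Rate x Compression Ratio
    #                       x (RF x Hot Days + (Cached Days - Hot Days))
    return daily_ingest_gb * compression_ratio * (rf * hot_days + (cached_days - hot_days))


def cache_per_indexer_gb(global_cache, num_indexers):
    # Cache sizing per indexer = Global Cache sizing / Number of indexers
    return global_cache / num_indexers


# Assumed example inputs (illustrative only)
remote = remote_store_gb(100, 0.5, 365)
cache = global_cache_gb(100, 0.5, rf=3, hot_days=1, cached_days=30)
per_indexer = cache_per_indexer_gb(cache, num_indexers=4)

print(remote)       # 18250.0 GB in the remote store
print(cache)        # 1600.0 GB of cache across the cluster
print(per_indexer)  # 400.0 GB of cache per indexer
```

Under these assumed inputs, the cache is sized for the cluster as a whole and then divided across the indexers, rather than each indexer holding the full amount.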
The cache sizing should be updated, given that Splunk 8.0+ keeps RF copies of each bucket across the cache until they are evicted. When SmartStore was introduced in 7.2, the behavior was to keep only one copy of the bucket in the cache after the warm bucket was uploaded to S3, with the other RF-1 copies held as stubs (metadata only). The behavior has changed since 8.0 (not sure if it was from 7.3+) for performance reasons.
Global Cache sizing = Daily Ingest Rate x Compression Ratio x RF x Cached Days
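To see how much this changes the estimate, here is a minimal sketch of the updated formula in Python. The function name is hypothetical and the inputs (100 GB/day, 50% compression ratio, RF of 3, 30 cached days, 4 indexers) are assumed example values.

```python
def global_cache_gb_rf_copies(daily_ingest_gb, compression_ratio, rf, cached_days):
    # Global Cache sizing = Daily Ingest Rate x Compression Ratio x RF x Cached Days
    # (assumes all RF copies stay in the cache until evicted)
    return daily_ingest_gb * compression_ratio * rf * cached_days


# Assumed example inputs (illustrative only)
global_cache = global_cache_gb_rf_copies(100, 0.5, rf=3, cached_days=30)
per_indexer = global_cache / 4  # spread across 4 indexers

print(global_cache)  # 4500.0 GB across the cluster
print(per_indexer)   # 1125.0 GB per indexer
```

With these example numbers, the cache requirement scales directly with RF, so it is worth confirming which behavior your Splunk version has before provisioning disks.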
Take your total daily ingestion rate, divide by the number of indexers, and multiply by 30. That’s what you should have per indexer. This of course assumes your working set is 30 days. If you can quantify the time span your average searches cover, you can adjust accordingly. Most data probably is “stale” after 7 days, though this of course depends on your use case.
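A quick sketch of that rule of thumb in Python, using assumed figures (100 GB/day total, 4 indexers). The function name is hypothetical, and as stated the rule does not apply a compression ratio or replication factor.

```python
def cache_per_indexer_gb(total_daily_gb, num_indexers, working_set_days=30):
    # (total daily ingestion / number of indexers) x working set in days
    return total_daily_gb / num_indexers * working_set_days


# Assumed example inputs (illustrative only)
print(cache_per_indexer_gb(100, 4))                      # 750.0 GB with a 30-day working set
print(cache_per_indexer_gb(100, 4, working_set_days=7))  # 175.0 GB with a 7-day working set
```

So for the 100GB/day example in the question, the 3000GB figure would be divided across the indexers under this rule, not provisioned on each one.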