The smartstore documentation says the following:
"The amount of local storage available on each indexer for cached data must be in proportion to the expected working set. For best results, provision enough local storage to accommodate the equivalent of 30 days' worth of indexed data."
Is this the same as hot bucket data, or is it on top of the hot data?
e.g. assuming the following factors:
Intake = 100GB/day
Compression ratio = 0.50
Hot Retention = 14 days
Using this formula found in another forum post:
Global Cache sizing = Daily Ingest Rate x Compression Ratio x (RF x Hot Days + (Cached Days - Hot Days))
Cache sizing per indexer = Global Cache sizing / No.of indexers
Cached Days = Splunk recommends 30 days for Splunk Enterprise and 90 days for Enterprise Security
Hot days = Number of days before hot buckets roll over to warm buckets. Ideally this will be between 1 and 7, but configure this based on how hot buckets roll in your environment.
100 x 0.50 x (2 x 14 + (30 - 14)) = 2200GB?
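The formula from the forum post can be sketched as a small function; the figures (100GB/day, 0.50 compression, RF 2, 14 hot days, 30 cached days, 3 indexers) are the example values from this thread, not official sizing guidance:

```python
# Sketch of the cache-sizing formula quoted above, using the
# example figures from this thread (not official Splunk values).

def global_cache_gb(daily_ingest_gb, compression_ratio, rf, hot_days, cached_days):
    """Global cache = ingest x compression x (RF x hot days + (cached days - hot days))."""
    return daily_ingest_gb * compression_ratio * (rf * hot_days + (cached_days - hot_days))

total = global_cache_gb(daily_ingest_gb=100, compression_ratio=0.50,
                        rf=2, hot_days=14, cached_days=30)
print(total)      # 2200.0 (GB, global)
print(total / 3)  # per indexer, assuming 3 indexers as in the thread
```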
To calculate the "equivalent", it's not hot buckets you need to calculate, it's hot+warm.
There should only be one hot bucket per index - the one that is currently being written to. You can include it (or exclude it - it's almost insignificant in this calculation).
Your formulas look more or less sensible though.
You don't mention how many indexers you have, but I assume it's 3, with SF/RF 2.
Daily Ingest Rate x Compression Ratio: 100 x 0.5 = 50GB
(RF x Warm Days + (Cached Days - Warm Days)): I would just call this 30 days and build in some margin. Thus:
RF x Total Days Available in Cache: 2 x 30 = 60
So:
50 x 60 = 3000GB of Storage for cache
Finally:
3TB / 3 indexers = 1TB of free local storage for cache per indexer.
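The simplified sizing above can be sketched directly: treat the whole cache window as warm, multiply by RF, then split across indexers. The figures (100GB/day, 0.5 compression, RF 2, 30 cached days, 3 indexers) are this thread's assumptions, not universal values:

```python
# Simplified SmartStore cache sizing as described above:
# whole cache window counted at RF, then divided per indexer.
# All figures are the example values from this thread.

daily_ingest_gb = 100
compression_ratio = 0.5
rf = 2
cached_days = 30
indexers = 3

compressed_per_day_gb = daily_ingest_gb * compression_ratio  # 50GB on disk per day
global_cache_gb = compressed_per_day_gb * rf * cached_days   # 3000GB (~3TB) globally
per_indexer_gb = global_cache_gb / indexers                  # 1000GB (~1TB) each

print(global_cache_gb, per_indexer_gb)
```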
Forgive my ignorance here, but does this mean that we do not need to calculate hot days, and that hot days and cached days are interchangeable?
The storage calculation would then use cached days (based on your formula), with warm storage on S3?
I'm approximating: if you calculate the amount of storage (hot+warm) that your target time period (30 days) would consume on a non-SmartStore system, that should roughly match your desired cache size on a SmartStore-enabled instance.
You index 100GB a day, which (regardless of whether it's in hot or warm) consumes 50GB of storage.
You want to keep that searchable for 30 days.
Your cluster has SF/RF of 2.
50 x 30 x 2 = 3000GB = 3TB
With 3 indexers, allocate each of them 1TB
Thanks! Don't know why this concept was that hard to grasp 🙂