
Sizing on SmartStore (S3) for local storage

ajiwanand
Path Finder

The SmartStore documentation says the following:

"The amount of local storage available on each indexer for cached data must be in proportion to the expected working set. For best results, provision enough local storage to accommodate the equivalent of 30 days' worth of indexed data."

Is this the same as hot bucket data, or is it on top of the hot data?

e.g. assuming the following factors:
Intake = 100GB/day
Compression ratio = 0.50
Hot Retention = 14 days

Using this formula found in another forum post:
Global Cache sizing = Daily Ingest Rate x Compression Ratio x (RF x Hot Days + (Cached Days - Hot Days))
Cache sizing per indexer = Global Cache sizing / No. of indexers

Cached Days = Splunk recommends 30 days for Splunk Enterprise and 90 days for Enterprise Security
Hot days = Number of days before hot buckets roll over to warm buckets. Ideally this will be between 1 and 7, but configure this based on how hot buckets roll in your environment.

100 x 0.50 x (2 x 14 + (30 - 14)) = 2200?
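For reference, here is that arithmetic as a short Python sketch. All figures are the example values from this post; the indexer count of 3 is an assumption (it matches the accepted answer below):

```python
# Cache sizing per the formula quoted above (example/assumed values only).
daily_ingest_gb    = 100   # GB/day intake
compression_ratio  = 0.50  # raw -> indexed size on disk
replication_factor = 2     # RF
hot_days           = 14    # days before hot buckets roll to warm
cached_days        = 30    # Splunk-recommended cache window for Enterprise
indexers           = 3     # assumed indexer count

# Global Cache sizing = Daily Ingest x Compression x (RF x Hot Days + (Cached Days - Hot Days))
global_cache_gb = daily_ingest_gb * compression_ratio * (
    replication_factor * hot_days + (cached_days - hot_days)
)
per_indexer_gb = global_cache_gb / indexers

print(global_cache_gb)  # 2200.0 GB, matching the figure above
print(per_indexer_gb)   # ~733 GB per indexer
```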

1 Solution

nickhills
Ultra Champion

To calculate the "equivalent", it's not hot buckets you need to calculate, it's hot + warm.
There should only be one hot bucket per index - the one that is currently being written to. You can include it (or exclude it - it's almost insignificant in this calculation).

Your formulas look more or less sensible though.

You don't mention how many indexers you have, but I assume it's 3, with SF/RF of 2.

Daily Ingest Rate x Compression Ratio: 100 x 0.5 = 50 GB
(RF x Warm Days + (Cached Days - Warm Days)): I would just call this 30 days and build in some margin. Thus:
RF x Total Days Available in Cache: 2 x 30 = 60
So:
50 x 60 = 3000 GB of storage for cache
Finally:
3 TB / 3 indexers = 1 TB of free storage for cache/local storage per indexer.
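Expressed as a minimal Python sketch, with the same assumed figures (100 GB/day ingest, 0.5 compression, RF 2, 30 cached days, 3 indexers):

```python
# Simplified SmartStore cache sizing (assumed example values from this thread).
daily_ingest_gb    = 100
compression_ratio  = 0.50
replication_factor = 2
cached_days        = 30
indexers           = 3

stored_per_day_gb = daily_ingest_gb * compression_ratio                    # 50 GB/day on disk
global_cache_gb   = stored_per_day_gb * replication_factor * cached_days   # 3000 GB
per_indexer_gb    = global_cache_gb / indexers                             # 1000 GB

print(f"{global_cache_gb:.0f} GB global cache, {per_indexer_gb:.0f} GB per indexer")
```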

If my comment helps, please give it a thumbs up!


ajiwanand
Path Finder

Forgive my ignorance here, but does this mean that we do not need to calculate HOT days and that Hot days and Cached days are interchangeable?

Would the storage calculation then be based on cached days (per your formula), with warm storage going to S3?


nickhills
Ultra Champion

I'm approximating that if you calculate the amount of storage (hot + warm) your target time period (30 days) would consume on a non-SmartStore system, this would roughly match your desired cache size on a SmartStore-enabled instance.

You index 100 GB a day, which (regardless of whether it's in hot or warm) consumes 50 GB of storage.
You want to keep that searchable for 30 days.
Your cluster has an SF/RF of 2.

50 GB x 30 days x 2 (RF) = 3000 GB (3 TB)
With 3 indexers, allocate each of them 1 TB.
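As a quick check, here is a sketch comparing this simplified estimate with the formula from the original question (all values are the assumed figures from this thread); the gap is the extra replicated copy of the 16 non-hot days, which is the margin being built in:

```python
# Compare the simplified estimate with the original formula (assumed values).
stored_per_day_gb = 100 * 0.50          # 50 GB of indexed data per day
rf, hot_days, cached_days = 2, 14, 30

simplified_gb = stored_per_day_gb * rf * cached_days                           # 3000 GB
original_gb   = stored_per_day_gb * (rf * hot_days + (cached_days - hot_days)) # 2200 GB

print(simplified_gb - original_gb)  # 800 GB of extra headroom in the simplified estimate
```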

If my comment helps, please give it a thumbs up!

ajiwanand
Path Finder

Thanks! Don't know why this concept was that hard to grasp 🙂
