Deployment Architecture

Indexer Cluster, Peer Node Down, Sizing

csiess
Explorer

Hi,

I have done a lot of research about sizing a Splunk cluster and have a few in-depth questions:

Settings: 100 GB indexed per day; 3-node cluster; RF/SF 2

Scenario 1: All nodes up and running; disk space per node = 100 GB * compression factor * retention * RF/SF / 3. Nothing fancy.
Scenario 2: One peer down; disk space per node = 100 GB * compression factor * retention * RF/SF / 2. Since I can't lose buckets, I need at least this much disk space on each of the remaining indexers.
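The per-node arithmetic in the two scenarios can be sketched like this (illustrative only; the 0.5 compression factor and 90-day retention are placeholder assumptions, not values from the post):

```python
# Rough per-node disk sizing for a 3-node indexer cluster (illustrative sketch).
# Placeholder assumptions: 0.5 overall compression factor, 90-day retention.

daily_gb = 100          # GB indexed per day
compression = 0.5       # combined rawdata + index compression (assumption)
retention_days = 90     # retention period (assumption)
rf = 2                  # replication factor (RF/SF 2 as in the question)

total_gb = daily_gb * compression * retention_days * rf

per_node_all_up = total_gb / 3   # Scenario 1: 3 peers share the buckets
per_node_one_down = total_gb / 2 # Scenario 2: 2 remaining peers share them

print(f"total cluster disk:  {total_gb:.0f} GB")
print(f"per node, all 3 up:  {per_node_all_up:.0f} GB")
print(f"per node, 1 down:    {per_node_one_down:.0f} GB")
```

The jump from /3 to /2 is exactly the head room the rest of the thread is about.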

My first thought was that I could limit the size of the indexes with maxTotalDataSizeMB (which applies per index, per peer node).

My problem (if I understand this correctly; not yet tested in a DEV environment):
If Scenario 1 applies (all 3 nodes up), the size of the index is limited by maxTotalDataSizeMB. When one peer node then goes down, buckets will start to get replicated to the remaining two nodes, causing maxTotalDataSizeMB to be hit again (since maxTotalDataSizeMB counts not only primary buckets but all buckets in the index, right?), which will result in a loss of data. So with maxTotalDataSizeMB I won't be able to "reserve" some head room of disk space for this scenario.

Am I right that the only way to guarantee that I will not lose any data in an indexer cluster is to use maxWarmDBCount and frozenTimePeriodInSecs together (in order to save some headroom in case a peer node fails)?

Thanks for your help!

1 Solution

jkat54
SplunkTrust

maxDataSize to set the max size of a bucket
maxHotIdleSecs and maxHotSpanSecs to set the time span of a bucket
maxHotBuckets to set the count of hot buckets
maxWarmDBCount to set the count of warm buckets

Now you just need to know the retention you desire...

See the table in my answer here https://answers.splunk.com/answers/499760/in-distributed-management-console-dmc-why-is-dataa.html

Say you want 100 days retention on data of at most 1GB/day:
maxDataSize=1024                 # in MB, so ~1 GB per bucket (~1 day of data)
maxHotSpanSecs=86401             # not 86400, to avoid ohSnap
maxHotIdleSecs=86401
maxHotBuckets=5
maxWarmDBCount=45
frozenTimePeriodInSecs=8640000   # 100 days
maxTotalDataSizeMB=122880        # 100 days * 1024 MB = 102400 MB, plus 20% headroom

At any point in time you would have 50 days on hot/warm storage, with 100 days max before frozen kicks in. And if for some reason 20% more buckets were created due to a fix-up, your maxTotalDataSizeMB would be 20% larger than your expected volume of data to compensate.
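The arithmetic behind those numbers can be sanity-checked with a short sketch (parameter values mirror the example above; maxDataSize and maxTotalDataSizeMB are in MB):

```python
# Sanity-check the bucket/retention arithmetic from the example above.

bucket_mb = 1024                 # maxDataSize: ~1 GB per bucket, ~1 day of data
hot_buckets = 5                  # maxHotBuckets
warm_buckets = 45                # maxWarmDBCount
frozen_secs = 8_640_000          # frozenTimePeriodInSecs

hot_warm_days = hot_buckets + warm_buckets   # one ~1-day bucket per day
frozen_days = frozen_secs // 86_400          # days before freezing kicks in

expected_mb = frozen_days * bucket_mb        # data on disk before freezing
max_total_mb = round(expected_mb * 1.2)      # +20% headroom for fix-ups

print(f"hot/warm retention: {hot_warm_days} days")
print(f"freeze after:       {frozen_days} days")
print(f"maxTotalDataSizeMB: {max_total_mb}")
```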


twinspop
Influencer

Interesting that you set maxHotBuckets. Can you give some background? While at .conf this year, in idle chat at the bar, a Splunk engineer said they recommend leaving it at the default in most cases. Just curious whether I should be looking into this more.

jkat54
SplunkTrust

I don't. Typically I leave it at the default (3). I just put it in the example so I didn't have to explain that the default is 3, and that I'd then need 47 for the warm DB count because the default warm DB count is 300 and I only want 50 buckets on hot/warm storage, etc. I was just skipping some details.

Just wanted to show the granularity at which you can control the buckets.


jkat54
SplunkTrust

Another point: this is per index. You also have maxVolumeDataSizeMB, which is global across the volume; you might want to add 20% there too.
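For the volume-level setting, the same headroom idea applies: size maxVolumeDataSizeMB to the sum of the per-index budgets plus the margin. The index names and per-index sizes below are hypothetical, just to show the shape of the calculation:

```python
# Illustrative volume sizing: sum per-index budgets, then add 20% headroom.
# Index names and per-index sizes (in MB) are hypothetical.

index_budgets_mb = {
    "main": 102_400,
    "security": 51_200,
    "metrics": 20_480,
}

volume_mb = sum(index_budgets_mb.values())
max_volume_mb = round(volume_mb * 1.2)   # candidate maxVolumeDataSizeMB

print(f"sum of index budgets: {volume_mb} MB")
print(f"with 20% headroom:    {max_volume_mb} MB")
```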

csiess
Explorer

Thanks - this makes a lot of sense.


jkat54
SplunkTrust

Have you looked at coldToFrozenDir?

You could have the same SAN or NFS share mounted on all of the indexers and use it as the coldToFrozenDir as a backup in case these scenarios ever did happen, and then over-spec your hot/warm/cold storage by 20-30% to give yourself time to recover from a lost-peer scenario.

Also, any time you lose a peer you should enter maintenance mode ASAP, and don't forget that restarting the cluster master disables maintenance mode.

0 Karma

csiess
Explorer

Hi jkat54,

thanks for your response. I agree that using coldToFrozenDir is a good solution.
I'm wondering how one would achieve over-speccing the hot/warm/cold storage.

As mentioned above, in a situation where maxTotalDataSizeMB hits the index limit, I cannot see a way to configure the cluster to keep additional head room for a cluster peer failure. Of course I agree with you that switching into maintenance mode is also a good choice.
