Clarification - indexes.conf

vr2312
Contributor

This is my indexes.conf configuration:

[volume:hot_warm]
path = /store/hot_warm
maxVolumeDataSizeMB = 1450000

[volume:cold]
path = /store/cold
maxVolumeDataSizeMB = 9400000

[pan]
homePath = volume:hot_warm/pan/db
coldPath = volume:cold/pan/colddb
tstatsHomePath = volume:hot_warm/pan/datamodel_summary
thawedPath = /restore/pan/thaweddb
coldToFrozenDir = /store/cold/archive/pan
maxDataSize = auto
frozenTimePeriodInSecs = 31536000 -> When data is moved from cold to frozen: after 1 year.
maxTotalDataSizeMB = 20000000 -> Maximum size of the index: 20 TB.
enableTsidxReduction = true -> Reduces the size of TSIDX files. Buckets get smaller, but searches against them are slower.
timePeriodInSecBeforeTsidxReduction = 2592000 -> TSIDX reduction starts after 30 days.

In this case, will maxTotalDataSizeMB or maxVolumeDataSizeMB take precedence?

As per my understanding, maxVolumeDataSizeMB is the total size of databases this directory can hold. In other words, it can store hot_warm data until it reaches 1.45 TB, and the databases will then roll into cold (which can hold 9.4 TB). Is that right?

Will the pan index be limited to 20 TB across everything (hot_warm and cold)?
Also, keeping maxDataSize as auto, I believe there will be 300 hot_warm buckets of 750 MB each in the hot_warm volume. Is that right?
And if I change it to auto_high_volume, would there be 300 hot_warm buckets of 10 GB each? If so, would it keep a lot more hot_warm data than it does now?

Is maxWarmDBCount 300 by default?

woodcock
Esteemed Legend

Let's go one by one:

1: Will maxTotalDataSizeMB or maxVolumeDataSizeMB take precedence?
A: Neither takes precedence over the other; both limits are enforced, so whichever you hit first applies.

2a: As per my understanding, maxVolumeDataSizeMB is the total size of databases this directory can hold.
A: No, it is the total of EVERYTHING, not just "databases"; if you dump a 20TB tarfile in there, then you have no room for any "database" buckets.
2b: In other words, it can store hot_warm data until it reaches 1.45 TB, and the databases will then roll into cold (which can hold 9.4 TB). Is that right?
A: Yes, assuming nothing else non-Splunk is also using that directory (it should not be).

3: Will the pan index be limited to 20 TB across everything (hot_warm and cold)?
A: Yes, exactly.
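
As a rough cross-check on the 20 TB figure, the two volume limits from the question can be added up and compared with maxTotalDataSizeMB. This is only arithmetic on the posted values, not a model of how Splunk enforces the limits; the helper name is made up for illustration, and it assumes pan is the only index using both volumes.

```python
# Values copied from the indexes.conf in the question (all in MB).
HOT_WARM_VOLUME_MB = 1_450_000
COLD_VOLUME_MB = 9_400_000
PAN_MAX_TOTAL_MB = 20_000_000


def effective_cap_mb(volume_caps_mb, index_cap_mb):
    """Hypothetical helper: the smaller of the summed volume caps and the index cap."""
    return min(sum(volume_caps_mb), index_cap_mb)


volumes_total = HOT_WARM_VOLUME_MB + COLD_VOLUME_MB
cap = effective_cap_mb([HOT_WARM_VOLUME_MB, COLD_VOLUME_MB], PAN_MAX_TOTAL_MB)
print("sum of volume caps (MB):", volumes_total)
print("limit reached first (MB):", cap)
```

With these numbers the two volumes add up to 10,850,000 MB, so under the stated assumption the volume limits would be reached before the 20,000,000 MB index limit.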

4a: Keeping the maxDataSize as auto, I believe there will be 300 hot_warm buckets of 750 MB each in the hot_warm volume. Is it right?
A: The auto setting sets the bucket size to 750MB, which in your case means 1450000/750 ~ 1933 buckets (almost all of these will be warm).
4b: And if I change it to auto_high_volume, would there be 300 hot_warm buckets of 10 GB each? If so, would it keep a lot more hot_warm data than it does now?
A: The auto_high_volume setting sets the bucket size to 10GB on 64-bit systems, or 1GB on 32-bit systems, which in your case (assuming 64-bit) means 1450000/10240 ~ 141 buckets.
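
The two divisions above can be reproduced with a few lines of Python. This is a minimal sketch that only looks at the volume size; it ignores count-based limits such as maxWarmDBCount, which may roll buckets to cold sooner, and the helper name is illustrative.

```python
HOT_WARM_VOLUME_MB = 1_450_000  # maxVolumeDataSizeMB from the question

# Bucket sizes quoted in the answer (64-bit systems), in MB.
BUCKET_SIZE_MB = {"auto": 750, "auto_high_volume": 10 * 1024}


def buckets_that_fit(volume_mb, bucket_mb):
    """Whole buckets of bucket_mb that fit in volume_mb (illustrative helper)."""
    return volume_mb // bucket_mb


for setting, size_mb in BUCKET_SIZE_MB.items():
    print(setting, buckets_that_fit(HOT_WARM_VOLUME_MB, size_mb))
```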

5: Is the maxWarmDBCount 300 by default ?
A: Yes, the default is 300 (check indexes.conf.spec for your version).

NOTE: You are using MB*1000*1000 for TB, which is not correct (it is MB*1024*1024).
Also note, the maximum size of your warm buckets may slightly exceed maxDataSize, due to post-processing and timing issues with the rolling policy.
Also note, some settings may vary from version to version.
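
To make the unit note concrete, here is a small sketch that converts the TB figures from the question into MB using 1024*1024 and prints them next to the decimal values that were posted. The function and constant names are illustrative only.

```python
MB_PER_TB_BINARY = 1024 * 1024   # 1 TB = 1024 GB = 1024 * 1024 MB
MB_PER_TB_DECIMAL = 1000 * 1000  # what the posted values assume


def tb_to_mb(tb, mb_per_tb=MB_PER_TB_BINARY):
    """Convert terabytes to whole megabytes (rounded to the nearest MB)."""
    return round(tb * mb_per_tb)


for tb in (1.45, 9.4, 20):
    print(tb, "TB ->", tb_to_mb(tb, MB_PER_TB_DECIMAL), "MB (decimal),",
          tb_to_mb(tb), "MB (binary)")
```

For example, a 20 TB limit expressed in binary units would be 20971520 MB rather than 20000000 MB.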

vr2312
Contributor

@woodcock

Do you mean that maxVolumeDataSizeMB takes precedence over the (potentially multiple, added together) maxTotalDataSizeMB values?

woodcock
Esteemed Legend

It is kind of both so I reworded my answer.