Solved: Clarification - indexes.conf

vr2312 · ‎04-30-2017

This is my indexes.conf configuration

[volume:hot_warm]
path = /store/hot_warm
maxVolumeDataSizeMB = 1450000

[volume:cold]
path = /store/cold
maxVolumeDataSizeMB = 9400000

[pan]
homePath = volume:hot_warm/pan/db
coldPath = volume:cold/pan/colddb
tstatsHomePath = volume:hot_warm/pan/datamodel_summary
thawedPath = /restore/pan/thaweddb
coldToFrozenDir = /store/cold/archive/pan
maxDataSize = auto
frozenTimePeriodInSecs = 31536000 -> Age at which data moves from cold to frozen, i.e. after 1 year.
maxTotalDataSizeMB = 20000000 -> Maximum size of the index, i.e. 20 TB.
enableTsidxReduction = true -> Reduces the size of TSIDX files. Buckets get smaller, but searches against them are slower.
timePeriodInSecBeforeTsidxReduction = 2592000 -> After 30 days, TSIDX reduction is applied.

In this case, does maxTotalDataSizeMB or maxVolumeDataSizeMB take precedence?

As I understand it, maxVolumeDataSizeMB is the total size of the databases this directory can hold. In other words, hot_warm can store data until it reaches 1.45 TB, and the databases then roll into cold (which can hold 9.4 TB). Is that right?

Would the pan index be limited to 20 TB in total across hot_warm and cold?
Also, with maxDataSize set to auto, I believe there will be 300 hot_warm buckets of 750 MB each in the hot_warm volume. Is that right?
And if I change it to auto_high_volume, would there be 300 hot_warm buckets of 10 GB each? If so, would it keep much more hot_warm data than it does now?

Is maxWarmDBCount 300 by default?

woodcock · ‎04-30-2017

Let's go one by one:

1: Does maxTotalDataSizeMB or maxVolumeDataSizeMB take precedence?
A: Neither takes precedence over the other. Both limits apply, so whichever you hit first will be enforced.

2a: As I understand it, maxVolumeDataSizeMB is the total size of the databases this directory can hold.
A: No, it is the total of EVERYTHING, not just "databases"; if you dump a 20TB tar file in there, then you have no room for any "database" buckets.
2b: In other words, hot_warm can store data until it reaches 1.45 TB, and the databases then roll into cold (which can hold 9.4 TB). Is that right?
A: Yes, assuming nothing non-Splunk is also using that directory (and nothing should be).

3: Would the pan index be limited to 20 TB in total across hot_warm and cold?
A: Yes, exactly.

4a: With maxDataSize set to auto, I believe there will be 300 hot_warm buckets of 750 MB each in the hot_warm volume. Is that right?
A: The auto setting sets the bucket size to 750 MB, so by volume size alone your hot_warm volume can hold about 1450000/750 ≈ 1933 buckets (almost all of these will be warm).
4b: And if I change it to auto_high_volume, would there be 300 hot_warm buckets of 10 GB each? If so, would it keep much more hot_warm data than it does now?
A: auto_high_volume sets the bucket size to 10 GB on 64-bit systems, or 1 GB on 32-bit systems, which in your case (assuming 64-bit) means 1450000/10240 ≈ 141 buckets.

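5: Is maxWarmDBCount 300 by default?
A: Yes, the default is 300. Once an index has more warm buckets than that, the oldest ones roll to cold even if the volume still has room.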
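
The arithmetic in 4a/4b can be sketched in a few lines of Python. This is only a rough estimate that assumes every bucket is exactly maxDataSize in size; the function names are made up for illustration, and the warm-bucket cap is passed in as a parameter rather than assumed (the 300 used below is simply the figure from the question).

```python
# Rough upper bounds only; real bucket sizes vary.
VOLUME_MB = 1_450_000            # maxVolumeDataSizeMB of volume:hot_warm
AUTO_MB = 750                    # maxDataSize = auto
AUTO_HIGH_VOLUME_MB = 10 * 1024  # maxDataSize = auto_high_volume on 64-bit


def buckets_that_fit(volume_mb: int, bucket_mb: int) -> int:
    """Whole buckets of bucket_mb that fit into volume_mb."""
    return volume_mb // bucket_mb


def warm_buckets_retained(volume_mb: int, bucket_mb: int, max_warm: int) -> int:
    """Bucket count after applying both the volume size and a warm-count cap."""
    return min(buckets_that_fit(volume_mb, bucket_mb), max_warm)


print(buckets_that_fit(VOLUME_MB, AUTO_MB))              # 1933
print(buckets_that_fit(VOLUME_MB, AUTO_HIGH_VOLUME_MB))  # 141

# max_warm=300 is the value from the question, used here as an assumption.
print(warm_buckets_retained(VOLUME_MB, AUTO_MB, 300))              # 300
print(warm_buckets_retained(VOLUME_MB, AUTO_HIGH_VOLUME_MB, 300))  # 141
```

Under that assumed cap of 300, auto would keep roughly 300 x 750 MB = 225,000 MB in hot_warm, while auto_high_volume would be limited by the volume size instead, so it would keep considerably more data in hot_warm.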

NOTE: You are converting TB to MB with 1000*1000, which is not correct; 1 TB is 1024*1024 MB.
Also note, the maximum size of your warm buckets may slightly exceed maxDataSize, due to post-processing and timing issues with the rolling policy.
Also note, some settings may vary from version to version.
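
As a quick illustration of the unit note above, here is a small sketch of the conversion using 1 TB = 1024*1024 MB. The helper names are hypothetical, and the inputs are the values from the question.

```python
MB_PER_TB = 1024 * 1024  # 1,048,576 MB per TB in binary units


def tb_to_mb(tb: float) -> int:
    """Convert TB to the MB value to put into indexes.conf."""
    return round(tb * MB_PER_TB)


def mb_to_tb(mb: int) -> float:
    """Convert an MB setting back to TB, rounded to two decimals."""
    return round(mb / MB_PER_TB, 2)


# Values needed for the sizes the question intended:
print(tb_to_mb(1.45))  # 1520435
print(tb_to_mb(9.4))   # 9856614
print(tb_to_mb(20))    # 20971520

# What the configured values actually amount to:
print(mb_to_tb(1_450_000))   # 1.38
print(mb_to_tb(9_400_000))   # 8.96
print(mb_to_tb(20_000_000))  # 19.07
```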

vr2312 · ‎05-01-2017

@woodcock

Did you mean that maxVolumeDataSizeMB takes precedence over the (potentially multiple, added together) maxTotalDataSizeMB values?

woodcock · ‎10-23-2017

It is kind of both so I reworded my answer.
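
To illustrate "whichever you hit first", here is a deliberately simplified model in Python. It is not Splunk's enforcement logic, and limits_reached is a hypothetical helper; it only compares current usage against the two caps from the configuration in the question.

```python
# Simplified model only: this is NOT how Splunk itself is implemented.
HOT_WARM_CAP_MB = 1_450_000  # maxVolumeDataSizeMB for volume:hot_warm
COLD_CAP_MB = 9_400_000      # maxVolumeDataSizeMB for volume:cold
PAN_CAP_MB = 20_000_000      # maxTotalDataSizeMB for [pan]


def limits_reached(index_mb, index_cap_mb, volume_mb, volume_cap_mb):
    """Return the names of the limits that current usage has reached."""
    reached = []
    if index_mb >= index_cap_mb:
        reached.append("maxTotalDataSizeMB")
    if volume_mb >= volume_cap_mb:
        reached.append("maxVolumeDataSizeMB")
    return reached


# Example: pan fills the hot_warm volume and has 2,000,000 MB in cold.
print(limits_reached(3_450_000, PAN_CAP_MB, 1_450_000, HOT_WARM_CAP_MB))
# ['maxVolumeDataSizeMB']

# The two volumes together are smaller than the index cap.
print(HOT_WARM_CAP_MB + COLD_CAP_MB < PAN_CAP_MB)  # True
```

With these particular numbers the two volume caps add up to 10,850,000 MB, which is below the 20,000,000 MB index cap, so in this model the volume limits are the ones reached first.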
