Solved: Explain to me like I'm 4, indexes.conf, the stanza...

token2 · ‎05-19-2022

Hello all, I'm finding the default indexes.conf settings too small: various sourcetypes are only searchable back about 4 months, but I need to be able to search back a full year.

I've found numerous Splunk posts on indexes.conf stanzas and settings, each more confusing than the last.

How the indexer stores indexes - Splunk Documentation

Configure index storage - Splunk Documentation

https://wiki.splunk.com/Deploy:BucketRotationAndRetention

I'm afraid I need an "explain to me like I'm 4 years old" post. What calculator or tool should I use, and which stanzas and settings should I change, to effectively:

A) get search visibility into logs older than a few months

B) no longer roll buckets into Frozen (which seems to be aka 'deleted') but into an archive, to facilitate easily restoring them when A) isn't as dialed in as I thought.

somesoni2 · ‎05-19-2022

I will try

Data in Splunk is stored in indexes (treat an index like a database). An index can contain data for multiple sourcetypes (think of a sourcetype like a table).
The data for an index is stored on disk in "buckets" (a bucket is an actual directory on disk where the data is saved).
Buckets go through stages: hot (data is being written to it), warm (writing has stopped, so it is read-only but still actively searched), cold (read-only, less frequently searched), and frozen (retired: removed from the index and no longer searchable).
Each bucket holds data for a range of timestamps (e.g. bucketA has data from 05/01/2022 01:01AM to 05/02/2022 08:09PM). For retention purposes, the age of a bucket is measured from its newest event, so a bucket only counts as expired once all of its data is older than the retention period.
Retention for an index is both size based (total size of the index, set with the attribute "maxTotalDataSizeMB") and age based (timestamps of the data in its buckets, set with the attribute "frozenTimePeriodInSecs"). Whichever limit is hit first applies.
If the total size of the index reaches the maxTotalDataSizeMB value, Splunk starts freezing the oldest bucket (the bucket with the oldest timestamps). This is checked first. The bucket is frozen (and by default deleted) even if its age is still within the retention period.
If the age of the bucket is greater than the retention period (set with the attribute "frozenTimePeriodInSecs"; the default is about 6 years), it is frozen.
By default frozen buckets are deleted, but they can instead be moved to a specific directory (set with the attribute "coldToFrozenDir"), OR you can write a script that does whatever you want with the frozen bucket (set with the attribute "coldToFrozenScript").
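The two freeze triggers described above can be sketched as a toy model. This is not Splunk's code, only an illustration of the order of checks in this post. The `Bucket` class, the function name, and the sample numbers are all made up for the example, and it assumes a bucket's age is measured from its newest event.

```python
from dataclasses import dataclass

DAY = 86400  # seconds in one day


@dataclass
class Bucket:
    name: str          # hypothetical bucket label
    latest_event: int  # epoch seconds of the newest event in the bucket
    size_mb: int       # size of the bucket on disk


def buckets_to_freeze(buckets, now, max_total_mb, frozen_secs):
    """Return (bucket name, reason) pairs in the order this toy model freezes them."""
    remaining = sorted(buckets, key=lambda b: b.latest_event)  # oldest first
    frozen = []
    # Size check: freeze the oldest buckets until the index fits under the cap.
    while remaining and sum(b.size_mb for b in remaining) > max_total_mb:
        frozen.append((remaining.pop(0).name, "size"))
    # Age check: freeze any bucket whose newest event is past the retention period.
    for b in list(remaining):
        if now - b.latest_event > frozen_secs:
            remaining.remove(b)
            frozen.append((b.name, "age"))
    return frozen


now = 400 * DAY
buckets = [
    Bucket("a", latest_event=100 * DAY, size_mb=40),  # 300 days old
    Bucket("b", latest_event=200 * DAY, size_mb=40),  # 200 days old
    Bucket("c", latest_event=390 * DAY, size_mb=40),  # 10 days old
]

# Small size cap, one-year retention: "a" is frozen although it is under a year old.
print(buckets_to_freeze(buckets, now, max_total_mb=100, frozen_secs=365 * DAY))
# Large size cap, 250-day retention: "a" is frozen because of its age.
print(buckets_to_freeze(buckets, now, max_total_mb=500, frozen_secs=250 * DAY))
```

The first call is the situation from the original question: the size cap is reached long before the age limit, so data disappears from search earlier than the retention period suggests.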

So for each index where you want longer retention and don't want frozen buckets deleted, set the following attributes:

Attribute	What it does	Default
maxTotalDataSizeMB	Determines rolling behavior, cold to frozen. The maximum size of an index. When this limit is reached, cold buckets begin rolling to frozen.	500000 (MB)
frozenTimePeriodInSecs	Determines rolling behavior, cold to frozen. Maximum age for a bucket, after which it rolls to frozen.	188697600 (in seconds; approx. 6 years)
coldToFrozenDir	Location for archived data. Determines behavior when a bucket rolls from cold to frozen. If set, the indexer will archive frozen buckets into this directory just before deleting them from the index.	If you don't set either this attribute or coldToFrozenScript, the indexer will just log the bucket's directory name and then delete it once it rolls to frozen.
OR
coldToFrozenScript	Script to run just before a cold bucket rolls to frozen. If you set both this attribute and coldToFrozenDir, the indexer will use coldToFrozenDir and ignore this attribute.	If you don't set either this attribute or coldToFrozenDir, the indexer will just log the bucket's directory name and then delete it once it rolls to frozen.
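As a rough "calculator" for the attributes above, the following sketch converts a retention period in days to seconds, estimates a size cap, and prints the matching indexes.conf lines. The index name, the daily on-disk volume, the 20% headroom, and the archive path are hypothetical placeholders, and the sizing rule (daily volume times days plus headroom) is only a simple assumption, not a Splunk formula. The path settings a full stanza needs are left out.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def retention_secs(days):
    """Value for frozenTimePeriodInSecs for a retention period given in days."""
    return days * SECONDS_PER_DAY


def index_size_mb(daily_disk_mb, days, headroom_pct=20):
    """Rough maxTotalDataSizeMB: on-disk MB per day * days, plus headroom (an assumption)."""
    return daily_disk_mb * days * (100 + headroom_pct) // 100


def render_stanza(index_name, days, daily_disk_mb, frozen_dir):
    """Build the retention-related lines of an indexes.conf stanza as text."""
    lines = [
        f"[{index_name}]",
        f"frozenTimePeriodInSecs = {retention_secs(days)}",
        f"maxTotalDataSizeMB = {index_size_mb(daily_disk_mb, days)}",
        f"coldToFrozenDir = {frozen_dir}",
    ]
    return "\n".join(lines)


# Hypothetical index: one year of retention, about 500 MB written to disk per day.
print(render_stanza("my_index", 365, 500, "/archive/my_index/frozen"))
```

The point of computing both values together is that the size cap has to be large enough to hold the whole retention period. Otherwise the size limit freezes buckets before the age limit is ever reached.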

jencot01 · ‎05-24-2023

I have a follow-up question on this explanation...

If you have an index with this configuration:

[index]
homePath = volume:primary/index/db
coldPath = volume:primary/index/colddb
thawedPath = $SPLUNK_DB/index/thaweddb
tstatsHomePath = volume:primary/index/datamodel_summary
maxTotalDataSizeMB = 102400
# one year
frozenTimePeriodInSecs = 31536000
coldToFrozenDir = /splunkdata/frozen/$_index_name

If the maxTotalDataSizeMB is reached before frozenTimePeriodInSecs, does the data get deleted without archiving first or does it get archived first since coldToFrozenDir is configured?
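Going by the attribute descriptions earlier in this thread, what happens to a bucket once it rolls to frozen depends only on whether coldToFrozenDir or coldToFrozenScript is set. The sketch below is a toy model of that reading, not Splunk's implementation. The function name and the bucket directory are hypothetical. Note that the reason for the freeze (size or age) is not an input at all, which suggests a bucket frozen because maxTotalDataSizeMB was reached is archived first in the same way. This is worth confirming against the documentation for your Splunk version.

```python
def on_freeze(bucket_dir, cold_to_frozen_dir=None, cold_to_frozen_script=None):
    """Return the ordered actions for one bucket rolling to frozen in this toy model."""
    actions = []
    if cold_to_frozen_dir:
        # coldToFrozenDir takes precedence; any script setting is ignored.
        actions.append(f"archive {bucket_dir} -> {cold_to_frozen_dir}")
    elif cold_to_frozen_script:
        actions.append(f"run {cold_to_frozen_script} {bucket_dir}")
    else:
        # Neither attribute set: only the directory name is logged.
        actions.append(f"log {bucket_dir}")
    # In every case the bucket is then removed from the index.
    actions.append(f"delete {bucket_dir} from index")
    return actions


# With the coldToFrozenDir value from the stanza above:
for step in on_freeze("colddb/db_example", cold_to_frozen_dir="/splunkdata/frozen/$_index_name"):
    print(step)
```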

token2 · ‎05-19-2022

You have a gift at breaking things down!
