Explain to me like I'm 4, indexes.conf, the stanzas, change frozen to archive option

token2
Path Finder

Hello all, I'm finding the default indexes.conf settings too small: various sourcetypes are only searchable back about 4 months, but I need to be able to search back a full year.

I've found numerous Splunk posts on indexes.conf stanzas and settings, each more confusing than the last.

How the indexer stores indexes - Splunk Documentation

Configure index storage - Splunk Documentation

https://wiki.splunk.com/Deploy:BucketRotationAndRetention

I'm afraid I need an "explain to me like I'm 4 years old" post. What calculator or tool should I use, and which stanzas and settings, to effectively:

A) get search visibility into logs older than a few months

B) no longer roll buckets into frozen (which seems to be aka 'deleted') but into an archive, to make it easy to restore them when A) isn't as dialed in as I thought.

somesoni2
Revered Legend

I will try

  • Data in Splunk is stored in an index (think of it as a database). An index can contain data for multiple sourcetypes (think of a sourcetype as a table).
  • The data for an index is stored on disk in "buckets" (each bucket is an actual directory on disk where the data is saved).
  • Buckets move through stages: hot (data is being written to it; searchable), warm (writing has stopped, i.e. read-only; actively searched), cold (read-only, less frequently searched), and frozen (retired from the index and no longer searchable).
  • Each bucket holds data for a range of timestamps (e.g. bucketA has data from 05/01/2022 01:01AM to 05/02/2022 08:09PM). For retention, the age of a bucket is measured from its newest event, so a bucket is frozen only once everything in it is past the retention period.
  • Retention for an index is both size based (total size of that index, set as attribute "maxTotalDataSizeMB") and age based (timestamp of data in buckets, set as attribute "frozenTimePeriodInSecs"); whichever limit is reached first applies.
  • If the total size of the index has reached the maxTotalDataSizeMB value, it'll start freezing the oldest bucket (the bucket with the lowest timestamp). This is checked first. The bucket will be frozen even if its age is still within the retention period.
  • If the age of the bucket is greater than the retention period (default is about 6 years, set as attribute "frozenTimePeriodInSecs"), it'll be frozen.
  • By default frozen buckets are deleted, but they can instead be moved to a specific directory (set as attribute "coldToFrozenDir"), OR you can write a script that does whatever you want with the frozen bucket (set as attribute "coldToFrozenScript").
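
To make the two retention limits concrete, here is a minimal sketch of a stanza aimed at roughly one year of searchable data. The index name my_index and the size value are hypothetical, not taken from the question. Size the limit from your own ingest volume; for instance, if the current size limit fills up in about four months, a year needs roughly three times that plus some headroom.

```ini
# Hypothetical index name; use the stanza name of your existing index.
[my_index]
# Age limit: 365 days * 86400 seconds/day = 31536000 seconds (one year).
frozenTimePeriodInSecs = 31536000
# Size limit in MB (hypothetical value). It must be large enough to hold
# a full year of data, otherwise the size limit freezes buckets before
# they reach one year of age.
maxTotalDataSizeMB = 1500000
```

Because whichever limit is hit first wins, raising only frozenTimePeriodInSecs will not help if the index is already running into its size limit.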

 

So for each index where you want longer retention and don't want frozen buckets deleted, set the following attributes:

  • maxTotalDataSizeMB: Determines rolling behavior, cold to frozen. The maximum size of an index. When this limit is reached, cold buckets begin rolling to frozen. Default: 500000 (MB).
  • frozenTimePeriodInSecs: Determines rolling behavior, cold to frozen. Maximum age for a bucket, after which it rolls to frozen. Default: 188697600 (in seconds; approx. 6 years).
  • coldToFrozenDir: Location for archived data. Determines behavior when a bucket rolls from cold to frozen. If set, the indexer will archive frozen buckets into this directory just before deleting them from the index. If you don't set either this attribute or coldToFrozenScript, the indexer will just log the bucket's directory name and then delete it once it rolls to frozen.

OR

  • coldToFrozenScript: Script to run just before a cold bucket rolls to frozen. If you set both this attribute and coldToFrozenDir, the indexer will use coldToFrozenDir and ignore this attribute. If you don't set either this attribute or coldToFrozenDir, the indexer will just log the bucket's directory name and then delete it once it rolls to frozen.
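
As a sketch of the archive option, assuming a hypothetical index my_index and a hypothetical archive path that has enough free space and is writable by the user running Splunk:

```ini
# Hypothetical index name; use the stanza name of your existing index.
[my_index]
# Hypothetical archive path. With this set, buckets that roll to frozen
# are copied here before being removed from the index, instead of
# simply being deleted.
coldToFrozenDir = /opt/splunk_archive/my_index/frozen
```

Archived buckets are not searchable where they sit, and the indexer does not manage the size of this directory, so it has to be pruned by some other means.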

jencot01
Loves-to-Learn Lots

I have a follow-up question on this explanation...

If you have an index with this configuration:

[index]
homePath = volume:primary/index/db
coldPath = volume:primary/index/colddb
thawedPath = $SPLUNK_DB/index/thaweddb
tstatsHomePath = volume:primary/index/datamodel_summary
maxTotalDataSizeMB = 102400
# 31536000 seconds = one year
frozenTimePeriodInSecs = 31536000
coldToFrozenDir = /splunkdata/frozen/$_index_name

If the maxTotalDataSizeMB is reached before frozenTimePeriodInSecs, does the data get deleted without archiving first or does it get archived first since coldToFrozenDir is configured?

token2
Path Finder

You have a gift at breaking things down!
