Splunk Enterprise

Index tiering based on age (not on size)

bizza
Path Finder

Hi,
is there a way to roll data from hot to warm, and from warm to cold, per index, based on the data's age?

I found only

frozenTimePeriodInSecs 

for age-based retention; the other parameters are based on size.

Regards

bizza

1 Solution

bwooden
Splunk Employee

There is a way to do this - but I would not encourage its use as it may unintentionally impact search performance. It is typically more performant to tell Splunk how much storage it may use per volume (or per index where different types of data have different retention requirements). Splunk does a good job figuring out what data to put in which bucket based on time. It is not usually beneficial to purge data in an index based on its age because we must first force Splunk to bucket this data based on our calculations (which are not usually optimal).
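
For illustration, the size-based approach described above could look something like the following in indexes.conf. The volume names, paths, and size limits here are placeholder values, not recommendations:

  # Cap the hot/warm tier by total size across every index that uses this volume
  [volume:hotwarm]
  path                = /opt/splunk/var/lib/splunk
  maxVolumeDataSizeMB = 500000

  # Cap the cold tier the same way, typically on cheaper storage
  [volume:cold]
  path                = /mnt/splunk_cold
  maxVolumeDataSizeMB = 2000000

  [my_index]
  homePath   = volume:hotwarm/my_index/db
  coldPath   = volume:cold/my_index/colddb
  # thawedPath cannot reference a volume, so it stays a regular path
  thawedPath = $SPLUNK_DB/my_index/thaweddb

When a volume reaches its maxVolumeDataSizeMB, Splunk rolls its oldest buckets to the next tier, which generally gives the "oldest data moves first" behavior without forcing bucket boundaries by time.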

Further attempt at persuasion: It is generally okay to have more data than required. If only 6 months of data is needed but 9 months is available, Splunk will still return data quickly based on its underlying time series index. I say this to further discourage anyone from aging by time without careful consideration and planning.

If an index must be manipulated to discard data by age to meet a requirement, the maxHotIdleSecs setting would be used. Let us say a business rule demands we drop data older than 2 months and maxHotBuckets is set to 1. First, set maxHotIdleSecs to one day. Important: in this example, make sure that no more than one bucket per day is created by size (maxDataSize), or less than 2 months of data will be retained. Next, set maxWarmDBCount to 59. This configuration creates a new hot bucket each day and keeps 59 warm buckets (each ostensibly holding a day's data), so Splunk will roll data older than 60 days to cold. Finally, set frozenTimePeriodInSecs to 60 days so that data rolled to cold is frozen. Note: these settings are described here in days, but they are expressed in seconds in the .conf file.
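
Putting that into indexes.conf terms, a minimal sketch for a hypothetical index named my_index (the index name and paths are placeholders; the seconds values correspond to the 1-day and 60-day figures above):

  [my_index]
  homePath   = $SPLUNK_DB/my_index/db
  coldPath   = $SPLUNK_DB/my_index/colddb
  thawedPath = $SPLUNK_DB/my_index/thaweddb

  # Only one hot bucket at a time; roll it to warm once it has been idle for a day
  maxHotBuckets          = 1
  maxHotIdleSecs         = 86400       # 1 day, in seconds
  # Keep the bucket size large enough that a full day's data does not roll early by size
  maxDataSize            = auto_high_volume
  # Keep 59 warm buckets (roughly 59 days of data) before rolling to cold
  maxWarmDBCount         = 59
  # Freeze (delete or archive) anything older than 60 days
  frozenTimePeriodInSecs = 5184000     # 60 days, in seconds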

NB: While creating a new index with this configuration is one thing, applying these settings to an existing index is something more serious. Please be careful if considering these settings for a production environment. Consulting Support before implementing an impactful configuration is strongly encouraged.

bizza
Path Finder

Thank you, bwooden. I'll plan the tiering based on the data.
