Data Trim

Siddharthnegi
Contributor

I have some questions regarding data trim.

From which version has data trim been available?

What is the parameter that controls data trimming, i.e. how much storage has to be filled before Splunk trims the data?

Can we stop data trimming? And how can we know when data is about to be trimmed?

0 Karma

Siddharthnegi
Contributor

I am talking about the case where you have set a particular size for your index and a retention period. If the data overflows your storage size, Splunk starts trimming old data from the cold buckets so that you have storage for your new data.

Hope this explanation helps.

0 Karma

gcusello
SplunkTrust

Hi @Siddharthnegi,

yes, I completely misunderstood your question!

Anyway, when a bucket exceeds the retention time and you haven't configured a coldToFrozenDir (or a coldToFrozenScript) to save it offline, it is discarded, but only when the most recent event in the bucket exceeds the retention period.

For this reason, you can have events older than the retention period: they are in the same bucket as events that have not yet exceeded it.

For this reason, it's a best practice to store in the same index only events with roughly the same ingestion frequency.

Buckets are also discarded when your index reaches its max size: when this occurs, the oldest buckets are discarded one by one, until the index is below the max size again.
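For reference, a minimal indexes.conf sketch of the offline-save option mentioned above (the index name and archive path here are hypothetical):

[my_index]
# Instead of deleting frozen buckets, Splunk copies them to this directory
coldToFrozenDir = /opt/splunk_archive/my_index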

Ciao.

Giuseppe

Siddharthnegi
Contributor

I have some questions regarding data trimming,

like: from which version of Splunk was this feature added?

0 Karma

gcusello
SplunkTrust

Hi @Siddharthnegi,

I have worked with Splunk since version 4 and it has always been present; I cannot answer for earlier versions.

Ciao.

Giuseppe

Siddharthnegi
Contributor

OK, and when does it trim the data? How much storage has to be filled in order for Splunk to trim old data?

Are there any parameters for trimming?

0 Karma

gcusello
SplunkTrust

Hi @Siddharthnegi ,

you can define:

  • the index retention, using the frozenTimePeriodInSecs parameter
  • the index max size, using the maxTotalDataSizeMB parameter

Here you can find useful information: https://www.splunk.com/en_us/blog/tips-and-tricks/managing-index-sizes-in-splunk.html?locale=en_us
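For example, a minimal indexes.conf sketch combining both parameters (the index name and the values here are hypothetical):

[my_index]
homePath   = $SPLUNK_DB/my_index/db
coldPath   = $SPLUNK_DB/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb

# Retention: freeze (by default, delete) a bucket once its newest event
# is older than 90 days (90 * 86400 seconds)
frozenTimePeriodInSecs = 7776000

# Size limit: once the whole index exceeds ~500 GB, the oldest bucket is frozen
maxTotalDataSizeMB = 512000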

Let me know if you need more help; otherwise, please accept the answer for the other people in the Community.

Ciao.

Giuseppe

P.S.: Karma Points are appreciated 😉

Siddharthnegi
Contributor

I am asking about when Splunk trims the old data from the cold buckets. When does it do that? How much data has to be in the index for Splunk to trim it?

Say I have an index to which 500 GB is allocated. When Splunk trims the data, how much of the 500 GB has to be filled in order for Splunk to do that?

0 Karma

gcusello
SplunkTrust

Hi @Siddharthnegi,

as I said, size-based trimming occurs when the index reaches its max size: at that point the oldest bucket is deleted.

The index then drops below its max size and continues to grow until it reaches the max size again, at which point the oldest bucket is deleted again, and so on.

A bucket can hold up to about 10 GB (with maxDataSize = auto_high_volume; the default auto setting caps buckets at about 750 MB), so that is roughly the amount trimmed at a time.

About the question of when: trimming is performed when the max size is reached.
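If you want to verify the actual bucket sizes and time ranges in your own index, a sketch in SPL (my_index is a hypothetical index name):

| dbinspect index=my_index
| eval sizeGB = round(sizeOnDiskMB / 1024, 2)
| table bucketId state startEpoch endEpoch sizeGB
| sort startEpoch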

Ciao.

Giuseppe

0 Karma

Siddharthnegi
Contributor

So if only 10 GB of storage is remaining in the index, then Splunk starts trimming the old data?

0 Karma

gcusello
SplunkTrust

Hi @Siddharthnegi,

no, trimming starts when the index reaches the max size.

After trimming, the index will probably be at around 490 GB; it then continues to grow until it reaches the max size again, and the trimming process restarts.
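If you want to be warned before this happens, one option (a sketch; my_index is a hypothetical index name) is a scheduled alert on the indexes REST endpoint that fires when an index approaches its configured maximum:

| rest /services/data/indexes splunk_server=local
| search title=my_index
| eval pctFull = round(currentDBSizeMB / maxTotalDataSizeMB * 100, 1)
| table title currentDBSizeMB maxTotalDataSizeMB pctFull
| where pctFull > 90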

Ciao.

Giuseppe

0 Karma

Siddharthnegi
Contributor

So it trims 10 GB of data every time the storage is filled?

0 Karma

gcusello
SplunkTrust

Hi @Siddharthnegi,

yes: Splunk trims the oldest bucket, which usually (with the auto_high_volume setting) has a size of about 10 GB.

Ciao.

Giuseppe

0 Karma


Siddharthnegi
Contributor

Can we increase this 10 GB margin for data trimming? And can we know in advance that Splunk is about to trim, so that we are warned before the data is trimmed?

0 Karma

gcusello
SplunkTrust

Hi @Siddharthnegi,

I don't like to change this kind of parameter, because you only move the problem, you don't solve it: with a bucket size of 20 GB, trimming frees 20 GB of disk space instead of 10, so what has really changed?

In other words, what's the problem?

Splunk automatically trims the oldest bucket when the index reaches its max size.
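That said, if you do want to change how much is trimmed at a time, the bucket size is controlled by the maxDataSize parameter in indexes.conf (a sketch; the index name is hypothetical):

[my_index]
# "auto" caps buckets at about 750 MB; "auto_high_volume" at about 10 GB
# on 64-bit systems. This also sets how much is frozen at a time.
maxDataSize = auto_high_volume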

In my opinion, the most important aspect to analyze is: is the max-index-size trimming approach compatible with your retention policy?

In other words, if you need to retain events for 90 days, could events within that period be trimmed? This must be checked, because there is a risk of trimming events that are still inside the retention period.

I usually don't use the max-size approach for trimming, only the retention period, to avoid trimming events that I need.

Ciao.

Giuseppe

Siddharthnegi
Contributor

OK, I understood, but can we still change it?

Also, how can we prevent it, or know before it happens?

0 Karma

gcusello
SplunkTrust

Hi @Siddharthnegi,

as I said, you have to design your storage in advance, defining the maximum storage required for each index. So,

if for an index you are ingesting 70 GB/day and you want a retention of 90 days, you need (applying the usual rule of thumb that indexed data takes about 50% of the raw volume on disk):

70 * 0.5 * 90 = 3150 GB available

and, adding a margin of 10%, you need around 3.5 TB of disk space.

Making the same calculation for each index gives you your overall storage requirements.

You could also look at your average license consumption and use that value for the calculation.
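For example, a sketch of a search over the license usage logs to estimate average daily ingestion (this assumes you can search the _internal index on your license manager):

index=_internal source=*license_usage.log* type=RolloverSummary
| eval GB = round(b / 1024 / 1024 / 1024, 2)
| timechart span=1d sum(GB) AS daily_GB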

Ciao.

Giuseppe

 

Siddharthnegi
Contributor

Thanks for your answer. However, we are facing an issue where there is enough space in our index, but our disk usage has reached around 80%. So I just want to know: does volume trimming happen at the disk level as well? Attached below are our index configuration for the Palo Alto index and the disk status.

 

[firewall_paloalto]
coldPath = volume:cold\firewall_paloalto\colddb
homePath = volume:hotwarm\firewall_paloalto\db
thawedPath = D:\splunk_data\firewall_paloalto\thaweddb
tstatsHomePath = volume:hotwarm\firewall_paloalto\datamodel_summary

# 47304000 seconds is about 547 days (roughly 18 months) of retention
frozenTimePeriodInSecs = 47304000

# 4294967295 MB is the maximum allowed value, i.e. effectively unlimited
maxTotalDataSizeMB = 4294967295

(Attached screenshot: disk status)

0 Karma

gcusello
SplunkTrust

Hi @Siddharthnegi,

are you speaking of data truncating (limiting the length of overly long events), or of filtering and deleting entire events?

If data truncating, you can use TRUNCATE = 1000 (the default is 10,000) in your props.conf (for more info see https://docs.splunk.com/Documentation/Splunk/9.1.0/Admin/Propsconf); to my knowledge it has been in Splunk since the first releases.
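For example, in props.conf (the sourcetype name here is hypothetical):

[my_sourcetype]
# Truncate raw events longer than 1000 bytes (the default is 10000)
TRUNCATE = 1000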

If you're speaking of event filtering, see https://docs.splunk.com/Documentation/Splunk/9.1.0/Forwarding/Routeandfilterdatad

Ciao.

Giuseppe
