How to enforce a data model retention period when the summary range is not being honored

thambisetty_bal
Path Finder

Hi Splunkers,

I am using Splunk Enterprise Security. The Network_Traffic data model is accelerated in my environment, and its summary range is set to "1 month". However, when I subtract earliest(_time) from latest(_time) over the summaries, the difference is more than one month. Below is the query I used to check the actual summary range:
| tstats summariesonly=true earliest(_time) as etime latest(_time) as ltime from datamodel=Network_Traffic | convert ctime(*time)

The summaries span more than two months even though the summary range is set to 1 month.

Please help me set a strict retention policy to avoid disk space issues.

Many thanks in advance.

thambisetty_bal
Path Finder

Thanks, lguinn, for your answer.

As far as I know, datamodel_summary is stored alongside the index, and the summary range can be set using acceleration.earliest_time=x, so summaries should be created only for period x even if the index retention period (frozenTimePeriodInSecs) is x+y.

Are the datamodel_summary and index retention periods dependent on each other? If so, how do I set a strict policy so that the data model stores summaries only for period x?

lguinn2
Legend

Here is the deal - the retention setting on the index applies to buckets, not to individual events. A bucket cannot be removed from disk until all the events within that bucket are expired. So if you set retention to one month and your bucket is large and holds 3 months of data, then you will definitely have more than one month of data in your index.

You can only set strict retention rules in one of two ways: (1) 1 bucket = 1 hour of data, or, (2) 1 bucket = 1 day of data.
If you must, you can do this, but it will tend to make many small buckets (unless your daily volume is very high for the affected indexes). Many small buckets will cause your searches to run more slowly. I would avoid using strict retention to address your problem.
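
As a rough sketch of the one-bucket-per-day option, assuming a hypothetical index named network and a 30-day retention target, the indexes.conf stanza might look something like the following. maxHotSpanSecs and frozenTimePeriodInSecs are real indexes.conf settings, but the index name and the values are illustrative only, so check them against the indexes.conf spec for your version before using them.

```
# indexes.conf (sketch; the index name "network" is hypothetical)
[network]
# Cap each hot bucket at one day of data; at 86400 (or 3600) Splunk
# also aligns bucket boundaries to the day (or hour) mark
maxHotSpanSecs = 86400
# Freeze (by default, delete) a bucket once its newest event is older than 30 days
frozenTimePeriodInSecs = 2592000
```

With day-sized buckets, data should outlive the retention setting by roughly a day rather than by months, at the cost of the many small buckets described above.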

In order to get a balance between disk space management and search efficiency, you might want to set the bucket size for your indexes. Do this in addition to your retention setting. For each index, figure out how much disk space is consumed per day of data. Also consider a typical search range - you don't want to create too many buckets. In general, I personally try to follow these rules:
- size buckets to hold 24-48 hours of data, with the following exceptions:
- do not make buckets smaller than "auto," which is 750 MB
- do not make buckets larger than "auto_high_volume," which is 10 GB
Of course, sometimes other factors come into play, like how often you want to back up the environment. And sometimes I will go a bit smaller. But these are good general starting points. Also, the bucket size that you set for the index is the approximate maximum bucket size; buckets can be smaller for a variety of reasons.
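
To make the sizing rule concrete, suppose (purely as an assumption) that a hypothetical index named network consumes about 4 GB of disk per day. A bucket holding 24 to 48 hours of data would then be roughly 4 to 8 GB, which sits between "auto" and "auto_high_volume", so an explicit size in MB is one way to express it. maxDataSize is the indexes.conf setting involved; the index name and the number are illustrative.

```
# indexes.conf (sketch; index name and daily volume are assumptions)
[network]
# About 1.5 days of data at an assumed 4 GB/day; the value is in MB
maxDataSize = 6000
```

If the arithmetic lands outside the 750 MB to 10 GB range, the rules above suggest falling back to auto or auto_high_volume instead.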

If you use the dbinspect command, it will show you a lot of cool information about the buckets in your index.
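
A minimal sketch of such a search, assuming a hypothetical index named network: it lists each bucket's state, its time span in days, and its size on disk, so you can spot buckets that hold far more than your retention period. bucketId, state, startEpoch, endEpoch, and sizeOnDiskMB are fields returned by dbinspect; span_days is a name made up for this example.

```
| dbinspect index=network
| eval span_days = round((endEpoch - startEpoch) / 86400, 1)
| table bucketId state span_days sizeOnDiskMB
| sort - span_days
```

Buckets near the top of the list are the ones most likely to keep old events on disk past the retention setting.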

Finally, data model acceleration summaries also take space, so you might want to look at how much they consume. Changing the summarization options for the data model could lower the disk space used (although the summaries usually take less disk than the indexes themselves).

HTH

asimagu
Builder

@lguinn does this apply to data model retention too?

lguinn2
Legend

When you set data model acceleration, you are choosing the time range for acceleration. This time range defines the "retention" for the acceleration data. It is independent of the index retention settings. But clearly, you can't retain the acceleration data longer than the index data!
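
For reference, the acceleration time range is stored in datamodels.conf as acceleration.earliest_time. A sketch for a one-month range on Network_Traffic follows, assuming the override goes in a local directory of the app that owns the data model (the exact location depends on your deployment); the same range can also be set from the data model's acceleration settings in the UI.

```
# datamodels.conf (sketch; file location is an assumption)
[Network_Traffic]
acceleration = true
# Build and keep summaries for roughly the last month
acceleration.earliest_time = -1mon
```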
