Splunk Enterprise Security

Splunk ES || DMA (datamodel_summary) and normal index

Wohamed_wakkad
Engager

What is the relationship between Splunk accelerated data models stored in the datamodel_summary index and the normal indexed data (hot/warm/cold buckets) in terms of retention policy?


Specifically, when indexed data rolls to the frozen state and is deleted, is the corresponding data model summary also removed?


Does this mean that the data model summary cannot exceed the retention period of the original indexed data?


livehybrid
SplunkTrust

Hi @Wohamed_wakkad 

I would recommend checking out "Where the Splunk platform creates and stores data model acceleration summaries" and "Configure size-based retention for data model acceleration summaries" (on the same page) which explains this in more detail and better than me copying and pasting into here!

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing


Wohamed_wakkad
Engager

@livehybrid 

Thanks for your reply, but I am confused by this:

By default, Splunk software creates each data model acceleration summary on the indexer, parallel to the bucket or buckets that cover the range of time over which the summary spans, whether the buckets that fall within that range are hot, warm, or cold. If a bucket within the summary range moves to frozen status, Splunk software removes the summary information that corresponds with the bucket when it deletes or archives the data within the bucket.

Please look at my scenario below and tell me whether my understanding is correct:

Scenario: Volume 1 (Hot/Warm/DMA) and Volume 2 (Cold)
If you have configured your index to use Volume 1 for homePath and for the data model acceleration summary path (tstatsHomePath), and Volume 2 for coldPath, here is how the data flows:

1. The Directory Structure
The indexer will split the data across your mount points like this:

Volume 1 (/mnt/fast_disk/):

Hot/Warm Buckets: Stores the rawdata and the datamodel_summary together.

Standalone Summary Directory: If a bucket moves to Cold (Volume 2), but your configuration tells Splunk to keep summaries on Volume 1, Splunk creates a mirrored directory structure on Volume 1 just to hold the .tsidx files.

Volume 2 (/mnt/cheap_disk/):

Cold Buckets: Stores only the rawdata (and standard index files like bloom filters).
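A minimal indexes.conf sketch of this layout, for illustration only (the mount points and the index name net_logs are hypothetical; note that in indexes.conf the data model acceleration summary location is controlled by the tstatsHomePath attribute):

```ini
# Hypothetical volume definitions - adjust paths to your environment
[volume:fast_disk]
path = /mnt/fast_disk

[volume:cheap_disk]
path = /mnt/cheap_disk

[net_logs]
# Hot/warm buckets and DMA summaries on the fast storage
homePath       = volume:fast_disk/net_logs/db
tstatsHomePath = volume:fast_disk/net_logs/datamodel_summary
# Cold buckets on the cheap storage
coldPath       = volume:cheap_disk/net_logs/colddb
# thawedPath is mandatory and may not reference a volume
thawedPath     = $SPLUNK_DB/net_logs/thaweddb
```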

2. What happens when data "Rolls"?
From Warm to Cold
When a bucket reaches the age or size limit to move to Cold:

The Raw Data moves from Volume 1 to Volume 2.

The DMA Summary behavior depends on your config:

Default: The summary moves with the bucket to Volume 2 (the cheap disk).

Optimized: If you point tstatsHomePath specifically at Volume 1, the raw data moves to Volume 2, but the summary stays on Volume 1. This is a best practice because it keeps your accelerated searches running on your fastest storage even for older data.

From Cold to Frozen
When the bucket rolls to Frozen:

Original Data: Moved to your archive or deleted from Volume 2.

DMA Data: The software looks at wherever the summary was stored (Volume 1 or Volume 2) and deletes it immediately.
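On the retention side, the roll-to-frozen trigger is the usual index settings; the values below are illustrative only. Note that coldToFrozenDir archives only the raw bucket - the DMA summary is deleted, not archived:

```ini
[net_logs]
# Roll buckets to frozen once their newest event is ~30 days old
frozenTimePeriodInSecs = 2592000
# Optional: archive frozen raw data instead of deleting it
coldToFrozenDir = /mnt/archive/net_logs
```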


PickleRick
SplunkTrust

Long story short - DMA has its own time range, which keeps it from storing data for longer than configured.

But

The DMA buckets are stored "alongside" normal index buckets. Actually DMA tsidx files are structurally the same as your normal term indexes and are treated (almost) the same as indexed files. So whenever your "base" index data bucket is getting rolled to frozen, the associated DMA bucket is deleted as well.

If we're talking about hot/warm -> cold rotation, it's more tricky than that.

DMA has its own path setting (tstatsHomePath, if memory serves me right) in indexes.conf. By default it's set to an index-named directory on a volume called _splunk_summaries. This volume in turn is by default just $SPLUNK_DB. There are no additional warm/cold settings for DMA.

I'm not sure here (I haven't tested this) but I would expect that even if your hot/warm path is the same as your summaries volume, it would roll the data bucket to cold but leave the DMA bucket alone (still on hot/warm path).
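For reference, the shipped defaults described above look roughly like this (recalled from the stock configuration, so verify against $SPLUNK_HOME/etc/system/default/indexes.conf on your version):

```ini
# Default volume backing DMA summaries - just $SPLUNK_DB
[volume:_splunk_summaries]
path = $SPLUNK_DB

[default]
# Per-index subdirectory for data model acceleration summaries
tstatsHomePath = volume:_splunk_summaries/$_index_name/datamodel_summary
```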


Wohamed_wakkad
Engager

Many thanks


Wohamed_wakkad
Engager

@livehybrid thanks for sharing that topic, but the sentence below confuses me:

Although data model acceleration summaries are unbounded in size by default, they are tied to raw data in your index buckets and age along with it. When summarized events pass out of cold buckets into frozen buckets, those events are removed from the related summaries.

Is my understanding in the example below (adapted from GPT) correct?

📁 1. Raw Data (The Parent Bucket)

Location:

    $SPLUNK_HOME/var/lib/splunk/net_logs/db/

Example Structure:

    db_1715001600_1715000000_101/
    ├── rawdata/
    │   └── journal.gz                           # Contains the actual raw log data
    ├── 1715001600-1715000000-123456789.tsidx    # Raw index file
    └── Strings.xml                              # Metadata for string lookups

🧠 2. Data Model Acceleration (The Shadow Copy)

Location:

    $SPLUNK_HOME/var/lib/splunk/net_logs/datamodel_summary/

Structure Explanation:

  • Splunk creates a folder named after the Data Model ID
  • Inside it, bucket directories mirror the original index buckets

Example Structure:

    DM_Project_Security_Network_Traffic/
    └── db_1715001600_1715000000_101/    # Matches the original bucket ID
        ├── 123456789.tsidx              # Accelerated (optimized) data
        └── hash_info.xml                # Metadata for acceleration

🔄 3. Lifecycle Demonstration (Step-by-Step)

Step 1: Ingestion & Acceleration

  • You ingest firewall logs
  • Splunk creates a bucket: bucket_101
  • Since Data Model Acceleration is enabled:
    • The background summarization process scans this bucket
    • It generates a smaller, optimized .tsidx file
    • That file is stored in: datamodel_summary/DM_Project.../db_..._101/

Step 2: Aging & Deletion (Retention Policy Enforcement)

  • Your retention policy is set to 30 days
  • Once bucket_101 becomes 31 days old, Splunk marks it as expired

What happens next:

  1. Bucket Removal
    • Splunk deletes the raw data bucket:
      $SPLUNK_HOME/var/lib/splunk/net_logs/db/db_..._101/
  2. Automatic Cleanup of Accelerated Data
    • Splunk checks the corresponding Data Model Summary path
    • It finds the matching bucket: datamodel_summary/.../db_..._101/
    • This directory is automatically deleted

Key Takeaways

  • Data Model Acceleration creates a dependent "shadow copy" of indexed data
  • Accelerated data is tightly coupled to its original bucket
  • When the original bucket is deleted (due to retention policies), the accelerated summary is also removed automatically
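One way to observe this coupling in practice: a tstats search with summariesonly=true returns only events still present in the summary, so events whose buckets have rolled to frozen silently disappear from the result. (A sketch only - datamodel=Network_Traffic assumes an accelerated CIM Network Traffic data model.)

```
| tstats summariesonly=true count
    from datamodel=Network_Traffic
    where earliest=-30d
    by sourcetype
```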