Splunk Enterprise Security

Splunk ES || DMA (datamodel_summary) and normal index

Wohamed_wakkad
Engager

What is the relationship between Splunk accelerated data models stored in the datamodel_summary index and the normal indexed data (hot/warm/cold buckets) in terms of retention policy?


Specifically, when indexed data rolls to the frozen state and is deleted, is the corresponding data model summary also removed?


Does this mean that the data model summary cannot exceed the retention period of the original indexed data?


livehybrid
SplunkTrust

Hi @Wohamed_wakkad 

I would recommend checking out "Where the Splunk platform creates and stores data model acceleration summaries" and "Configure size-based retention for data model acceleration summaries" (on the same page) which explains this in more detail and better than me copying and pasting into here!

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing


Wohamed_wakkad
Engager

@livehybrid 

Thanks for your reply, but I am confused by this:

By default, Splunk software creates each data model acceleration summary on the indexer, parallel to the bucket or buckets that cover the range of time over which the summary spans, whether the buckets that fall within that range are hot, warm, or cold. If a bucket within the summary range moves to frozen status, Splunk software removes the summary information that corresponds with the bucket when it deletes or archives the data within the bucket.

Please look at my scenario below and tell me whether my understanding is correct:

Scenario: Volume 1 (Hot/Warm/DMA) and Volume 2 (Cold)
If you have configured your index to use Volume 1 for homePath and for the data model acceleration summary path (tstatsHomePath), and Volume 2 for coldPath, here is how the data flows:

1. The Directory Structure
The indexer will split the data across your mount points like this:

Volume 1 (/mnt/fast_disk/):

Hot/Warm Buckets: Stores the rawdata and the datamodel_summary together.

Standalone Summary Directory: If a bucket moves to Cold (Volume 2), but your configuration tells Splunk to keep summaries on Volume 1, Splunk creates a mirrored directory structure on Volume 1 just to hold the .tsidx files.

Volume 2 (/mnt/cheap_disk/):

Cold Buckets: Stores only the rawdata (and standard index files like bloom filters).
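A minimal indexes.conf sketch of this layout, for illustration only (the mount points and the index name net_logs are hypothetical; note that in indexes.conf the data model acceleration summary location is controlled by the tstatsHomePath attribute):

```ini
# Hypothetical volume definitions - adjust paths to your environment
[volume:fast_disk]
path = /mnt/fast_disk

[volume:cheap_disk]
path = /mnt/cheap_disk

[net_logs]
# Hot/warm buckets and DMA summaries on the fast storage
homePath       = volume:fast_disk/net_logs/db
tstatsHomePath = volume:fast_disk/net_logs/datamodel_summary
# Cold buckets on the cheap storage
coldPath       = volume:cheap_disk/net_logs/colddb
# thawedPath is mandatory and may not reference a volume
thawedPath     = $SPLUNK_DB/net_logs/thaweddb
```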

2. What happens when data "Rolls"?
From Warm to Cold
When a bucket reaches the age or size limit to move to Cold:

The Raw Data moves from Volume 1 to Volume 2.

The DMA Summary behavior depends on your config:

Default: The summary moves with the bucket to Volume 2 (the cheap disk).

Optimized: If you point tstatsHomePath specifically at Volume 1, the raw data moves to Volume 2, but the summary stays on Volume 1. This is a best practice because it keeps your accelerated searches running on your fastest storage even for older data.

From Cold to Frozen
When the bucket rolls to Frozen:

Original Data: Moved to your archive or deleted from Volume 2.

DMA Data: The software looks at wherever the summary was stored (Volume 1 or Volume 2) and deletes it immediately.
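On the retention side, the roll-to-frozen trigger is the usual index settings; the values below are illustrative only. Note that coldToFrozenDir archives only the raw bucket - the DMA summary is deleted, not archived:

```ini
[net_logs]
# Roll buckets to frozen once their newest event is ~30 days old
frozenTimePeriodInSecs = 2592000
# Optional: archive frozen raw data instead of deleting it
coldToFrozenDir = /mnt/archive/net_logs
```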


PickleRick
SplunkTrust

Long story short - DMA has its own time range, which keeps it from storing data for longer than configured.

But

The DMA buckets are stored "alongside" normal index buckets. Actually DMA tsidx files are structurally the same as your normal term indexes and are treated (almost) the same as indexed files. So whenever your "base" index data bucket is getting rolled to frozen, the associated DMA bucket is deleted as well.

If we're talking about hot/warm -> cold rotation, it's more tricky than that.

DMA has its own path setting (tstatsHomePath, if memory serves me right) in indexes.conf. By default it's set to an index-named directory on a volume called _splunk_summaries. This volume in turn is by default just $SPLUNK_DB. There are no additional warm/cold settings for DMA.

I'm not sure here (I haven't tested this) but I would expect that even if your hot/warm path is the same as your summaries volume, it would roll the data bucket to cold but leave the DMA bucket alone (still on hot/warm path).
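For reference, the shipped defaults described above look roughly like this (recalled from the stock configuration, so verify against $SPLUNK_HOME/etc/system/default/indexes.conf on your version):

```ini
# Default volume backing DMA summaries - just $SPLUNK_DB
[volume:_splunk_summaries]
path = $SPLUNK_DB

[default]
# Per-index subdirectory for data model acceleration summaries
tstatsHomePath = volume:_splunk_summaries/$_index_name/datamodel_summary
```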


Wohamed_wakkad
Engager

Many thanks


Wohamed_wakkad
Engager

@livehybrid thanks for sharing that topic, but the sentence below confuses me:

Although data model acceleration summaries are unbounded in size by default, they are tied to raw data in your index buckets and age along with it. When summarized events pass out of cold buckets into frozen buckets, those events are removed from the related summaries.

Is my understanding in the example below (adapted from GPT) correct?

📁 1. Raw Data (The Parent Bucket)

Location:

    $SPLUNK_HOME/var/lib/splunk/net_logs/db/

Example Structure:

    db_1715001600_1715000000_101/
    ├── rawdata/
    │   └── journal.gz                           # Contains the actual raw log data
    ├── 1715001600-1715000000-123456789.tsidx    # Raw index file
    └── Strings.xml                              # Metadata for string lookups

🧠 2. Data Model Acceleration (The Shadow Copy)

Location:

    $SPLUNK_HOME/var/lib/splunk/net_logs/datamodel_summary/

Structure Explanation:

  • Splunk creates a folder named after the Data Model ID
  • Inside it, bucket directories mirror the original index buckets

Example Structure:

    DM_Project_Security_Network_Traffic/
    └── db_1715001600_1715000000_101/    # Matches the original bucket ID
        ├── 123456789.tsidx              # Accelerated (optimized) data
        └── hash_info.xml                # Metadata for acceleration

🔄 3. Lifecycle Demonstration (Step-by-Step)

Step 1: Ingestion & Acceleration

  • You ingest firewall logs
  • Splunk creates a bucket: bucket_101
  • Since Data Model Acceleration is enabled:
    • The background summarization process scans this bucket
    • It generates a smaller, optimized .tsidx file
    • That file is stored in: datamodel_summary/DM_Project.../db_..._101/

Step 2: Aging & Deletion (Retention Policy Enforcement)

  • Your retention policy is set to 30 days
  • Once bucket_101 becomes 31 days old, Splunk marks it as expired

What happens next:

  1. Bucket Removal
    • Splunk deletes the raw data bucket:
      $SPLUNK_HOME/var/lib/splunk/net_logs/db/db_..._101/
  2. Automatic Cleanup of Accelerated Data
    • Splunk checks the corresponding Data Model Summary path
    • It finds the matching bucket: datamodel_summary/.../db_..._101/
    • This directory is automatically deleted

Key Takeaways

  • Data Model Acceleration creates a dependent "shadow copy" of indexed data
  • Accelerated data is tightly coupled to its original bucket
  • When the original bucket is deleted (due to retention policies), the accelerated summary is also removed automatically
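One way to observe this coupling in practice: a tstats search with summariesonly=true returns only events still present in the summary, so events whose buckets have rolled to frozen silently disappear from the result. (A sketch only - datamodel=Network_Traffic assumes an accelerated CIM Network Traffic data model.)

```
| tstats summariesonly=true count
    from datamodel=Network_Traffic
    where earliest=-30d
    by sourcetype
```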