What is the relationship between Splunk accelerated data models stored in the datamodel_summary index and the normal indexed data (hot/warm/cold buckets) in terms of retention policy?
Specifically, when indexed data rolls to the frozen state and is deleted, is the corresponding data model summary also removed?
Does this mean that the data model summary cannot exceed the retention period of the original indexed data?
Long story short: DMA has its own summary range, so it never stores summaries covering a longer period than configured.
But
The DMA buckets are stored "alongside" normal index buckets. Actually, DMA tsidx files are structurally the same as your normal term indexes and are treated (almost) the same as indexed files. So whenever your "base" index bucket is rolled to frozen, the associated DMA bucket is deleted as well.
If we're talking about hot/warm -> cold rotation, it's trickier than that.
DMA has its own path setting (tstatsHomePath) in indexes.conf. By default it's set to an index-named directory on a volume called _splunk_summaries. This volume, in turn, is by default just $SPLUNK_DB. There are no additional warm/cold settings for DMA.
I'm not sure here (I haven't tested this), but I would expect that even if your hot/warm path is the same as your summaries volume, Splunk would roll the data bucket to cold but leave the DMA bucket alone (still on the hot/warm path).
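For reference, the default settings described above look roughly like the fragment below. This is a sketch of the shipped defaults as I remember them, not a copy of any particular version's file, so check it against the indexes.conf in $SPLUNK_HOME/etc/system/default on your own installation.

```ini
# Sketch of the relevant defaults in indexes.conf (verify against your version).

# The summaries volume points at $SPLUNK_DB unless you override it.
[volume:_splunk_summaries]
path = $SPLUNK_DB

# Every index inherits this, so DMA summaries land in an index-named
# directory on that volume. Note there is no separate warm/cold variant.
[default]
tstatsHomePath = volume:_splunk_summaries/$_index_name/datamodel_summary
```

Because there is only one tstatsHomePath per index, the summaries live in a single location regardless of whether the data buckets they cover are hot, warm, or cold.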
I would recommend checking out "Where the Splunk platform creates and stores data model acceleration summaries" and "Configure size-based retention for data model acceleration summaries" (on the same page), which explain this in more detail and better than I could by copying and pasting it here!
Thanks for your reply, but I am confused by this:
By default, Splunk software creates each data model acceleration summary on the indexer, parallel to the bucket or buckets that cover the range of time over which the summary spans, whether the buckets that fall within that range are hot, warm, or cold. If a bucket within the summary range moves to frozen status, Splunk software removes the summary information that corresponds with the bucket when it deletes or archives the data within the bucket.
Please look at my scenario below and tell me whether my understanding is correct:
Scenario: Volume 1 (Hot/Warm/DMA) and Volume 2 (Cold)
If you have configured your index to use Volume 1 for homePath and summaryHomePath, and Volume 2 for coldPath, here is how the data flows:
1. The Directory Structure
The indexer will split the data across your mount points like this:
Volume 1 (/mnt/fast_disk/):
Hot/Warm Buckets: Stores the rawdata and the datamodel_summary together.
Standalone Summary Directory: If a bucket moves to Cold (Volume 2), but your configuration tells Splunk to keep summaries on Volume 1, Splunk creates a mirrored directory structure on Volume 1 just to hold the .tsidx files.
Volume 2 (/mnt/cheap_disk/):
Cold Buckets: Stores only the rawdata (and standard index files like bloom filters).
2. What happens when data "Rolls"?
From Warm to Cold
When a bucket reaches the age or size limit to move to Cold:
The Raw Data moves from Volume 1 to Volume 2.
The DMA Summary behavior depends on your config:
Default: The summary moves with the bucket to Volume 2 (the cheap disk).
Optimized: If you set summaryHomePath specifically to Volume 1, the raw data moves to Volume 2, but the summary stays on Volume 1. This is a "Best Practice" because it keeps your accelerated searches running on your fastest storage even for older data.
From Cold to Frozen
When the bucket rolls to Frozen:
Original Data: Moved to your archive or deleted from Volume 2.
DMA Data: The software looks at wherever the summary was stored (Volume 1 or Volume 2) and deletes it immediately.
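To make the scenario concrete, here is a rough indexes.conf sketch of the two-volume layout. The volume names fast and cheap are made up for illustration, and the index name net_logs is borrowed from the example later in this thread. One caveat: as far as I know, the setting that controls where data model acceleration summaries go is tstatsHomePath, while summaryHomePath applies to report acceleration summaries, so the sketch uses tstatsHomePath.

```ini
# Hypothetical two-volume layout; volume names and paths are illustrative.
[volume:fast]
path = /mnt/fast_disk

[volume:cheap]
path = /mnt/cheap_disk

[net_logs]
# Hot and warm buckets on the fast disk.
homePath = volume:fast/net_logs/db
# Cold buckets on the cheap disk.
coldPath = volume:cheap/net_logs/colddb
# thawedPath cannot be defined in terms of a volume.
thawedPath = $SPLUNK_DB/net_logs/thaweddb
# DMA summaries stay on the fast disk for the whole life of the bucket.
tstatsHomePath = volume:fast/net_logs/datamodel_summary
```

With a layout like this, the summary location does not change when a bucket rolls from warm to cold. Only the roll to frozen should remove the summary.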
Many thanks
@livehybrid thanks for sharing the topic, but the sentence below confuses me:
Although data model acceleration summaries are unbounded in size by default, they are tied to raw data in your index buckets and age along with it. When summarized events pass out of cold buckets into frozen buckets, those events are removed from the related summaries.
Is what I understand in the example below (adapted from GPT) correct?
Location:
$SPLUNK_HOME/var/lib/splunk/net_logs/db/
Example Structure:
db_1715001600_1715000000_101/
├── rawdata/
│   └── journal.gz # Contains the actual raw log data
├── 1715001600-1715000000-123456789.tsidx # Raw index file
└── Strings.xml # Metadata for string lookups
Location:
$SPLUNK_HOME/var/lib/splunk/net_logs/datamodel_summary/
Structure Explanation:
Example Structure:
DM_Project_Security_Network_Traffic/
└── db_1715001600_1715000000_101/ # Matches the original bucket ID
    ├── 123456789.tsidx # Accelerated (optimized) data
    └── hash_info.xml # Metadata for acceleration
Splunk creates a bucket:
bucket_101
This file is stored in:
datamodel_summary/DM_Project.../db_..._101/
Splunk deletes the raw data bucket:
$SPLUNK_HOME/var/lib/splunk/net_logs/db/db_..._101/
It finds the matching bucket:
datamodel_summary/.../db_..._101/