Reporting

data model storage and backups

jeff
Contributor

Question from my backup guys, and I couldn't find a good answer in the docs: I don't understand the structure of the data model data on the system. Indexes with a data model defined have a datamodel_summary directory:

[splunk@splunk3 splunk]$ ll ./firewall
total 60
drwx------.  37 splunk splunk  4096 May  2 13:26 colddb
drwx------. 340 splunk splunk 24576 May  2 13:35 datamodel_summary
drwx------. 306 splunk splunk 20480 May  3 10:06 db
drwx------.   2 splunk splunk  4096 Aug 17  2013 thaweddb

In the _internaldb index directory, I seem to have one of these plus another "summary" directory that looks like it's somehow associated with the Splunk deployment monitor:

[splunk@splunk3 splunk]$ ll _internaldb/
total 532
drwx------. 2216 splunk splunk 126976 May  3 09:47 colddb
drwx------. 2519 splunk splunk 184320 May  3 09:55 datamodel_summary
drwx------.  306 splunk splunk  28672 May  3 10:08 db
drwx------. 2519 splunk splunk 184320 May  3 09:50 summary
drwx------.    2 splunk splunk   4096 Aug 16  2013 thaweddb

[splunk@splunk3 splunk]$ ll _internaldb/summary/998_163BFC27-2C4C-4CDE-83CD-F8B48C29BA80/20D17CF6-2E61-47A1-B3A4-FF57509916DF/
total 596
drwx------. 2 splunk splunk 32768 Dec 23 05:10 splunk_deployment_monitor_nobody_1a56f43bf8d5bf20
drwx------. 2 splunk splunk 32768 Dec 23 05:10 splunk_deployment_monitor_nobody_26e747c470c62ba8
<snip several lines />
drwx------. 2 splunk splunk 24576 Jan 11 14:08 splunk_deployment_monitor_nobody_NSd0dc3ea132443bbf

From the backup perspective, the backups are throwing thousands of errors each night for non-existent files (they were there when the drive was scanned, but gone by the time it came to back them up). I'm fairly sure it's okay to tell them to exclude the datamodel_summary (and summary) directories entirely, since they can be recreated after a restore, but for my own sanity I'd like to understand the structure a bit more.

  1. Can we exclude the data models from backup?
  2. What is that extra summary directory in _internaldb all about? Can it likewise be excluded?

helge
Builder

Exclude the datamodel_summary directories from backup.
If you restore an index, Splunk automatically recreates the accelerated data model summaries (which is what datamodel_summary stores).
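If it helps your backup team, here is a minimal sketch of excluding those directories with GNU tar. The paths are a hypothetical demo tree standing in for your real $SPLUNK_DB location, not taken from the original poster's system:

```shell
# Hypothetical demo tree standing in for $SPLUNK_DB; adjust paths for real use.
DEMO=/tmp/splunk_db_demo
mkdir -p "$DEMO/firewall/db" "$DEMO/firewall/colddb" \
         "$DEMO/firewall/thaweddb" "$DEMO/firewall/datamodel_summary" \
         "$DEMO/firewall/summary"
touch "$DEMO/firewall/db/bucket_data" "$DEMO/firewall/datamodel_summary/rebuildable"

# GNU tar exclusion patterns are unanchored by default, so a bare
# component name skips datamodel_summary and summary dirs at any depth.
tar -C /tmp -cf /tmp/splunk_backup.tar \
    --exclude='datamodel_summary' \
    --exclude='summary' \
    splunk_db_demo

# List what actually made it into the archive
tar -tf /tmp/splunk_backup.tar
```

The same exclude patterns translate directly to most backup tools (rsync, for instance, takes `--exclude` with similar semantics).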


lmyrefelt
Builder

The summary directory you see is for summary databases; this one seems to have been generated by the deployment monitor app.

Your tsidx files should go in the datamodel_summary dir if you do not tell them otherwise (in/via indexes.conf; look for tstatsHomePath or similar).

By default summary data should go to $SPLUNK_HOME/var/lib/splunk/database/summary


lmyrefelt
Builder
  1. As for backup: the summary data is generated when you first run/use the data models in Pivot, if I remember correctly, so there should be no point in backing it up. If you create your own data models for your data, you should take a backup of the data model configuration.
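For example, backing up just the data model definitions could look like the sketch below. The tree is simulated; in a real deployment the files live under $SPLUNK_HOME/etc/apps/&lt;app&gt;/default/ and local/, and the stanza contents here are made up for illustration:

```shell
# Simulated app tree; in a real deployment use $SPLUNK_HOME/etc/apps instead.
APPS=/tmp/demo_etc/apps
mkdir -p "$APPS/search/local"
printf '[firewall_dm]\nacceleration = 1\n' > "$APPS/search/local/datamodels.conf"

# Copy every datamodels.conf, preserving the apps/<app>/local structure
# (cp --parents is a GNU coreutils extension).
mkdir -p /tmp/dm_backup
( cd /tmp/demo_etc && find apps -name datamodels.conf \
      -exec cp --parents {} /tmp/dm_backup \; )
```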

lmyrefelt
Builder

indexes.conf - tstatsHomePath for datamodels
indexes.conf - tsidxStatsHomePath for accelerations
indexes.conf - summaryHomePath for summary data
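A sketch of how those settings might look for the firewall index in indexes.conf. The paths are illustrative defaults, not taken from the original poster's configuration:

```ini
[firewall]
homePath   = $SPLUNK_DB/firewall/db
coldPath   = $SPLUNK_DB/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb
# Rebuildable acceleration/summary data; candidates for backup exclusion
tstatsHomePath  = volume:_splunk_summaries/firewall/datamodel_summary
summaryHomePath = $SPLUNK_DB/firewall/summary
```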


martin_mueller
SplunkTrust

I think the summary directory is related to report acceleration turned on for a search owned by nobody in the splunk_deployment_monitor app... I also think those two kinds of accelerations don't need to be backed up, because they don't contain anything unique, only summaries of existing index data.
