data model storage and backups

jeff — Sat, 03 May 2014 15:30:57 GMT

A question from my backup team that I couldn't find a good answer to in the docs: I don't understand how data model data is laid out on disk. Indexes with a data model defined have a datamodel_summary directory:

[splunk@splunk3 splunk]$ ll ./firewall
total 60
drwx------.  37 splunk splunk  4096 May  2 13:26 colddb
drwx------. 340 splunk splunk 24576 May  2 13:35 datamodel_summary
drwx------. 306 splunk splunk 20480 May  3 10:06 db
drwx------.   2 splunk splunk  4096 Aug 17  2013 thaweddb

The _internaldb index directory has one of these plus a separate "summary" directory that appears to be associated with the Splunk Deployment Monitor app:

[splunk@splunk3 splunk]$ ll _internaldb/
total 532
drwx------. 2216 splunk splunk 126976 May  3 09:47 colddb
drwx------. 2519 splunk splunk 184320 May  3 09:55 datamodel_summary
drwx------.  306 splunk splunk  28672 May  3 10:08 db
drwx------. 2519 splunk splunk 184320 May  3 09:50 summary
drwx------.    2 splunk splunk   4096 Aug 16  2013 thaweddb

[splunk@splunk3 splunk]$ ll _internaldb/summary/998_163BFC27-2C4C-4CDE-83CD-F8B48C29BA80/20D17CF6-2E61-47A1-B3A4-FF57509916DF/
total 596
drwx------. 2 splunk splunk 32768 Dec 23 05:10 splunk_deployment_monitor_nobody_1a56f43bf8d5bf20
drwx------. 2 splunk splunk 32768 Dec 23 05:10 splunk_deployment_monitor_nobody_26e747c470c62ba8
<snip several lines />
drwx------. 2 splunk splunk 24576 Jan 11 14:08 splunk_deployment_monitor_nobody_NSd0dc3ea132443bbf

From the backup perspective, the nightly backups are throwing thousands of errors for non-existent files (they were there when the drive was scanned, but gone by the time they were to be backed up). I'm fairly sure it's okay to tell the backup team to exclude the datamodel_summary (and summary) directories entirely, since they can be recreated after a restore, but for my own sanity I'd like to understand the structure a bit more.

Can we exclude the data models from backup?
What is that extra summary directory in _internaldb all about? Can it be excluded as well?

Re: data model storage and backups

martin_mueller — Sat, 03 May 2014 15:45:05 GMT

I think the summary directory is related to report acceleration being turned on for a search owned by nobody in the splunk_deployment_monitor app. I also think neither kind of acceleration needs to be backed up, because they don't contain anything unique, only summaries of existing index data.

Re: data model storage and backups

lmyrefelt — Mon, 28 Sep 2020 16:32:20 GMT

The summary directory you see is for summary databases; this one seems to have been generated by the deployment monitor app.

Your tsidx files should go in the datamodel_summary directory unless you tell them otherwise (via indexes.conf; look for tsidx_homepath or similar).

By default summary data should go to $SPLUNK_HOME/var/lib/splunk/database/summary

Re: data model storage and backups

lmyrefelt — Mon, 05 May 2014 20:29:33 GMT

indexes.conf - tstatsHomePath for datamodels
indexes.conf - tsidxStatsHomePath for accelerations
indexes.conf - summaryHomePath for summary data
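
To see where a given index keeps its summaries, you could read these settings out of indexes.conf. Below is a minimal Python sketch that does this for the two settings matching the directories in the question (tstatsHomePath and summaryHomePath). The sample stanza, its paths, and the helper name summary_paths are made up for illustration and are not from the poster's system. It also assumes every setting sits under a stanza header, since Python's configparser rejects settings that appear before the first stanza. Check the indexes.conf spec for your Splunk version before relying on any of the values.

```python
import configparser

# Hypothetical indexes.conf fragment. The stanza name and all paths are
# illustrative placeholders, not recommended or default values.
SAMPLE = """
[firewall]
homePath = $SPLUNK_DB/firewall/db
coldPath = $SPLUNK_DB/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb
tstatsHomePath = volume:_splunk_summaries/firewall/datamodel_summary
summaryHomePath = $SPLUNK_DB/firewall/summary
"""

# The two settings named in the thread that match the directories in question.
SUMMARY_KEYS = ("tstatsHomePath", "summaryHomePath")


def summary_paths(conf_text):
    """Return {index_name: {setting: path}} for the summary-related settings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep camelCase setting names as written
    parser.read_string(conf_text)
    result = {}
    for index in parser.sections():
        stanza = parser[index]
        result[index] = {key: stanza[key] for key in SUMMARY_KEYS if key in stanza}
    return result


paths = summary_paths(SAMPLE)
for index, settings in paths.items():
    for key, value in settings.items():
        print(f"{index}: {key} = {value}")
```

If a setting is absent from a stanza, it is simply left out of the result; the sketch does not try to work out Splunk's defaults.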

Re: data model storage and backups

lmyrefelt — Mon, 05 May 2014 20:32:01 GMT

As for backups: if I remember correctly, the summary data is regenerated when you first use the data models in Pivot, so there should be no point in backing it up. If you create your own data models for your data, you should back up the data model configuration.

Re: data model storage and backups

helge — Sun, 09 Aug 2015 22:35:45 GMT

Exclude the datamodel_summary directories from backup.
If you restore an index, Splunk recreates the accelerated data model (that is what is stored in datamodel_summary) automatically.
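
If your backup tool takes a generated file list, the exclusion could look roughly like the Python sketch below. It walks one index directory and skips the datamodel_summary and summary directories directly beneath it. The function name files_to_back_up and the demo directory layout are hypothetical, and excluding summary as well as datamodel_summary follows the suggestion made earlier in the thread rather than anything confirmed in the docs.

```python
import os
import tempfile

# Directory names to skip, taken from the listings in the question.
EXCLUDED_DIRS = {"datamodel_summary", "summary"}


def files_to_back_up(index_dir):
    """Yield file paths under index_dir, skipping its top-level summary dirs."""
    for dirpath, dirnames, filenames in os.walk(index_dir):
        if dirpath == index_dir:
            # Prune in place so os.walk never descends into the excluded dirs.
            dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            yield os.path.join(dirpath, name)


# Demo on a throwaway directory tree (file names are made up).
with tempfile.TemporaryDirectory() as root:
    for rel in (
        "db/bucket_a/rawdata/journal.gz",
        "colddb/bucket_b/rawdata/journal.gz",
        "datamodel_summary/x/y.tsidx",
        "summary/z/info.csv",
    ):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path))
        open(path, "w").close()
    found = sorted(
        os.path.relpath(p, root).replace(os.sep, "/")
        for p in files_to_back_up(root)
    )

print(found)
```

Pruning only at the top level of the index directory keeps the filter from accidentally matching a similarly named directory deeper inside a bucket.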