data model storage and backups

Contributor

A question from my backup guys that I couldn't find a good answer to in the docs: I don't understand how data model data is laid out on disk. Indexes with a data model defined have a datamodel_summary directory:

[splunk@splunk3 splunk]$ ll ./firewall
total 60
drwx------.  37 splunk splunk  4096 May  2 13:26 colddb
drwx------. 340 splunk splunk 24576 May  2 13:35 datamodel_summary
drwx------. 306 splunk splunk 20480 May  3 10:06 db
drwx------.   2 splunk splunk  4096 Aug 17  2013 thaweddb

In the _internaldb index directory there is one of these, plus another "summary" directory that appears to be associated with the Splunk Deployment Monitor app:

[splunk@splunk3 splunk]$ ll _internaldb/
total 532
drwx------. 2216 splunk splunk 126976 May  3 09:47 colddb
drwx------. 2519 splunk splunk 184320 May  3 09:55 datamodel_summary
drwx------.  306 splunk splunk  28672 May  3 10:08 db
drwx------. 2519 splunk splunk 184320 May  3 09:50 summary
drwx------.    2 splunk splunk   4096 Aug 16  2013 thaweddb

[splunk@splunk3 splunk]$ ll _internaldb/summary/998_163BFC27-2C4C-4CDE-83CD-F8B48C29BA80/20D17CF6-2E61-47A1-B3A4-FF57509916DF/
total 596
drwx------. 2 splunk splunk 32768 Dec 23 05:10 splunk_deployment_monitor_nobody_1a56f43bf8d5bf20
drwx------. 2 splunk splunk 32768 Dec 23 05:10 splunk_deployment_monitor_nobody_26e747c470c62ba8
<snip several lines />
drwx------. 2 splunk splunk 24576 Jan 11 14:08 splunk_deployment_monitor_nobody_NSd0dc3ea132443bbf

From the backup perspective, the backups throw thousands of errors each night for nonexistent files (they were there when the drive was scanned, but gone by the time they were to be backed up). I'm fairly sure it's okay to tell the backup team to exclude the datamodel_summary (and summary) directories entirely, since they can be recreated after a restore, but for my own sanity I'd like to understand the structure a bit more.

  1. Can we exclude the data model summaries from backup?
  2. What is that extra summary directory in _internaldb all about? Can it be excluded as well?

Re: data model storage and backups

SplunkTrust

I think the summary directory comes from report acceleration being turned on for a search owned by nobody in the splunk_deployment_monitor app. I also think neither of those two kinds of acceleration needs to be backed up, because they contain nothing unique, only summaries of existing index data.


Re: data model storage and backups

Builder

The summary directory you see is for summary databases; this one seems to have been generated by the Deployment Monitor app.

Your tsidx files go in the datamodel_summary directory unless you tell Splunk otherwise in indexes.conf (look for tsidxhomepath or similar).

By default summary data should go to $SPLUNK_HOME/var/lib/splunk/database/summary


Re: data model storage and backups

Builder

indexes.conf - tstatsHomePath for datamodels
indexes.conf - tsidxStatsHomePath for accelerations
indexes.conf - summaryHomePath for summary data
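
To see where a given index actually keeps these directories, the settings listed above can be read straight out of indexes.conf. The following is only a rough sketch: the [firewall] stanza and its path values are made up for illustration, and the helper name `summary_paths` is hypothetical. It parses a single file and does not merge Splunk's layered configuration, which `splunk btool indexes list` does properly.

```python
import configparser

# Setting names mentioned in this thread; only these two are looked up here.
SUMMARY_SETTINGS = ("tstatsHomePath", "summaryHomePath")

# Illustrative stanza only; the paths are assumptions, not taken from a real system.
SAMPLE_INDEXES_CONF = """
[firewall]
homePath = $SPLUNK_DB/firewall/db
coldPath = $SPLUNK_DB/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb
tstatsHomePath = volume:_splunk_summaries/firewall/datamodel_summary
summaryHomePath = $SPLUNK_DB/firewall/summary
"""


def summary_paths(conf_text):
    """Map each stanza name to the summary-related paths it sets explicitly."""
    # Only "=" separates keys from values, so "volume:..." stays intact.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep setting names case-sensitive
    parser.read_string(conf_text)

    result = {}
    for stanza in parser.sections():
        found = {
            key: parser[stanza][key]
            for key in SUMMARY_SETTINGS
            if key in parser[stanza]
        }
        if found:
            result[stanza] = found
    return result


if __name__ == "__main__":
    for stanza, paths in summary_paths(SAMPLE_INDEXES_CONF).items():
        for key, value in paths.items():
            print(f"{stanza}: {key} = {value}")
```

An index that does not set these explicitly will not appear in the result, so an empty result means defaults are in effect, not that there are no summaries.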


Re: data model storage and backups

Builder
  1. As for backups: if I remember correctly, the summary data is generated when you first run or use the data models in Pivot, so there is no point in backing it up. If you create your own data models for your data, you should back up the data model configuration.

Re: data model storage and backups

Builder

Exclude the datamodel_summary directories from backup.
If you restore an index, Splunk recreates the accelerated data model (that is what is stored in datamodel_summary) automatically.
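
If the backup tool takes an explicit exclude list, something like the following could generate one. It is a minimal sketch that assumes the layout shown in the question (one directory per index, with datamodel_summary and summary directly underneath); the function name `find_summary_dirs` is hypothetical, and the demo builds a throwaway directory tree instead of touching a real Splunk installation.

```python
import os
import tempfile

# Directory names taken from the listings in the question.
REBUILDABLE_DIRS = ("datamodel_summary", "summary")


def find_summary_dirs(splunk_db_root):
    """Return <index>/datamodel_summary and <index>/summary paths, sorted by index."""
    excludes = []
    for entry in sorted(os.listdir(splunk_db_root)):
        index_dir = os.path.join(splunk_db_root, entry)
        if not os.path.isdir(index_dir):
            continue
        # Only look one level below each index directory.
        for name in REBUILDABLE_DIRS:
            candidate = os.path.join(index_dir, name)
            if os.path.isdir(candidate):
                excludes.append(candidate)
    return excludes


if __name__ == "__main__":
    # Build a fake index layout so the sketch can be run anywhere.
    with tempfile.TemporaryDirectory() as root:
        for sub in ("firewall/db", "firewall/datamodel_summary",
                    "_internaldb/colddb", "_internaldb/summary"):
            os.makedirs(os.path.join(root, sub))
        for path in find_summary_dirs(root):
            print(os.path.relpath(path, root))
```

Pointing `find_summary_dirs` at the real index root (the directory containing firewall and _internaldb in the listings above) would give the paths to hand to the backup team.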
