Question from my backup team that I couldn't find a good answer to in the docs: I don't understand how data model data is structured on disk. Indexes with a data model defined have a datamodel_summary directory:
[splunk@splunk3 splunk]$ ll ./firewall
total 60
drwx------. 37 splunk splunk 4096 May 2 13:26 colddb
drwx------. 340 splunk splunk 24576 May 2 13:35 datamodel_summary
drwx------. 306 splunk splunk 20480 May 3 10:06 db
drwx------. 2 splunk splunk 4096 Aug 17 2013 thaweddb
In the _internaldb index directory, I have one of these plus another "summary" directory that seems to be associated somehow with the Splunk Deployment Monitor:
[splunk@splunk3 splunk]$ ll _internaldb/
total 532
drwx------. 2216 splunk splunk 126976 May 3 09:47 colddb
drwx------. 2519 splunk splunk 184320 May 3 09:55 datamodel_summary
drwx------. 306 splunk splunk 28672 May 3 10:08 db
drwx------. 2519 splunk splunk 184320 May 3 09:50 summary
drwx------. 2 splunk splunk 4096 Aug 16 2013 thaweddb
[splunk@splunk3 splunk]$ ll _internaldb/summary/998_163BFC27-2C4C-4CDE-83CD-F8B48C29BA80/20D17CF6-2E61-47A1-B3A4-FF57509916DF/
total 596
drwx------. 2 splunk splunk 32768 Dec 23 05:10 splunk_deployment_monitor_nobody_1a56f43bf8d5bf20
drwx------. 2 splunk splunk 32768 Dec 23 05:10 splunk_deployment_monitor_nobody_26e747c470c62ba8
<snip several lines />
drwx------. 2 splunk splunk 24576 Jan 11 14:08 splunk_deployment_monitor_nobody_NSd0dc3ea132443bbf
From the backup perspective, the backups throw thousands of errors each night for nonexistent files (files that were there when the drive was scanned but were gone by the time they were backed up). I'm fairly sure it's okay to tell them to exclude the datamodel_summary (and summary) directories entirely, since they can be recreated after a restore, but for my own sanity I'd like to understand the structure a bit better.
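A minimal bash sketch of the layout in the listings above: it labels each top-level directory of one index as bucket data to back up or acceleration output that can be skipped. The mapping assumes the default layout shown here and in the answers below (db, colddb and thaweddb hold buckets; datamodel_summary and summary hold summaries), and the function name classify_index_dir is hypothetical.

```bash
#!/usr/bin/env bash
# Label each top-level directory of one Splunk index directory.
# Assumes the default layout from the listings; custom homePath, coldPath,
# tstatsHomePath or summaryHomePath settings would move these elsewhere.
classify_index_dir() {
  local index_dir=$1 sub name
  for sub in "$index_dir"/*/; do
    [ -d "$sub" ] || continue        # no subdirectories: glob stays literal
    name=$(basename "$sub")
    case "$name" in
      db|colddb|thaweddb)
        echo "backup $name" ;;       # hot/warm, cold and thawed buckets
      datamodel_summary|summary)
        echo "skip $name" ;;         # acceleration output, rebuildable
      *)
        echo "review $name" ;;       # anything not covered above
    esac
  done
}
```

Usage, with the index path as an assumption: `classify_index_dir "$SPLUNK_HOME/var/lib/splunk/firewall"`. Anything reported as "review" is worth checking by hand before excluding it.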
Exclude the datamodel_summary directories from backup.
If you restore an index, Splunk automatically recreates the accelerated data model (that is what is stored in datamodel_summary).
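A sketch of that exclusion, assuming the backup can be expressed with GNU tar: an --exclude pattern without a slash matches that name at any depth, so both summary directory types are skipped in every index under the root. The function name backup_indexes and its arguments are hypothetical; a commercial backup tool will have its own exclusion syntax.

```bash
#!/usr/bin/env bash
# Archive everything under an index root except acceleration summaries.
# Assumes GNU tar: an --exclude pattern with no slash matches that name at
# any depth, so every <index>/datamodel_summary and <index>/summary is skipped.
backup_indexes() {
  local src=$1 dest=$2
  tar -C "$src" -cf "$dest" \
    --exclude='datamodel_summary' \
    --exclude='summary' \
    .
}
```

Note that the plain `summary` pattern also skips any other file or directory named exactly summary anywhere under the root, which is fine for the default index layout but worth checking on a customized one.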
The summary directory you see is for summary data; this one seems to be generated by the Deployment Monitor app.
Your data model tsidx files go in the datamodel_summary directory unless you tell Splunk otherwise in indexes.conf (look for tstatsHomePath).
By default, summary data goes to $SPLUNK_HOME/var/lib/splunk/<index directory>/summary (for example, _internaldb/summary in the listing above).
indexes.conf - tstatsHomePath for data model acceleration
indexes.conf - tsidxStatsHomePath for tscollect namespaces
indexes.conf - summaryHomePath for report acceleration summaries
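A sketch of telling Splunk otherwise in indexes.conf: the stanza below points one index's summaries at a location outside the backed-up tree, so the backup job never sees them. The volume name summaries_nobackup and the /opt/splunk_summaries path are hypothetical placeholders; only the two summary settings are shown, and the index's existing homePath, coldPath and thawedPath are left as they are.

```ini
# Hypothetical volume outside the directory tree the backup job covers.
[volume:summaries_nobackup]
path = /opt/splunk_summaries

[firewall]
# Data model acceleration (tsidx) output; this setting is defined via a volume.
tstatsHomePath = volume:summaries_nobackup/firewall/datamodel_summary
# Report acceleration summaries (hypothetical absolute path).
summaryHomePath = /opt/splunk_summaries/firewall/summary
```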
I think the summary directory is related to report acceleration turned on for a search owned by nobody in the splunk_deployment_monitor app. I also think those two kinds of acceleration don't need to be backed up, because they contain nothing unique, only summaries of existing index data.
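To check that guess, this bash sketch counts the leaf directories under an index's summary directory by name prefix. It assumes, based only on the _internaldb listing, that the leaves sit three levels down and are named <app>_<owner>_<hash>; that naming scheme is an inference, not something confirmed here, and summary_prefixes is a hypothetical name.

```bash
#!/usr/bin/env bash
# Count summary leaf directories by name prefix.
# Assumption inferred from the listing: leaves sit at
# summary/<id>/<guid>/<app>_<owner>_<hash>, so dropping the final
# "_<hash>" field leaves "<app>_<owner>".
summary_prefixes() {
  local summary_dir=$1
  find "$summary_dir" -mindepth 3 -maxdepth 3 -type d \
    | sed -e 's#.*/##' -e 's/_[^_]*$//' \
    | sort | uniq -c
}
```

Usage, with the path as an assumption: `summary_prefixes "$SPLUNK_HOME/var/lib/splunk/_internaldb/summary"`. If nearly every entry is splunk_deployment_monitor_nobody, that is consistent with the report acceleration explanation above.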