Solved: Backing up Splunk Databases and files

jewettg · ‎03-04-2013

I am using Barracuda Yosemite Backup software to back up all my servers. Splunk on my Linux box is the only one giving me grief; the 40 others are fine. I know a lot of the errors come from changes between the scan and the actual backup process (temp files, etc., that get created and deleted within seconds of each other).

Example output:

\Network\sigweb.signaturescience.com\File Systems\splunk_data\splunk\firewall\db\hot_v1_4691\rawdata
Error 3020: Object not found      Bkp: 35766044

\Network\sigweb.signaturescience.com\File Systems\splunk_data\splunk\firewall\db\hot_v1_4691
Error 3020: Object not found      Bkp: 1362042196-1362042194-7792846081280182826.tsidx
Error 3020: Object not found      Bkp: 1362042198-1362042194-6395177518873184810.tsidx
Error 3020: Object not found      Bkp: 1362045744-1362041885-2820434847812558834.tsidx

\Network\sigweb.signaturescience.com\File Systems\splunk_data\splunk\windows_data\db\hot_v1_618\rawdata
Error 3020: Object not found      Bkp: 482298675

I just want to ensure that all the log data I have been collecting, along with my settings, filters, and other core configuration and customizations of Splunk apps, is backed up. I do not need indexes, or anything else that can be regenerated if I had to rebuild the server.

So three questions:

1. Is there something special I can do in a pre-backup step (halt or pause Splunk, close open files, prevent file creation, etc.) and a post-backup step (begin or resume operations)? Is this the best method to follow?
2. Which directories should I ensure are backed up so I can rebuild my Splunk server after a hardware failure and have all my data intact, perhaps not indexed, but able to be indexed?
3. Is there a list of directories to exclude from backup because they hold temporary or volatile scratch/index data?

Ayn · ‎03-04-2013

The docs on this subject are very helpful: http://docs.splunk.com/Documentation/Splunk/5.0.2/Indexer/Backupindexeddata

The docs page also references this blog post that is even more hands-on: http://blogs.splunk.com/2011/12/20/index-backup-strategy/


noncon21 · ‎08-03-2016

I know this is a bit of a late response, but my employer was also looking for a backup solution in case something happened. After some research it dawned on me that something like VSS (Volume Shadow Copy Service) or Veeam Backup & Replication (our environment is virtualized) should do the trick. It might be something to consider.

saurabh_tek · ‎10-26-2016

Hey @noncon21, could you suggest some software to use for this purpose in a physical environment (from your knowledge)?

Ayn · ‎03-04-2013

In Splunk terms, an index is what you would otherwise call a database. This is where all data goes.

jewettg · ‎03-04-2013

I just want the backups to work and to back up only what is collected. I assume that whatever Splunk does with that data to make it searchable can be "re-indexed". Maybe I am not speaking "Splunk", but in my past experience indexes were temporary and could be rebuilt, like cache data.

jewettg · ‎03-04-2013

I do not want to reinvent the wheel. I already have a backup tool, just trying to get Splunk to slow down enough or take a coffee break for me to get a decent backup. I was really hoping this could be done easily.

robertlynch2020 · ‎06-22-2016

Did you get an answer to this? I have the same issue. Is there a command I can run before I run my tool that will back up the whole directory, or do I have to stop Splunk before I do a backup?

jewettg · ‎03-04-2013

So I have a pre-backup line that allows me to execute a single line before backups start. Usually if they are multi-line, you tell it to run as a script.

What would be the best line to tell Splunk to roll all hot databases to warm and pause indexing and collecting data? At that point I would back up the server.

Then I would use the "post-execution" line to resume indexing and collecting data.

Are there two lines that could be executed on the command line that would do this?
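
One coarse way to fill those two hook lines is to stop Splunk before the backup and start it again afterwards. The Python sketch below is only an illustration, not something confirmed in this thread: it assumes Splunk is installed under `/opt/splunk` (a hypothetical path, adjust it to your `$SPLUNK_HOME`) and uses only the standard `splunk stop` and `splunk start` CLI commands. While Splunk is stopped it neither indexes nor collects data, so this pauses everything rather than just the hot buckets.

```python
import shlex

# Assumption: default install location; change to match your $SPLUNK_HOME.
SPLUNK_HOME = "/opt/splunk"


def hook_command(phase, splunk_home=SPLUNK_HOME):
    """Return the argv for the backup tool's pre- or post-backup hook."""
    actions = {"pre": "stop", "post": "start"}
    if phase not in actions:
        raise ValueError(f"phase must be 'pre' or 'post', got {phase!r}")
    return [f"{splunk_home}/bin/splunk", actions[phase]]


def hook_line(phase, splunk_home=SPLUNK_HOME):
    """Render the hook as a single shell-safe line for the backup tool."""
    return shlex.join(hook_command(phase, splunk_home))
```

With the assumed path, `hook_line("pre")` gives `/opt/splunk/bin/splunk stop` and `hook_line("post")` gives `/opt/splunk/bin/splunk start`, each of which fits in a single-line hook field.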

Ayn · ‎03-04-2013

Yes, the hot buckets are volatile and need to be rolled to warm to be of any use when you restore your backup. So, either just ignore the hot buckets and accept that in doing so you will miss out on the data they hold, or make sure they're rolled to warm right before you do your backup. The docs and the blog post may speak about this in 'manual' terms, but you could script this just as well.
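
A scripted version of the roll-to-warm step might look like the Python sketch below. It assumes the `splunk _internal call /data/indexes/<index>/roll-hot-buckets` CLI form described on the docs page linked in this thread; check that page for your Splunk version before relying on it. The install path and the credentials are placeholders, and the index names in the usage note are guessed from the directory names in the error output above.

```python
import subprocess

# Assumption: install path is a placeholder; adjust to your $SPLUNK_HOME.
SPLUNK_BIN = "/opt/splunk/bin/splunk"


def roll_command(index, auth, splunk_bin=SPLUNK_BIN):
    """Build the CLI call asking Splunk to roll one index's hot buckets to warm."""
    return [splunk_bin, "_internal", "call",
            f"/data/indexes/{index}/roll-hot-buckets", "-auth", auth]


def roll_all(indexes, auth, splunk_bin=SPLUNK_BIN):
    """Roll each index in turn; raises CalledProcessError if a call fails."""
    for index in indexes:
        subprocess.run(roll_command(index, auth, splunk_bin), check=True)
```

A pre-backup step could then call `roll_all(["firewall", "windows_data"], "user:password")`. Passing credentials on the command line exposes them in the process list, so treat this as a starting point only.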

jewettg · ‎03-04-2013

All these links lead to manually backing up Splunk, using what they allude to as scheduled or manually invoked batch scripts or commands to back up the data.

I already have a tool that will back up the entire system (weekly full) and then do incrementals throughout the weekdays.

I would like to use this tool (Yosemite: http://www.barracudaware.com/products/server-backup) and exclude, I am guessing, the "hot" databases, as those are in constant flux, right?
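
To build such an exclusion list, the hot bucket directories can be found by name. The Python sketch below assumes that hot buckets are the directories whose names start with `hot_`, as in `hot_v1_4691` from the error output, and that `root` is the index storage path (for example the `splunk_data/splunk` directory seen there). How the resulting paths are fed to the backup tool is not covered here, and anything in an excluded hot bucket is simply missing from that backup.

```python
import os


def hot_bucket_dirs(root):
    """Return sorted paths of directories under root whose names start with 'hot_'."""
    found = []
    for dirpath, dirnames, _files in os.walk(root):
        hot = [d for d in dirnames if d.startswith("hot_")]
        found.extend(os.path.join(dirpath, d) for d in hot)
        # Do not descend into hot buckets; their contents change constantly.
        dirnames[:] = [d for d in dirnames if d not in hot]
    return sorted(found)
```

Calling `hot_bucket_dirs("/path/to/splunk_data/splunk")` just before each run keeps the list current, since bucket IDs change as buckets roll.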

Ayn · ‎03-04-2013

"I do not need indexes" <-- ???

The indexes carry all your log data, so I'm not sure you really mean this...
