Backing up Splunk Databases and files

jewettg
Explorer

I am using Barracuda Yosemite Backup software to back up all my servers. Splunk on my Linux box is the only one giving me grief out of 40 others. I know a lot of the errors come from changes between the scan and the actual backup process (temp files, etc., that get created and deleted within seconds of each other).

Example output:

\Network\sigweb.signaturescience.com\File Systems\splunk_data\splunk\firewall\db\hot_v1_4691\rawdata
Error 3020: Object not found      Bkp: 35766044

\Network\sigweb.signaturescience.com\File Systems\splunk_data\splunk\firewall\db\hot_v1_4691
Error 3020: Object not found      Bkp: 1362042196-1362042194-7792846081280182826.tsidx
Error 3020: Object not found      Bkp: 1362042198-1362042194-6395177518873184810.tsidx
Error 3020: Object not found      Bkp: 1362045744-1362041885-2820434847812558834.tsidx

\Network\sigweb.signaturescience.com\File Systems\splunk_data\splunk\windows_data\db\hot_v1_618\rawdata
Error 3020: Object not found      Bkp: 482298675

I just want to ensure that all the log data I have been collecting, along with the settings, filters, and other core configuration and customization of Splunk apps, is backed up. I do not need indexes, or anything else that can be regenerated if I had to rebuild the server.

So three questions:

  1. Is there something special I can do in a pre-backup step (halt or pause Splunk, close open files, prevent file creation, etc.) and a post-backup step (begin or resume operations)? Is this the best method to follow?
  2. What directories should I ensure are being backed up so I can rebuild my Splunk server upon HW failure and have all my data intact, maybe not indexed, but can be indexed?
  3. Is there a list of directories to just exclude from backup that are considered temporary or volatile scratch/index data?
1 Solution

Ayn
Legend

The docs on this subject are very helpful: http://docs.splunk.com/Documentation/Splunk/5.0.2/Indexer/Backupindexeddata

The docs page also references this blog post that is even more hands-on: http://blogs.splunk.com/2011/12/20/index-backup-strategy/

noncon21
Engager

I know this is a bit of a late response, but my employer was also looking for a backup solution in case something happened. After some research it dawned on me that something like VSS (Volume Shadow Copy Service) or Veeam Backup & Replication (our environment is virtualized) should do the trick. It might be something to consider.

saurabh_tek
Communicator

Hey @noncon21, could you suggest some software to use for this purpose in a physical environment (from your knowledge)?

Ayn
Legend

In Splunk terms, an index is what you would otherwise call a database. This is where all data goes.

jewettg
Explorer

I just want the backups to work, backing up only what is collected. I assume that whatever Splunk does with that data to make it searchable can be "re-indexed". Maybe I am not speaking "Splunk", but in my past experience indexes were temporary and could be rebuilt, like cache data.

jewettg
Explorer

I do not want to reinvent the wheel. I already have a backup tool, just trying to get Splunk to slow down enough or take a coffee break for me to get a decent backup. I was really hoping this could be done easily.

robertlynch2020
Influencer

Did you get an answer to this? I have the same issue. Is there a command I can run before I run my tool so that it will back up the whole directory, or do I have to stop Splunk before I do a backup?

jewettg
Explorer

My backup tool has a pre-backup line that lets me execute a single command before the backup starts. Usually, if the commands are multi-line, you tell it to run them as a script.

What would be the best line to tell Splunk to roll all hot databases to warm and pause indexing and collecting data?
-- at which point I would back up the server.

Then I would use the "post-execution" line to resume indexing and collecting data.

Are there two lines that could be executed on the command line that would do this?

Ayn
Legend

Yes, the hot buckets are volatile and need to be rolled to warm to be of any use when you restore your backup. So, either just ignore the hot buckets and accept that in doing so you will miss out on the data they hold, or make sure they're rolled to warm right before you do your backup. The docs and the blog post may speak about this in 'manual' terms, but you could script this just as well.
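
As a rough sketch of that scripting: the snippet below builds one `splunk _internal call /data/indexes/<index>/roll-hot-buckets` CLI call per index, which as far as I know is the manual roll command described in the backup docs, and runs the calls as a pre-backup step. The install path `/opt/splunk`, the index names `firewall` and `windows_data` (guessed from the directory names in the error output), and the `admin:changeme` credentials are all assumptions to replace with your own.

```python
import os
import subprocess

# Assumption: Splunk lives under $SPLUNK_HOME, falling back to /opt/splunk.
SPLUNK_BIN = os.path.join(os.environ.get("SPLUNK_HOME", "/opt/splunk"), "bin", "splunk")


def roll_command(index, auth):
    """Build the CLI call that asks Splunk to roll one index's hot buckets to warm."""
    return [
        SPLUNK_BIN, "_internal", "call",
        f"/data/indexes/{index}/roll-hot-buckets",
        "-auth", auth,  # "user:password"; placeholder credentials in this sketch
    ]


def pre_backup(indexes, auth, run=subprocess.run):
    """Roll hot buckets for every listed index and return the commands that were run.

    `run` is injectable so the function can be exercised without a Splunk install.
    """
    commands = [roll_command(index, auth) for index in indexes]
    for command in commands:
        run(command, check=True)  # raises if Splunk reports a failure
    return commands
```

Calling `pre_backup(["firewall", "windows_data"], "admin:changeme")` from the script your backup tool runs before the job would issue both rolls. Splunk opens new hot buckets as soon as data arrives again, so the backup job should still skip the `hot_v1_*` directories. If a short outage is acceptable, the simpler pair of lines is `splunk stop` before the backup and `splunk start` after it.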

jewettg
Explorer

All these links lead to manually backing up Splunk, using what appear to be scheduled or manually invoked batch scripts or commands to back up the data.

I already have a tool that will back up the entire system (weekly full) and then do incrementals throughout the weekdays.

I would like to use this tool (Yosemite: http://www.barracudaware.com/products/server-backup) and exclude, I am guessing, the "hot" databases, as those are in constant flux, right?
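
For the exclusion route: going by the error output earlier in the thread, the hot buckets are the directories named `hot_v1_<number>` under each index's `db` folder, while warm buckets are named `db_<newest>_<oldest>_<id>`. A minimal sketch of such a filter follows, in case the backup tool accepts a file list or an exclude pattern. The helper names `is_in_hot_bucket` and `backup_candidates` are made up for illustration, and whether Yosemite can consume this kind of list is an assumption.

```python
import re

# Hot bucket directory names look like hot_v1_4691 (pattern taken from the error output).
HOT_BUCKET = re.compile(r"hot_v\d+_\d+")


def is_in_hot_bucket(path):
    """Return True if any component of the path is a hot bucket directory."""
    # Accept both "/" and the "\" separators that the backup tool prints.
    parts = path.replace("\\", "/").split("/")
    return any(HOT_BUCKET.fullmatch(part) for part in parts)


def backup_candidates(paths):
    """Keep only the paths that sit outside hot buckets."""
    return [p for p in paths if not is_in_hot_bucket(p)]
```

Excluding these directories means the most recent events, whatever is still sitting in hot buckets, are missing from the backup, so it pairs best with rolling hot buckets to warm right before the backup starts.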

Ayn
Legend

"I do not need indexes" <-- ???

The indexes carry all your log data, so I'm not sure you really mean this...
