We’re creating some backup procedures and had some questions about how Splunk deals with Hot index files.
I read in the documentation that hot index files should not be backed up and that only warm indexes should be accessed. Does this hold true with block-level replication (such as DRBD)?
If my primary Splunk server crashes and I need to start up Splunk on another machine (assuming all log data also resides here) how do I rebuild my HOT index files? Can I tell Splunk where to begin indexing?
DRBD is sane, but of course it is not a backup. If the Splunk system hardware has a memory failure and scribbles nonsense into the database, DRBD will faithfully mirror the nonsense.
The problem with backups is they work file-at-a-time so they do not get a time-coherent dataset. This limitation does not apply to DRBD, which operates at the block layer.
DRBD is a good tool, and there is no reason Splunk cannot work with it (some customers have done so). The main caveat is that there are some scenarios where Splunk may not be happy with the dataset it is given on startup, and may not automatically recover from the provided data. This is true of most systems, but historically Splunk has had trouble more often than a less data-intensive tool like, say, Apache. However, significant improvements have been made in data error recovery and avoidance.
If DRBD seems appropriate for you, then by all means use it, but it certainly raises the administration effort involved. I would definitely recommend evaluating combining this with a backup strategy (as above). On Linux, I would evaluate LVM2 for snapshotting-to-backup goals.
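For what it's worth, here is a rough sketch of what an LVM2 snapshot-to-backup pass could look like. It only builds and prints the commands rather than running them. The volume group `vg0`, logical volume `splunk`, snapshot size, mount point, and archive path are all made-up placeholders, and some filesystems need extra mount options for a snapshot, so treat it as a starting point and not a tested procedure.

```python
import shlex


def lvm_snapshot_backup_plan(vg, lv, snap_name="splunk_snap", snap_size="5G",
                             mount_point="/mnt/splunk_snap",
                             archive="/backup/splunk_indexes.tar.gz"):
    """Return, in order, the commands for a snapshot-then-archive backup.

    All names and paths are hypothetical placeholders. Nothing is executed.
    """
    origin = f"/dev/{vg}/{lv}"
    snap_dev = f"/dev/{vg}/{snap_name}"
    return [
        # Point-in-time snapshot of the volume holding the index data.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap_name, origin],
        ["mkdir", "-p", mount_point],
        # Mount the snapshot read-only so the backup sees a frozen view.
        ["mount", "-o", "ro", snap_dev, mount_point],
        # Archive the frozen view while Splunk keeps writing to the origin.
        ["tar", "-czf", archive, "-C", mount_point, "."],
        ["umount", mount_point],
        # Drop the snapshot so it stops consuming copy-on-write space.
        ["lvremove", "-f", snap_dev],
    ]


if __name__ == "__main__":
    for cmd in lvm_snapshot_backup_plan("vg0", "splunk"):
        print(shlex.join(cmd))
```

The snapshot only has to live as long as the tar run, but it must be sized to absorb everything written to the origin volume during that time.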
And regarding Lowell: Sure, btrfs, if it delivers, will be convenient. People can already get this functionality with ZFS on FreeBSD and Solaris, but ZFS comes with costs, notably a high CPU overhead for bulk transfer operations. We can only hope btrfs will deliver on all fronts.
At that point you'd have to talk to an HA guru. I can't intelligently comment on this idea of splitting DRBD temporarily.
One thing to realize is that Splunk rewrites its data several times in the hot buckets (this is why backups tend to be a bit nonsense), which will mean that the DRBD write rate will be a few times higher than the overall indexing rate. It's probably doable regardless.
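To put a rough number on that, here is a back-of-the-envelope calculation. The rewrite factor of 3 is purely my assumption standing in for "a few times", and the 50 GB/day figure is just an example volume of index data landing on disk. Measure your own system before sizing a replication link.

```python
def drbd_write_rate_mbit(disk_gb_per_day, rewrite_factor=3.0):
    """Estimate the average DRBD replication bandwidth in Mbit/s.

    disk_gb_per_day: index data written to disk per day, in GiB (example input).
    rewrite_factor:  assumed multiplier for hot-bucket rewrites (a guess).
    """
    bytes_per_day = disk_gb_per_day * 1024 ** 3
    bytes_per_sec = bytes_per_day * rewrite_factor / 86400  # seconds per day
    return bytes_per_sec * 8 / 1_000_000  # bytes/s -> Mbit/s


if __name__ == "__main__":
    rate = drbd_write_rate_mbit(50)
    print(f"{rate:.1f} Mbit/s average")  # prints "14.9 Mbit/s average"
```

This is a daily average only. Indexing is bursty, so the peak write rate the link has to absorb will be higher.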
Oh I see. Well, if you create a full mirror with DRBD, it is possible to temporarily split the mirror and back up the inactive copy. (Then you would have to re-synchronize the mirror when done, which might be the hard/time-consuming part.) This is possible with some storage systems. I don't know how quick or easy re-syncing a broken mirror is with DRBD.
I'm looking forward to when btrfs on Linux is considered production-ready. Filesystem-level snapshotting will be a very nice improvement. 😉 Unfortunately, this is still probably a few years out for most of us.
You can back up hot indexes, but not while they are being actively written as the backup will be inconsistent. Of course, you can back them up while Splunk is not running. Otherwise, you can back them up if you can take a filesystem snapshot of the hot volume, as is possible with Microsoft NTFS VSS/Shadow Copy or with Solaris ZFS snapshots. I do not know if DRBD provides this capability.
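For the simplest case (backing up while Splunk is not running), the sequence is roughly stop, copy, start. A minimal sketch that just prints the commands is below. The install location `/opt/splunk`, the index directory under `var/lib/splunk`, and the archive path are assumptions, so adjust them to your install.

```python
import os
import shlex


def offline_backup_plan(splunk_home="/opt/splunk",
                        archive="/backup/splunk_db.tar.gz"):
    """Return the commands for a stop / archive / start backup, in order.

    Paths are assumed defaults, not read from any Splunk configuration.
    Nothing is executed.
    """
    splunk_bin = os.path.join(splunk_home, "bin", "splunk")
    index_dir = os.path.join(splunk_home, "var", "lib", "splunk")
    return [
        [splunk_bin, "stop"],                            # no more writes to hot buckets
        ["tar", "-czf", archive, "-C", index_dir, "."],  # copy a quiescent dataset
        [splunk_bin, "start"],                           # resume indexing
    ]


if __name__ == "__main__":
    for cmd in offline_backup_plan():
        print(shlex.join(cmd))
```

The obvious cost is that nothing is indexed for the duration of the copy, which is why the snapshot approaches are more attractive on a busy indexer.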
DRBD is block-level mirroring to another box; it has no concept of filesystem structure. I think for this to be effective, you'd still need to use some kind of filesystem-level snapshot on top of DRBD. http://www.drbd.org/home/mirroring/ suggests using DRBD underneath LVM to get snapshot capability along with mirroring.