Better Hot-To-Warm Roll Methods?

tgiles
Path Finder

Hi,

I'm trying to pin down a method to quickly export an index from a Splunk indexer that I can copy off to another Splunk instance on a different system.

From what I have seen thus far, that would entail performing a roll-to-warm, stopping Splunk on the indexer, copying the db files, and starting Splunk on the indexer again. I'm wondering if there is a better method.

  • splunk _internal call /data/indexes/$indexName/roll-hot-buckets (rolls the index's hot buckets to warm for backup)
  • splunk stop splunkd (stops Splunk so the index is not written to during the copy)
  • ...copy / zip files as needed here...
  • splunk start splunkd (starts Splunk again, re-enabling indexing for the target index)
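The four steps above can be scripted so that the restart always happens, even if the copy fails. This is a minimal Python sketch, not a tested procedure: it assumes a default install at /opt/splunk and an index stored under the default path $SPLUNK_HOME/var/lib/splunk/<index>. Those paths, the archive location, and the helper names `export_steps` and `run_steps` are hypothetical. The Splunk commands are the ones from the list, and the tar step is just one way to "copy / zip".

```python
import shlex
import subprocess
from pathlib import Path

# Assumption: default install location; adjust for your system.
SPLUNK_HOME = Path("/opt/splunk")


def export_steps(index, splunk_home=SPLUNK_HOME, archive="/tmp/index_export.tar.gz"):
    """Return the roll / stop / copy / start commands as argument lists (hypothetical helper)."""
    splunk = str(splunk_home / "bin" / "splunk")
    # Assumption: the index uses the default location $SPLUNK_HOME/var/lib/splunk/<index>.
    index_dir = splunk_home / "var" / "lib" / "splunk" / index
    return [
        # 1. Roll hot buckets to warm (command from the post).
        [splunk, "_internal", "call", f"/data/indexes/{index}/roll-hot-buckets"],
        # 2. Stop splunkd so the index is not written to during the copy.
        [splunk, "stop", "splunkd"],
        # 3. Archive the whole index directory with tar.
        ["tar", "-czf", archive, "-C", str(index_dir.parent), index_dir.name],
        # 4. Start splunkd again to resume indexing.
        [splunk, "start", "splunkd"],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False (hypothetical helper)."""
    def run(cmd):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise on a non-zero exit status

    roll, stop, copy, start = steps
    run(roll)
    run(stop)
    try:
        run(copy)
    finally:
        # Restart even if the copy fails, so indexing is not left down.
        run(start)


if __name__ == "__main__":
    run_steps(export_steps("myindex"))  # dry run: only prints the four commands
```

Running with the default `dry_run=True` only prints the commands, which makes it easy to review them before pointing the script at a real indexer.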

Thanks for any input!

Lowell
Super Champion

You may find some information from this question helpful:

http://answers.splunk.com/questions/3078/copy-an-index-on-the-same-splunk-instance

Assuming you're not on Windows, you can copy your data while splunkd is running, but the results may not be fully consistent. Some kind of file-system or block-level snapshot (for example, an LVM snapshot) is ideal here for a more consistent result. In that case you shouldn't need to bring splunkd down at all, or roll your buckets from hot to warm. Of course, it all depends on how much event loss you can tolerate, and on whether you're looking to do a one-time copy of a bucket or to use something like rsync on an ongoing basis (see the link above).
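The snapshot approach boils down to a fixed sequence of standard LVM and rsync commands. This minimal Python sketch only builds that command list. The volume group `vg0`, logical volume `splunk`, snapshot size, mount point, destination, the helper name `snapshot_copy_steps`, and the assumption that the volume is mounted as $SPLUNK_HOME (so the index sits at var/lib/splunk/<index>) are all hypothetical. The snapshot is still taken while splunkd is running, so hot buckets in it may not be fully consistent.

```python
from pathlib import PurePosixPath


def snapshot_copy_steps(index, dest, vg="vg0", lv="splunk",
                        snap_size="5G", mount_point="/mnt/splunk_snap"):
    """Build LVM-snapshot + rsync commands for one index (hypothetical helper).

    Assumption: the logical volume /dev/<vg>/<lv> is mounted as $SPLUNK_HOME,
    so the index sits at var/lib/splunk/<index> inside the volume.
    """
    snap = f"{lv}_snap"
    snap_dev = f"/dev/{vg}/{snap}"
    index_dir = PurePosixPath(mount_point) / "var" / "lib" / "splunk" / index
    return [
        # Point-in-time snapshot of the volume; splunkd keeps running.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap, f"/dev/{vg}/{lv}"],
        # Mount the snapshot read-only.
        ["mount", "-o", "ro", snap_dev, mount_point],
        # Copy the frozen index; rerunning rsync only transfers changed files.
        ["rsync", "-a", f"{index_dir}/", dest],
        # Clean up the snapshot.
        ["umount", mount_point],
        ["lvremove", "-f", snap_dev],
    ]


for cmd in snapshot_copy_steps("myindex", "backuphost:/srv/splunk/myindex/"):
    print(" ".join(cmd))
```

Because rsync skips unchanged files, the same sequence could be rerun periodically for the ongoing-copy case, with a fresh snapshot each time.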

Stopping and restarting splunkd will certainly interrupt any running searches and temporarily delay indexing. And if you're actually bringing down splunkd, then even your "hot" buckets will be consistent while splunkd is not running. Also keep in mind that forcing a bucket roll will NOT guarantee that all of your buckets are WARM, because Splunk will immediately create new hot buckets for any events received between the moment your script forcibly rolls the buckets and the moment splunkd is shut down.
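Because new hot buckets can appear between the roll and the shutdown, it may be worth checking the index directory once splunkd has stopped. A minimal Python sketch, assuming Splunk's usual on-disk naming, where hot bucket directories start with `hot_` (e.g. `hot_v1_12`) and warm ones with `db_`. The helper name `hot_buckets` is hypothetical.

```python
from pathlib import Path


def hot_buckets(db_path):
    """Return sorted names of hot bucket directories under an index's db path (hypothetical helper).

    Assumption: hot buckets are directories named hot_*, warm buckets db_*.
    """
    return sorted(
        p.name
        for p in Path(db_path).iterdir()
        if p.is_dir() and p.name.startswith("hot_")
    )
```

For example, calling `hot_buckets("/opt/splunk/var/lib/splunk/myindex/db")` (a path that assumes the default index location) and getting an empty list back would mean no hot bucket directories remain there.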

Again, the more details you can provide, the more helpful people here can be.

ephemeric
Contributor

@Lowell: thank you, this was very helpful, I'm researching something similar.

tgiles
Path Finder

Thanks for your input, Lowell. I'm still doing a lot of investigation with a test setup, so a number of items on my end are still in flux.

You gave me a solid alternative method that I will perform some testing with. Thanks for your time!

Lowell
Super Champion

Can you provide a high-level overview of what you are trying to accomplish? Also, could you explain why you can't simply use Splunk event forwarding, which is traditionally the suggested way of distributing events across Splunk instances?