Copy clustered buckets to a non-clustered environment

Path Finder

Hi all,

I have a clustered deployment with 2 peer nodes (replication factor and search factor = 2).

For backup and offline consultation purposes, I would like to copy the db_* and rb_* buckets from one of the 2 peer nodes to a separate, non-clustered Splunk instance.

Is this possible?

Are there any corrections I need to make to the buckets? For example, removing the _guid...

Thanks!
Cristian

Re: Copy clustered buckets to a non-clustered environment

Splunk Employee

Uh, what are "offline consultation" purposes?

Anyway, I haven't tried this, but I suspect you're right: removing the GUID should be enough. Note that you probably don't want to bother with the rb_ buckets, since those are the replica buckets. Copying both those and the original db_ buckets would mean you'd have three copies of every bucket (the 2 in the cluster plus the copy itself). That's probably overkill.

Also note that non-replicated buckets in a clustered environment (where the index's repFactor is 0) do not carry the GUID in their names, so I suspect the conversion is mostly straightforward. Just be careful with the bucket IDs as well, since each indexer's "main" index (for example) will have its own bucket 0, bucket 1, and so on.
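
A minimal Python sketch of the renaming idea, assuming the usual bucket naming scheme: db_<newestTime>_<oldestTime>_<localId> with an extra _<guid> suffix on clustered buckets, rb_ for replicas, and hot_v1_<localId> for hot buckets. The function names are hypothetical, and stripping the GUID is the untested suggestion above, not a documented procedure. The sketch also shows the bucket ID problem: local IDs are only unique per indexer and index, so db_ buckets from two peers can collide when merged into one index.

```python
import re
from typing import NamedTuple, Optional

# Assumed naming scheme (not an official API):
#   db_<newestTime>_<oldestTime>_<localId>_<originGuid>  clustered original copy
#   rb_<newestTime>_<oldestTime>_<localId>_<originGuid>  clustered replica copy
#   db_<newestTime>_<oldestTime>_<localId>               non-clustered warm/cold bucket
# Hot buckets (hot_v1_<localId>) deliberately do not match.
BUCKET_RE = re.compile(
    r"^(?P<kind>db|rb)_(?P<newest>\d+)_(?P<oldest>\d+)_(?P<bid>\d+)"
    r"(?:_(?P<guid>[0-9A-Fa-f-]+))?$"
)


class Bucket(NamedTuple):
    kind: str            # "db" (original copy) or "rb" (replica)
    newest: int          # latest event time in the bucket (epoch seconds)
    oldest: int          # earliest event time in the bucket (epoch seconds)
    bucket_id: int       # local ID, unique only per indexer and index
    guid: Optional[str]  # GUID suffix, present only on clustered buckets


def parse_bucket(name: str) -> Optional[Bucket]:
    """Parse a bucket directory name; return None for hot or unrelated entries."""
    m = BUCKET_RE.match(name)
    if m is None:
        return None
    return Bucket(m["kind"], int(m["newest"]), int(m["oldest"]),
                  int(m["bid"]), m["guid"])


def non_clustered_name(b: Bucket) -> str:
    """Rebuild the name without the GUID suffix (the untested idea from this thread)."""
    return f"db_{b.newest}_{b.oldest}_{b.bucket_id}"


def id_collisions(peer_a: list[str], peer_b: list[str]) -> set[int]:
    """Local IDs of db_ buckets present on both peers, which would clash in one index."""
    def db_ids(names: list[str]) -> set[int]:
        parsed = (parse_bucket(n) for n in names)
        return {b.bucket_id for b in parsed if b is not None and b.kind == "db"}
    return db_ids(peer_a) & db_ids(peer_b)


if __name__ == "__main__":
    # Made-up example names; the GUIDs are placeholders.
    idx1 = ["db_1389230491_1389230488_5_6A1E2F3B-0C4D-4E5F-8A9B-1C2D3E4F5A6B",
            "rb_1389230495_1389230490_2_7B2F3A4C-1D5E-4F6A-9B0C-2D3E4F5A6B7C",
            "hot_v1_6"]
    idx2 = ["db_1389230495_1389230490_2_7B2F3A4C-1D5E-4F6A-9B0C-2D3E4F5A6B7C",
            "db_1389230499_1389230496_5_7B2F3A4C-1D5E-4F6A-9B0C-2D3E4F5A6B7C"]
    print(non_clustered_name(parse_bucket(idx1[0])))  # db_1389230491_1389230488_5
    print(id_collisions(idx1, idx2))                  # {5}
```

Bucket 5 exists on both peers here, which is exactly the clash to avoid when copying both indexers' buckets into a single target index.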

Re: Copy clustered buckets to a non-clustered environment

Path Finder

Yes... "offline consultation" is not the correct term 🙂

Of all the indexes managed by the cluster, I only need to copy three to the single non-clustered Splunk instance (via rsync, scp, ...).

The non-clustered node sits in a separate location and will be used by law enforcement.

So I need a good way to copy the critical indexes from the cluster to the single remote node... 😉

Re: Copy clustered buckets to a non-clustered environment

Splunk Employee

I would try this: sync the db_* buckets to your external location separately for each indexer and each index. That is, if your index is main and your indexers are idx1 and idx2, create two new indexes on the target, main_idx1 and main_idx2, and sync each indexer's db_* buckets into the corresponding one. When you need to search them, use index=main* to retrieve results from both, without having to reconcile conflicting bucket IDs. In addition, the index field in the results will tell you which indexer originally indexed the data, if that matters.
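
The per-indexer layout could be scripted roughly as below. This Python sketch only builds rsync argument lists; the host names, the /opt/splunk/var/lib/splunk paths, and the function name are placeholders. It assumes each source index keeps warm buckets in <dir>/db and cold buckets in <dir>/colddb (the main index lives in defaultdb by default), and that target indexes such as main_idx1 are already defined on the standalone instance with homePath/coldPath pointing at $SPLUNK_DB/main_idx1/db and $SPLUNK_DB/main_idx1/colddb. The filter keeps only db_* directories, so rb_* replicas and hot buckets are skipped; it does not rename any buckets.

```python
import shlex


def plan_rsync(peers: dict[str, str], index_dirs: dict[str, str], target_db: str,
               subdirs: tuple[str, ...] = ("db", "colddb")) -> list[list[str]]:
    """Build one rsync argv per (peer, index, subdir).

    peers:      short indexer name -> rsync source prefix for its SPLUNK_DB
                (hypothetical, e.g. "splunk@idx1:/opt/splunk/var/lib/splunk").
    index_dirs: index name -> directory under SPLUNK_DB ("main" is "defaultdb").
    target_db:  SPLUNK_DB on the standalone instance.
    Each index is copied into a separate target index named <index>_<peer>.
    """
    commands = []
    for peer, src_db in peers.items():
        for index, src_dir in index_dirs.items():
            for sub in subdirs:
                commands.append([
                    "rsync", "-a",
                    # Keep db_* bucket directories and everything below them;
                    # everything else, including rb_* replicas and hot_* buckets,
                    # is excluded.
                    "--include=db_*/", "--include=db_*/**", "--exclude=*",
                    f"{src_db}/{src_dir}/{sub}/",
                    f"{target_db}/{index}_{peer}/{sub}/",
                ])
    return commands


if __name__ == "__main__":
    peers = {
        "idx1": "splunk@idx1:/opt/splunk/var/lib/splunk",
        "idx2": "splunk@idx2:/opt/splunk/var/lib/splunk",
    }
    for argv in plan_rsync(peers, {"main": "defaultdb"}, "/opt/splunk/var/lib/splunk"):
        # Review the commands first; run them with subprocess.run(argv, check=True).
        print(shlex.join(argv))
```

Building argument lists instead of shell strings avoids quoting problems with the * patterns. Re-running the same commands later should pick up buckets that have rolled to warm since the previous pass, because rsync -a skips files that are already up to date on the target.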

Re: Copy clustered buckets to a non-clustered environment

Splunk Employee

Also note that syncing hot buckets is not advised: their files are still open for writing, and the directory name changes when the bucket rolls from hot to warm.
