This question is old, Lucas, and I'm not sure whether you found a solution, but here's some additional info:
First off, you don't want to simply replicate existing buckets as is. Duplicating your buckets by hand will result in Splunk seeing the data twice. Splunk has no way of knowing that the same bucket exists in two places, so it will treat both (or all) copies as pre-clustered buckets and therefore search all copies. So not only do you have the overhead of searching the same data multiple times, but you'll also need some sort of "dedup" or other clever way to eliminate duplicates in your searches. Not fun.
It's pretty easy to trick Splunk into converting non-clustered buckets into clustered ones. If you've taken a look at the bucket folder naming, you'll pretty quickly see the difference between the names of clustered and non-clustered buckets. The biggest difference is that clustered buckets have the GUID as part of the name, which indicates which server the bucket originated on. Keep in mind that the cluster master is essentially stateless between restarts, so everything it knows about the cluster is gleaned during the initialization phase; this means that you can pretty easily trick Splunk into thinking that a non-clustered bucket is a single-site clustered bucket. (Multi-site clustering is a completely different beast in this regard. Splunk made it much more difficult to pull off this kind of trick at a multi-site level.)
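To make the naming difference concrete, here is a rough sketch that classifies a bucket directory by its name alone. The `bucket_kind` helper is hypothetical, and the patterns assume the usual `db_<newest>_<oldest>_<id>` layout with the GUID appended for clustered buckets; check it against the folder names on your own indexers before relying on it.

```shell
# Hypothetical helper: classify a bucket directory purely by its folder name.
# Assumed layouts (verify against your own indexes):
#   non-clustered: db_<newest>_<oldest>_<id>
#   clustered:     db_<newest>_<oldest>_<id>_<GUID>  (rb_ prefix for replicated copies)
bucket_kind() {
  name=$(basename "$1")
  if printf '%s\n' "$name" | grep -Eq '^db_[0-9]+_[0-9]+_[0-9]+$'; then
    echo standalone
  elif printf '%s\n' "$name" | grep -Eq '^(db|rb)_[0-9]+_[0-9]+_[0-9]+_[0-9A-Fa-f-]+$'; then
    echo clustered
  else
    echo unknown   # hot buckets, temp directories, anything else
  fi
}
```

Calling `bucket_kind` on each directory under an index's db folder should give you a quick count of what still needs converting.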
Bucket conversion itself can be done using something like this:
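# Read this indexer's GUID from instance.cfg (the line looks like "guid = <GUID>")
GUID=$(grep '^guid' "$SPLUNK_HOME/etc/instance.cfg" | tr -d ' ' | cut -d'=' -f 2)
# Append the GUID to every non-clustered bucket directory (db_<newest>_<oldest>_<id>)
find "$SPLUNK_DB" -type d -regex '.*/db_[0-9]+_[0-9]+_[0-9]+' | while read -r bkt; do mv -v "$bkt" "${bkt}_${GUID}"; done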
Use this at your own risk! Only run this on the indexes you want to replicate. Copy a small number of buckets over to a test server and do this all in a non-production cluster before you attempt this on prod... (And so on) Also keep in mind that there are additional conversion steps required beyond this, but this is the one bit in particular that's not really documented.
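One way to act on that warning is a dry run that only prints the renames and touches nothing. This is a minimal sketch, not a tested migration tool: `plan_bucket_renames` is a hypothetical helper, its two arguments (an index directory and a GUID) are my own convention, and it filters with `grep -E` rather than `find -regex` so it does not depend on GNU find.

```shell
# Hypothetical dry-run helper: list the renames without performing them.
#   $1 = directory to scan (e.g. one index's db folder)
#   $2 = GUID to append
plan_bucket_renames() {
  find "$1" -type d -name 'db_*' \
    | grep -E '/db_[0-9]+_[0-9]+_[0-9]+$' \
    | sort \
    | while read -r bkt; do
        # Print "old -> new"; nothing is moved.
        printf '%s -> %s_%s\n' "$bkt" "$bkt" "$2"
      done
}
```

Buckets that already carry a GUID are skipped by the pattern, so re-running it after a partial conversion should only list what is left.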
Once your buckets are renamed, I suppose you could pre-replicate them out to a secondary node. However, each copied bucket should be renamed from "db_" to "rb_" (indicating that it's a replicated bucket). Where the replicated bucket should end up depends on which version of Splunk you're running (5.x vs 6.x). And of course, if you already have multiple indexers, trying to pre-share this data gets a lot more complicated and probably isn't worth it. (In Splunk 6 the bucket replication is actually more efficient because it tries to copy both the raw and index data at once, whereas in 5.x Splunk would only copy the raw data and then require the indexes to be rebuilt on the destination server, which consumed considerably more resources.)
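The "db_" to "rb_" rename can be sketched as a name computation. The `replicated_name` helper below is hypothetical and only prints the new path in the same parent directory; it does not copy anything and does not address where the replicated bucket belongs on the secondary node, since that varies by version.

```shell
# Hypothetical helper: print the path a replicated copy would have,
# swapping the "db_" prefix for "rb_". Nothing is renamed or copied.
replicated_name() {
  dir=$(dirname "$1")
  base=$(basename "$1")
  case "$base" in
    db_*) printf '%s/rb_%s\n' "$dir" "${base#db_}" ;;
    *)    echo "not a db_ bucket: $1" >&2; return 1 ;;
  esac
}
```

Keeping this as a pure function makes it easy to review the full list of intended names before anything is moved.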
And again, if any of this seems difficult or confusing to you, and you value your data, please contact an expert. There's Splunk PS, and lots of Splunk partners who are qualified to help with this kind of conversion. There are also a number of pros and cons to consider with clustering in general, which are good to talk through with someone who's had experience maintaining a cluster.
For full disclosure, I work for a Splunk partner.