Are there any good solutions for copying the contents of an index to a different index name on the same Splunk instance (e.g. cp index1 index2), so that queries can be run against either index and return the same results? Use case: a site-specific test of performance across different storage mediums (RAID5 vs RAID1+0). The goal is to measure the performance of the same queries across the same data, with the same results, and to understand what that will cost them.
Note: Re-ingesting the original data into the test indexes is possible, but it will cost time/resources and create license violations. A more efficient way would be ideal.
You can simply copy entire buckets between indexes for the purpose of a test like this.
You could probably get away with using something like rsync:
rsync -a --delete $SPLUNK_HOME/var/lib/splunk/index1/ $SPLUNK_HOME/var/lib/splunk/index2/
(Note the trailing slashes, so the contents of index1 land directly in index2 rather than in a subdirectory.)
Be sure that index2 doesn't already exist first.
You need the --delete if you are going to keep things in sync (by running this periodically), since Splunk frequently creates and removes files during the indexing process.
All of your warm buckets should really only need to be copied once, but the "hot" buckets have lots of changes going on. If you don't need your data to be very up to date, then you could simply copy only the warm buckets (the ones named like db_*_*_*).
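For example, a rough sketch of a warm-only copy, assuming the default layout where an index's hot and warm buckets live under $SPLUNK_HOME/var/lib/splunk/<index>/db (hot buckets named hot_v1_*, warm ones db_*):
mkdir -p "$SPLUNK_HOME/var/lib/splunk/index2/db"
cd "$SPLUNK_HOME/var/lib/splunk/index1/db"
for bucket in db_*_*_*; do
    # copy each warm bucket by name; hot_v1_* buckets are deliberately skipped
    rsync -a "$bucket" "$SPLUNK_HOME/var/lib/splunk/index2/db/"
done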
I think as long as you're only feeding events into index1 and NEVER into index2, you should be OK. I probably wouldn't recommend this as a long-term approach, but for some testing this should be an adequate solution.
I think this is obvious, but running rsync during a lower-volume time would be ideal.
Yeah. You can just stop Splunk and copy the entire index folders (hot/warm and cold, or just hot/warm if that's all you have) to the new folder paths, then configure the new path in indexes.conf as a new index, then restart Splunk.
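A minimal sketch of those steps, using a hypothetical /raid10/splunk/index2 destination and the usual homePath/coldPath/thawedPath settings (adjust paths to your environment):
$SPLUNK_HOME/bin/splunk stop
# /raid10/splunk/index2 must not exist yet; cp creates it as a copy of index1
cp -rp $SPLUNK_HOME/var/lib/splunk/index1 /raid10/splunk/index2
Then add a stanza for the new index in indexes.conf (e.g. $SPLUNK_HOME/etc/system/local/indexes.conf):
[index2]
homePath   = /raid10/splunk/index2/db
coldPath   = /raid10/splunk/index2/colddb
thawedPath = /raid10/splunk/index2/thaweddb
Finally, restart with $SPLUNK_HOME/bin/splunk start.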
Thanks Gerald!!
You know, I thought about actually mentioning that but decided not to. (I did double-check to make sure my example was right, which I think it is.) Of course, the nice thing is you can often just move the path manually and then re-run rsync. BTW, I always put my rsync commands inside a script for that exact reason. It's just way too easy to screw up, especially when you're tacking --delete on there.
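Something like this hypothetical wrapper, with the paths (and their trailing slashes) hard-coded so they can't be fat-fingered:
#!/bin/sh
# sync_test_index.sh -- one-way sync of index1's buckets into the test index
SRC="$SPLUNK_HOME/var/lib/splunk/index1/"
DST="$SPLUNK_HOME/var/lib/splunk/index2/"
# any extra rsync options given to the script are passed straight through
rsync -a --delete "$@" "$SRC" "$DST"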
You can always use -n as a dry run to see what rsync will do before running it for real.
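For example, with the paths used above, this prints what would be copied or deleted without touching anything:
rsync -anv --delete $SPLUNK_HOME/var/lib/splunk/index1/ $SPLUNK_HOME/var/lib/splunk/index2/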
Um, right. Hopefully this has something to do with the fact that I have been on travel for a month straight and has nothing to do with my splunktelligence.
Actually, a botched rsync (bad trailing slash) put the source as a dir inside the dest instead of in the dest proper, and somehow I missed my mistake. Ugh. Thanks Lowell!!