I'm running two Windows Splunk servers (combo search heads and indexers, v6.0.1). One is dedicated to our non-production, development environment; it's intended to be used for developing new reports without interfering in any way, shape, or form with the live production environment.
The biggest problem we have is that the data in our development environment is stale, to put it mildly. This makes it extremely difficult to develop new reports because we never have decently fresh data to work with.
The second problem is that on those rare occasions when we do refresh the development environment with production data, we end up consuming an undesirable amount of our license volume, because the development Splunk server has to index the refreshed data all over again.
We're able to easily copy the bucket folders between the servers, but our searches (which rely heavily on fields that were extracted at index time) break because the extracted fields "don't exist". I'm guessing there's a metadata file or something somewhere that should be copied along with the buckets, but I haven't been able to find one.
Is there a way to copy the data under $SPLUNK_HOME$\var\lib\splunk\index_name\db from the production server to the development server, with the field extractions intact?
I thought perhaps this was related to the sourcetypes (since in production, we use a sourcetype of "application_www_datatype" and in development, we use "application_dev_datatype"), but I created the "www" sourcetype on the development server and that didn't solve the field extractions issue.
Have you tried using the event generator (https://github.com/splunk/eventgen) to generate data based on samples from production?
This is not a direct answer to your question, but could help you achieve the goal of having relatively current data in your test environment.
You could set it up in your test environment and have it constantly generate fresh data that matches production. This would avoid the need to copy or move buckets.
Depending on your use case, it might work.
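To make the idea concrete, here is a rough Python sketch of what "replaying samples with fresh timestamps" amounts to. This is not eventgen itself or its API, just an illustration of the concept. The timestamp layout, the `restamp` helper, and the sample events are all made up for the example, so adjust them to whatever your production events actually look like.

```python
import re
from datetime import datetime, timedelta

# Assumed (hypothetical) timestamp layout; change both to match your events.
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def restamp(events, now, spacing_seconds=1):
    """Return copies of `events` with the first timestamp in each rewritten.

    The last event is stamped with `now`; earlier events are spaced
    `spacing_seconds` apart before it, preserving their original order.
    """
    fresh = []
    total = len(events)
    for i, event in enumerate(events):
        new_time = now - timedelta(seconds=(total - 1 - i) * spacing_seconds)
        # Replace only the first timestamp found in the event.
        fresh.append(TS_PATTERN.sub(new_time.strftime(TS_FORMAT), event, count=1))
    return fresh


if __name__ == "__main__":
    # Made-up sample events standing in for a small production extract.
    samples = [
        "2014-01-05 10:00:00 action=login user=alice status=ok",
        "2014-01-05 10:00:07 action=report user=bob status=failed",
    ]
    for line in restamp(samples, datetime.now()):
        print(line)
```

The output could then be written to a file that the development server monitors, so the events go through your normal development inputs and extractions. Eventgen does essentially this for you, with a lot more flexibility, so the sketch is only meant to show why a small sample set is enough to keep the test data current.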
I've looked into setting up a clustered environment, which would allow the live data to be mirrored to our development environment, but that introduces new issues around app development and deployment, and it also makes it difficult to test new data sources without affecting the production environment.
Any suggestions out there?