We are planning out the infrastructure for Splunk at my company and I have a quick question about whether something can be done. We are trying to bring up a dev, stage, and production environment.
Sorry for the long post, just trying to explain our current vision. I'm also taking recommendations on what you guys have done with infrastructure and on how to prevent double indexing as best as possible.
If the staging environment is just for validating searches/alerts/dashboards before rolling them out in production, you can have a dedicated search head, accessing the production indexers, serve as the staging environment. Please note that since it's using production indexers, usage on the staging search head will have a performance impact on them.
If you intend dev to be identical to prod in terms of Splunk configuration, I'd recommend a (small) distributed setup there as well. This way you will get the split between index-time and search-time configuration right before deploying them into prod.
dev being a single server for development-only purposes, using log samples and made-up data for testing logic on dashboards.
For what you describe, you would want a standalone instance of Splunk (search head and indexer in one) where you load in your sample data (grab a week, a month, a year of logs and point the instance at it) or test other kinds of inputs. That instance can also have a forwarder pointed at it (for testing) and act as a deployment server. When you test each project, you'll want to create a barebones app, and be sure to adjust permissions to App Only. This way, you can create your data onboarding or index-time stuff as well as your search-time stuff (or knowledge objects), and the configuration files will sit all tidy in the $SPLUNK_HOME/etc/apps/yourtestapp/ directories, making it easy to push to production when you're ready (or package to ship to indexer/search head/forwarder as required).
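A minimal sketch of what such a barebones app could look like on disk, assuming the usual default/app.conf plus metadata/default.meta layout. The app name, label, and description are made up, and `export = none` in default.meta is what I'd expect "App Only" sharing to come down to; the script writes into a temp directory rather than a real $SPLUNK_HOME so it's safe to run anywhere.

```python
import tempfile
from pathlib import Path

# Hypothetical app.conf contents; label and description are placeholders.
APP_CONF = """\
[ui]
is_visible = 1
label = {label}

[launcher]
version = 0.1.0
description = Scratch app for one dev project
"""

# export = none keeps the app's objects scoped to the app itself
# (assumed here to be the on-disk equivalent of "App Only").
DEFAULT_META = """\
[]
access = read : [ * ], write : [ admin ]
export = none
"""


def create_app_skeleton(apps_dir, app_name, label):
    """Create <apps_dir>/<app_name> with default/ and metadata/ files."""
    app = Path(apps_dir) / app_name
    (app / "default").mkdir(parents=True, exist_ok=True)
    (app / "metadata").mkdir(exist_ok=True)
    (app / "default" / "app.conf").write_text(APP_CONF.format(label=label))
    (app / "metadata" / "default.meta").write_text(DEFAULT_META)
    return app


if __name__ == "__main__":
    # A temp dir stands in for $SPLUNK_HOME/etc/apps in this sketch.
    with tempfile.TemporaryDirectory() as tmp:
        app = create_app_skeleton(tmp, "yourtestapp", "Your Test App")
        for path in sorted(app.rglob("*")):
            print(path.relative_to(tmp))
```

Your props.conf, transforms.conf, saved searches and so on would then go into the same default/ folder as you build the project out.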
This environment won't touch the production indexers because the dev process tends to start with a lot of wildcard and "all time" or "lots of time" sloppiness, and you don't want any of that to affect performance on production indexers.
Something like a dashboard would then be staged against production indexers as follows.
staging environment is where the question is. Would it be possible to have a staging search head only that is tied to the production indexers to feed staging data for accurate testing?
What you might want to do here is, as you said, a separate search head that is directed to use distributed search of your production data. Staging would be for "search head stuff", since anything going on the indexer would be tested in dev and involves index destinations, line breaking, etc. If you've tested things like data onboarding in dev on "staged" versions of real data, you can feel comfortable applying them to prod. Stage set up with a standalone search head will enable you to test your searches, dashboards, knowledge objects, etc. without stealing cores from your production system. It should all be optimized (in dev) before moving to stage so you don't have to be concerned about resource hits on the production data.
Because the stage search head isn't indexing, you can safely license that as a slave.
In the case of dev, you are going to eat up some index license. If you are concerned about lots of indexing (I don't know what your license capacity will be or what your dev plans involve), you can slice off a bit of license into a separate license pool. If the dev box causes violations, it won't affect the main pool.
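To make the "separate search head pointed at prod" idea a bit more concrete, here's a rough sketch that renders the distsearch.conf stanza a staging search head would carry. The host names are made up, 8089 is just Splunk's default management port, and this only covers the peer list: trust with each peer still has to be set up separately (for example when adding the peer through Splunk Web or the CLI), which this doesn't attempt.

```python
def render_distsearch(peers):
    """Return distsearch.conf text listing the given (host, port) search peers."""
    if not peers:
        raise ValueError("at least one search peer is required")
    servers = ",".join(f"https://{host}:{port}" for host, port in peers)
    return f"[distributedSearch]\nservers = {servers}\n"


if __name__ == "__main__":
    # Hypothetical production indexers; replace with your own.
    prod_indexers = [("idx1.example.com", 8089), ("idx2.example.com", 8089)]
    print(render_distsearch(prod_indexers), end="")
```

Generating the stanza from a list keeps the staging and production search heads pointed at the same set of peers if you feed both from one inventory.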
Both of the comments from somesoni2 and martin_mueller add additional tidbits, as they have also had this discussion with folks on the subject lots of times. I just wanted to kind of spell out where the performance hits and resource stuff was in relation to what you've proposed and give some context. The only thing I don't completely agree with is the need for a distributed environment for dev or any kind of mirroring of your environment. You'll never fully be able to test an identical environment (throughput, indexers at full blast, etc.), so it tends to be a bit of a moneypit rather than something truly useful. Splunk development is really all about the pieces of your apps and whether you have runaway regexes, and having something that you can restart over and over again as you load data, realize you've missed the boat, and stop / clean index / start again...
Just my two cents.
Thank you all for the great information and feedback on the question I've had. I guess my only next question is: is this a good approach for what we are trying to do, or have other people done it a different, more efficient way?
I've had customers who have done full-blown duplicate environments (unnecessary in my opinion), and I have a couple of customers who allow their admins to work from their laptops (local Splunk, pointing to prod indexers), and while I question the wild west approach of something like that... it's their preference. Most important is that you help people understand "how" to work and take the best-practice approaches of developers: have good naming conventions, separate the work out on dev with apps (folders), and use something for version control (a GitHub repository is a great tool for that if Splunk sits in a place where people don't usually do that stuff). You've got a very good plan. Don't worry so much about it... just implement lightly and you'll figure out if it works. Splunk is a living, breathing thing. Evangelize best practices to your dev people and they won't need much.
A little background into why I'd consider a small distributed setup even for dev/staging: When I'm onboarding a new "thing" into Splunk, there's the whole range to consider: Indexes, roles for the "thing" searching those indexes, index-time props.conf/transforms.conf, search-time props.conf/transforms.conf, saved searches, dashboards, etc.
If I put all that into a single app on a standalone dev/staging Splunk I'll then have to go in and cut up my app into indexer and search head apps - no fun to do, and quite error-prone. Instead I start out with the split already built into the dev/staging process, making transporting the apps (shameless plug: https://splunkbase.splunk.com/app/2613/ :p) very simple.
Additionally, some things behave differently. For example, if some saved searches or dashboards contain REST calls, you will have to think about which instance to query in a distributed environment. If you develop that on an all-in-one Splunk, you won't have prepared the correct query for your prod environment.
I fully agree that it's adding some cost to your Splunk operation... in my experience it's worth it, your mileage may vary.
In a pinch... I think it could be simulated by cutting up the app into different apps, kind of in TA/DA/SA style... there is certainly nothing wrong with a fully functional exact duplicate... it just depends also on the sophistication of what you're developing, especially if there are roles in play etc. BTW I LOVE App Exporter. 🙂
The trouble with cutting up apps while on a single instance is that even if you make a mistake and misplace a config it'll still work... until you transport to prod, then it goes boom.
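A rough way to catch a misplaced props.conf setting while you're still on a single instance is to sort setting names into index-time and search-time buckets before you cut the app up. The sketch below does that with two hand-picked, deliberately incomplete lists (check the props.conf spec for the full picture); the function names and the sample stanza are made up, and anything it doesn't recognize lands in "unknown" for a human to look at.

```python
from fnmatch import fnmatchcase

# Partial lists only: settings applied while data is parsed/indexed ...
INDEX_TIME = (
    "LINE_BREAKER", "SHOULD_LINEMERGE", "TRUNCATE", "TIME_PREFIX",
    "TIME_FORMAT", "MAX_TIMESTAMP_LOOKAHEAD", "TRANSFORMS-*", "SEDCMD-*",
)
# ... versus settings applied when a search runs.
SEARCH_TIME = (
    "EXTRACT-*", "REPORT-*", "FIELDALIAS-*", "EVAL-*", "LOOKUP-*", "KV_MODE",
)


def classify(setting):
    """Return 'index', 'search', or 'unknown' for a props.conf setting name."""
    if any(fnmatchcase(setting, pattern) for pattern in INDEX_TIME):
        return "index"
    if any(fnmatchcase(setting, pattern) for pattern in SEARCH_TIME):
        return "search"
    return "unknown"


def split_stanza(settings):
    """Split one props.conf stanza (name -> value) into per-bucket dicts."""
    buckets = {"index": {}, "search": {}, "unknown": {}}
    for name, value in settings.items():
        buckets[classify(name)][name] = value
    return buckets


if __name__ == "__main__":
    stanza = {  # made-up sourcetype settings
        "LINE_BREAKER": r"([\r\n]+)",
        "TIME_FORMAT": "%Y-%m-%d %H:%M:%S",
        "EXTRACT-user": r"user=(?<user>\S+)",
        "FIELDALIAS-src": "src_ip AS src",
        "CHARSET": "UTF-8",  # in neither list, so it lands in 'unknown'
    }
    for bucket, items in split_stanza(stanza).items():
        print(bucket, sorted(items))
```

The "index" bucket would go into the app you ship to the indexers (or heavy forwarders), the "search" bucket into the search head app.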
I'd cut corners with the exact duplicate. If you have, say, ten indexers in prod you certainly don't need ten in dev/staging. Performance-related issues won't come to light anyway, so all you're making is hot air and profit for your data center.
However, if you have an indexer or search head cluster it might be a good idea to have a staging indexer or search head cluster as well... if only to get your deployment process straightened out before going to prod.
true, true +1