Hello all,
We are planning out the infrastructure for Splunk at my company and I have a quick question about whether something can be done. We are trying to bring up a dev, stage, and production environment.
Sorry for the long post, just trying to explain our current vision - and also taking recommendations on what you guys have done with your infrastructure to prevent double indexing as best as possible.
dev being a single server for development only purposes using log samples and made up data for testing logic on dashboards.
For what you describe, you would want a standalone instance of Splunk (search head and indexer in one) where you load in your sample data (grab a week, a month, a year of logs and point the instance at it) or test other kinds of inputs. That instance can also have a forwarder pointed at it (for testing) and act as a deployment server. When you test each project, you'll want to create a barebones app, and be sure to adjust permissions to App Only. This way, you can create your data onboarding or index-time stuff as well as your search-time stuff (or knowledge objects), and the configuration files will sit all tidy in the $SPLUNK_HOME/etc/apps/yourtestapp/ directories, making it easy to push to production when you're ready (or package to ship to indexer/search head/forwarder as required).
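A minimal sketch of the "App Only" permission mentioned above, assuming it is set through the app's metadata file rather than through Splunk Web. The app name yourtestapp comes from the path above; the roles listed are only an illustration.

```ini
# $SPLUNK_HOME/etc/apps/yourtestapp/metadata/local.meta
# Illustrative only: the empty stanza applies to every object in the app.
[]
# Everyone may read, only these (assumed) roles may write.
access = read : [ * ], write : [ admin, power ]
# "none" keeps objects private to this app; "system" would share them globally.
export = none
```

Keeping everything app-scoped in dev means a knowledge object that accidentally depends on something outside the app tends to show up as broken before it gets packaged.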
This environment won't touch the production indexers because the dev process tends to start with a lot of wildcard and "all time" or "lots of time" sloppiness and you don't want any of that to affect performance on production indexers.
Something like a dashboard would then be staged against production indexers as follows.
staging environment is where the question is. Would it be possible to have a staging search head only that is tied to the production indexers to feed staging data for accurate testing?
What you might want to do here is, as you said, a separate search head that is directed to use distributed search of your production data. Staging would be for "search head stuff", since anything going on the indexer would be tested in dev and then involve index destinations, line breaking etc. If you've tested things like data onboarding in dev on "staged" versions of real data, you can feel comfortable applying them to prod. A stage set up with a standalone search head will enable you to test your searches, dashboards, knowledge objects etc. without stealing cores from your production system. It should all be optimized (in dev) before moving to stage so you don't have to be concerned about resource hits on the production data.
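A rough sketch of what pointing the staging search head at the production indexers can look like in distsearch.conf. The hostnames are hypothetical, and 8089 is assumed to be the management port on the peers.

```ini
# distsearch.conf on the staging search head
# (hostnames are hypothetical placeholders for your production indexers)
[distributedSearch]
servers = https://prod-idx01.example.com:8089,https://prod-idx02.example.com:8089
```

In practice search peers are often added through Splunk Web or the CLI instead, because that also takes care of the authentication between search head and peers; editing the file by hand usually leaves that step to you.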
Because the stage search head isn't indexing, you can safely license it as a slave.
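A minimal sketch of making the stage search head a license slave, assuming a Splunk version that still uses the master/slave setting names. The license master hostname is hypothetical.

```ini
# server.conf on the staging search head
# (license master hostname is a hypothetical placeholder)
[license]
master_uri = https://license-master.example.com:8089
```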
In the case of Dev, you are going to eat up some index license. If you are concerned about lots of indexing (I don't know what your license capacity will be or what your dev plans involve) you can slice off a bit of license into a separate license pool. If the Dev box causes violations it won't affect the main pool...
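A hedged sketch of such a separate pool, as it might appear in server.conf on the license master. The pool name, the quota, and the GUID placeholder are all assumptions; pools are more commonly created through the licensing page in Splunk Web, which writes a stanza along these lines.

```ini
# server.conf on the license master
# (pool name, quota and GUID are hypothetical)
[lmpool:dev_pool]
description = Small pool for the dev instance
# 1 GB per day, expressed in bytes (1024 * 1024 * 1024)
quota = 1073741824
# GUID of the dev instance that draws from this pool
slaves = <GUID-of-dev-instance>
stack_id = enterprise
```

The main pool's quota may need to be reduced to make room for the new pool within the same license stack.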
Both of the comments from somesoni2 and martin_mueller add additional tidbits, as they have also had this discussion with folks on the subject lots of times. I just wanted to kind of spell out where the performance hits and resource stuff were in relation to what you've proposed and give some context. The only thing I don't completely agree with is the need for a distributed environment for Dev or any kind of mirroring of your environment. You'll never fully be able to test an identical environment (throughput, indexers at full blast etc.) so it tends to be a bit of a money pit rather than something truly useful. Splunk development is really all about the pieces of your apps and whether you have runaway regexes, and having something that you can restart over and over again as you load data, realize you've missed the boat... and stop / clean index / start again...
Just my two cents.
I've configured an environment similar to what you've asked for.
We have a testing/dev indexing environment. This is where we fix parsing etc.
We have a testing/dev search environment. Creation of dashboards, field extractions etc. These can search both the dev indexing environment AND production data using a special configuration, so testers can see what their dashboards will look like against production data.
Production only search environment. These search heads only look at production data so as not to confuse end users.
We use all the same index names in both environments, so having the ability to see both at the same time would cause issues, i.e. potentially duplicate data and/or incorrectly parsed data.
There is a simple way around this, however.
How I did it.
It was all done using a very simple option called "srchFilter" in authorize.conf.
In a search head cluster we define members that will search our development indexers or production indexers. On each one of these we define a srchFilter such as "srchFilter= splunk_server=development-peer*" or "srchFilter= splunk_server=production-peer*". This filter is prepended to every search from this search head. By having a different filter per search head you can limit the data each one can see.
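A minimal sketch of the two filters quoted above, assuming they are set on a role that every user on that search head holds. The `[role_user]` stanza is an assumption; the filter values are the ones from the post.

```ini
# authorize.conf on a "development only" search head
# (role stanza is an assumption; use whichever role your users hold)
[role_user]
srchFilter = splunk_server=development-peer*

# authorize.conf on a "production only" search head
# [role_user]
# srchFilter = splunk_server=production-peer*
```

Because `srchFilter` is a per-role setting, any role without the filter would still see both sets of peers, so it is worth checking every role that can log in to that search head.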
We then use F5 VIP addresses to redirect users to either a "development only" or a "production only" search head based on URL. As they are all in the same cluster, they will have the same dashboards, field extractions etc., so a developer can quickly see the effect of any changes on production data prior to users seeing this on production search heads.
Once they have confirmed that it looks good on the production only search head they can deploy their specific configurations into the production environment for end users to see.
Are you using only 2 search heads in the testing/dev cluster? Without the load balancer?
You can't use only 2 search heads in a search head cluster environment.
Tip: you can have multiple instances on a single machine to bring the numbers up 🙂
Lucas, in the above example you mentioned having 2 nodes in a cluster in your dev environment. How was it done with 2 nodes?
I don't think I said anywhere the specific number of nodes I had in my clusters. Can you show/quote me on what I said as I can't find it.
OK, I got mixed up here. So how was your setup, and how many nodes did you have?
If I have 4 nodes, can I point 3 to the dev indexer and 1 to the prod indexer while they are in a cluster?
Plus, we need a load balancer.
I'm looking to have a good dev setup in my environment which can serve the purpose of testing with test data and developing reports, plus testing with prod data.
I'm trying to implement the same thing for a client. They have a huge production environment with several heavy forwarders and multiple peer indexers, but it's not in a cluster setup. Did I mention this is production? They set up the same index names in a distributed search environment.
They want a new testing/dev indexing environment and possibly a way to point some of the qa/dev/staging dashboards to this environment. Is this possible?
You can still do it without using clustering (it is actually easier).
Thank you guys for such great input!
A little background into why I'd consider a small distributed setup even for dev/staging: When I'm onboarding a new "thing" into Splunk, there's the whole range to consider: Indexes, roles for the "thing" searching those indexes, index-time props.conf/transforms.conf, search-time props.conf/transforms.conf, saved searches, dashboards, etc.
If I put all that into a single app on a standalone dev/staging Splunk I'll then have to go in and cut up my app into indexer and search head apps - no fun to do, and quite error-prone. Instead I start out with the split already built into the dev/staging process, making transporting the apps (shameless plug: https://splunkbase.splunk.com/app/2613/ :p) very simple.
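To illustrate the split described above, here is a rough sketch of how the same sourcetype might be configured in two separate apps. The sourcetype name, timestamp format, and field names are all hypothetical.

```ini
# props.conf in the indexer app: index-time settings
# (sourcetype and values are hypothetical)
[my_thing:log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19

# props.conf in the search head app: search-time settings
# (same stanza name, but it lives in a different app)
[my_thing:log]
EXTRACT-status = status=(?<status>\d+)
FIELDALIAS-src = client_ip AS src
```

If heavy forwarders sit in front of the indexers, the index-time half would likely belong on them instead, since parsing happens on the first full Splunk instance the data passes through.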
Additionally, some things behave differently. For example, if some saved searches or dashboards contain rest calls, you will have to think about which instances to query in a distributed environment - if you develop that on an all-in-one Splunk you won't have prepared the correct query for your prod environment.
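A small sketch of the kind of rest call this applies to, written as a saved search. The stanza name is hypothetical; the `splunk_server` argument is what decides which instances answer.

```ini
# savedsearches.conf in the search head app (stanza name is hypothetical)
[Server info - local search head only]
# splunk_server=local restricts the call to the search head itself.
# Without it, rest also queries the search peers, which an all-in-one
# instance does not have, so the difference stays hidden there.
search = | rest /services/server/info splunk_server=local | table splunk_server version
```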
I fully agree that it's adding some cost to your Splunk operation... in my experience it's worth it, your mileage may vary.
In a pinch, I think it could be simulated by cutting up the app into different apps, kind of in TA/DA/SA style. There is certainly nothing wrong with a fully functional exact duplicate; it just depends also on the sophistication of what you're developing, especially if there are roles in play etc. BTW I LOVE App Exporter. 🙂
The trouble with cutting up apps while on a single instance is that even if you make a mistake and misplace a config it'll still work... until you transport to prod, then it goes boom.
I'd cut corners with the exact duplicate. If you have, say, ten indexers in prod you certainly don't need ten in dev/staging. Performance-related issues won't come to light anyway, so all you're making is hot air and profit for your data center.
However, if you have an indexer or search head cluster it might be a good idea to have a staging indexer or search head cluster as well... if only to get your deployment process straightened out before going to prod.
true, true +1
Thank you all for the great information and feedback on the question I've had. I guess my only next question is - is this a good approach for what we are trying to do, or have other people done it a different, more efficient way?
I've had customers who have done full-blown duplicate environments (unnecessary in my opinion) and I have a couple of customers who allow their admins to work from their laptops (local Splunk, pointing to prod indexers), and while I question the wild west approach of something like that... it's their preference. Most important is that you help people understand "how" to work and take the best practice approaches of developers: have good naming conventions, separate the work out on dev with apps (folders) and use something for version control (a GitHub repository is a great tool for that if Splunk sits in a place where people don't usually do that stuff). You've got a very good plan. Don't worry so much about it... just implement lightly and you'll figure out if it works. Splunk is a living, breathing thing. Evangelize best practices to your dev people and they won't need much.
If you intend dev to be identical to prod in terms of Splunk configuration, I'd recommend a (small) distributed setup there as well. This way you will get the split between index-time and search-time configuration right before deploying them into prod.
If the staging environment is just for validation of searches/alerts/dashboards before rolling them out in production, you can just have a dedicated search head, accessing the production indexers, to be used as the staging environment. Please note that since it's using production indexers, searches run from the staging search head will have a performance impact on production.