Splunk Infrastructure: Is it possible to have a staging environment search head that is tied to production indexers to feed staging data for accurate testing?

Path Finder

Hello all,

We are planning out the Splunk infrastructure at my company, and I have a quick question about whether something can be done. We are trying to bring up dev, staging, and production environments.

  • dev would be a single server for development-only purposes, using log samples and made-up data to test dashboard logic.
  • production would ideally be a clustered setup containing, for example, 4 indexers and 1-2 search heads, with forwarders feeding data to the indexers.
  • staging is where the question is. Would it be possible to have a staging environment that consists only of a search head tied to the production indexers, so that staging is fed real data for accurate testing?

Sorry for the long post, just trying to explain our current vision. I'm also open to recommendations based on what you have done with your own infrastructure, and on how best to prevent double indexing.

1 Solution

Splunk Employee

dev being a single server for development only purposes using log samples and made up data for testing logic on dashboards.
For what you describe, you would want a standalone instance of Splunk (search head and indexer in one) where you load in your sample data (grab a week, a month, or a year of logs and point the instance at it) or test other kinds of inputs. That instance can also have a forwarder pointed at it (for testing) and act as a deployment server. When you test each project, you'll want to create a barebones app and be sure to set permissions to App Only. This way, you can create your data onboarding or index-time configuration as well as your search-time configuration (knowledge objects), and the configuration files will sit all tidy in the $SPLUNK_HOME/etc/apps/yourtestapp/ directories, making it easy to push to production when you're ready (or to package and ship to an indexer, search head, or forwarder as required).
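
As a rough sketch of the "App Only" permission mentioned above: sharing settings end up in the app's metadata files. Assuming the test app is named yourtestapp as in the path above, a default stanza that keeps every object private to the app could look like this (the access line is only an illustrative assumption; adjust the roles to your environment):

```ini
# $SPLUNK_HOME/etc/apps/yourtestapp/metadata/local.meta
# The empty stanza applies to all objects in the app unless a more
# specific stanza overrides it.
[]
# Assumed roles for illustration: everyone can read, only admin can write.
access = read : [ * ], write : [ admin ]
# Do not export objects outside this app ("App Only" sharing).
export = none
```

Keeping objects unexported in dev means nothing leaks into other apps by accident, which makes it easier to see exactly what has to ship with the app.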

This environment won't touch the production indexers, because the dev process tends to start with a lot of wildcards and "all time" or "lots of time" sloppiness, and you don't want any of that to affect performance on the production indexers.

Something like a dashboard would then be staged against the production indexers as follows.

staging environment is where the question is. Would it be possible to have a staging search head only that is tied to the production indexers to feed staging data for accurate testing?

What you might want to do here is, as you said, set up a separate search head that uses distributed search against your production data. Staging would be for "search head stuff", since anything that happens on the indexer (index destinations, line breaking, etc.) would be tested in dev. If you've tested things like data onboarding in dev on "staged" versions of real data, you can feel comfortable applying them to prod. A staging setup with a standalone search head will let you test your searches, dashboards, knowledge objects, etc. without stealing cores from your production system. Everything should be optimized in dev before moving to staging, so that you don't have to be concerned about resource hits against the production data.
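
A minimal sketch of what that could look like on the staging search head, assuming non-clustered production indexers with the hypothetical host names below and the default management port 8089. Peers added by editing the file directly also need the search head's trust keys distributed to them, which is why adding peers through the UI or CLI is often easier:

```ini
# distsearch.conf on the staging search head
# Host names are placeholders, not taken from the original post.
[distributedSearch]
# Comma-separated list of production indexers to use as search peers.
servers = https://prod-idx1.example.com:8089,https://prod-idx2.example.com:8089
```

If the production indexers are in an indexer cluster, the search head is normally attached through the cluster configuration instead, so treat this only as an illustration of the non-clustered case.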

Because the staging search head isn't indexing, you can safely configure it as a license slave.
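
For illustration only, pointing the staging search head at an existing license master could look like the following. The host name is a placeholder, and newer Splunk releases use different naming for this setting, so check the server.conf reference for your version:

```ini
# server.conf on the staging search head
# license-master.example.com is a hypothetical host name.
[license]
# URI of the license master this instance reports to.
master_uri = https://license-master.example.com:8089
```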

In the case of dev, you are going to eat up some indexing license. If you are concerned about lots of indexing (I don't know what your license capacity will be or what your dev plans involve), you can slice off a bit of the license into a separate license pool. If the dev box causes violations, it won't affect the main pool.

The comments from somesoni2 and martin_mueller add further tidbits, as they have also had this discussion with folks many times. I just wanted to spell out where the performance hits and resource costs are in relation to what you've proposed, and to give some context. The only thing I don't completely agree with is the need for a distributed environment for dev, or any kind of mirroring of your environment. You'll never be able to fully test an identical environment (throughput, indexers at full blast, etc.), so it tends to be a bit of a money pit rather than something truly useful. Splunk development is really all about the pieces of your apps, whether you have runaway regexes, and having something that you can restart over and over again as you load data, realize you've missed the boat, and then stop, clean the index, and start again.

Just my two cents.

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

Motivator

I've configured an environment similar to what you've asked for.

We have a testing/dev indexing environment. This is where we fix parsing, etc.
We have a testing/dev search environment for creating dashboards, field extractions, etc. These search heads can search both the dev indexing environment AND production data using a special configuration, so testers can see what their dashboards will look like against production data.
We have a production-only search environment. These search heads only look at production data, so as not to confuse end users.

We use the same index names in both environments, so being able to see both at the same time would cause issues, i.e. potentially duplicate data and/or incorrectly parsed data.

There is a simple way around this, however.

How I did it.

It was all done using a very simple option called "srchFilter" in authorize.conf.

In a search head cluster, we define which members will search our development indexers and which will search our production indexers. On each of these we define a srchFilter such as "srchFilter = splunk_server=development-peer*" or "srchFilter = splunk_server=production-peer*". This filter is prepended to every search from that search head. By having a different filter per search head, you can limit the data each one can see.
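
A minimal sketch of such a filter. srchFilter is a role-level setting in authorize.conf; the post does not say which role stanza was used, so the built-in user role here is an assumption, and the peer name pattern is the one quoted above:

```ini
# authorize.conf on a "development only" search head
# The role name is an assumption; apply the filter to whichever roles
# your developers actually hold.
[role_user]
# Restrict searches run under this role to the development peers.
srchFilter = splunk_server=development-peer*
```

A "production only" search head would carry the same stanza with splunk_server=production-peer* instead.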

We then use F5 VIP addresses to redirect users to either a "development only" or a "production only" search head based on URL. As they are all in the same cluster, they all have the same dashboards, field extractions, etc., so a developer can quickly see the effect of any changes on production data before users see them on the production search heads.

Once they have confirmed that it looks good on the production-only search head, they can deploy their specific configurations into the production environment for end users to see.

Builder

Are you using only 2 search heads in the testing/dev cluster, without the load balancer?

Motivator

You can't use just 2 search heads in a search head cluster environment.

Tip: you can have multiple instances on a single machine to bring the numbers up 🙂

Builder

Lucas, in the example above you mentioned having 2 nodes in the cluster in your dev environment. How was it done with 2 nodes?

Motivator

I don't think I said anywhere how many nodes I had in my clusters. Can you quote me on what I said? I can't find it.

Builder

OK, I got mixed up here. So what was your setup, and how many nodes did you have?
If I have 4 nodes, can I point 3 to the dev indexers and 1 to the prod indexers while they are all in one cluster?
Plus, we need a load balancer.

I'm looking for a good dev setup in my environment that can serve the purpose of testing with test data, developing reports, and testing with prod data.

New Member

I'm trying to implement the same thing for a client. They have a huge production environment with several heavy forwarders and multiple peer indexers, but it's not in a cluster setup. Did I mention this is production? They set up the same index names in a distributed search environment.
They want a new testing/dev indexing environment and possibly a way to point some of the QA/dev/staging dashboards to this environment. Is this possible?

Motivator

You can still do it without using clustering (it is actually easier).

Path Finder

Thank you guys for such great input!

SplunkTrust

A little background into why I'd consider a small distributed setup even for dev/staging: When I'm onboarding a new "thing" into Splunk, there's the whole range to consider: Indexes, roles for the "thing" searching those indexes, index-time props.conf/transforms.conf, search-time props.conf/transforms.conf, saved searches, dashboards, etc.
If I put all that into a single app on a standalone dev/staging Splunk I'll then have to go in and cut up my app into indexer and search head apps - no fun to do, and quite error-prone. Instead I start out with the split already built into the dev/staging process, making transporting the apps (shameless plug: https://splunkbase.splunk.com/app/2613/ :p) very simple.
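
To illustrate the kind of split described above, here is a rough sketch of how settings for one sourcetype might be divided between an indexer app and a search head app. The sourcetype name, app names, and field extraction are hypothetical:

```ini
# --- indexer app: my_thing_idx/default/props.conf (index-time settings) ---
[my_sourcetype]
# Break events on newlines and do not merge lines back together.
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
# Timestamp sits at the start of each event, e.g. 2015-06-01 12:34:56.
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19

# --- search head app: my_thing_sh/default/props.conf (search-time settings) ---
[my_sourcetype]
# Extract a three-digit status code into a field named status.
EXTRACT-status = status=(?<status>\d{3})
# Expose clientip under an additional field name.
FIELDALIAS-src = clientip AS src
```

On an all-in-one instance both halves work no matter which app they sit in, which is exactly why a misplaced setting only shows up once the apps are deployed separately.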
Additionally, some things behave differently. For example, if some saved searches or dashboards contain REST calls, you will have to think about which instances to query in a distributed environment. If you develop that on an all-in-one Splunk, you won't have prepared the correct query for your prod environment.

I fully agree that it's adding some cost to your Splunk operation... in my experience it's worth it, your mileage may vary.

Splunk Employee

In a pinch, I think it could be simulated by cutting up the app into different apps, kind of in TA/DA/SA style. There is certainly nothing wrong with a fully functional exact duplicate; it also depends on the sophistication of what you're developing, especially if there are roles in play, etc. BTW I LOVE App Exporter. 🙂

SplunkTrust

The trouble with cutting up apps while on a single instance is that even if you make a mistake and misplace a config it'll still work... until you transport to prod, then it goes boom.

I'd cut corners with the exact duplicate. If you have, say, ten indexers in prod you certainly don't need ten in dev/staging. Performance-related issues won't come to light anyway, so all you're making is hot air and profit for your data center.
However, if you have an indexer or search head cluster it might be a good idea to have a staging indexer or search head cluster as well... if only to get your deployment process straightened out before going to prod.

Splunk Employee

true, true +1

Path Finder

Thank you all for the great information and feedback on my question. I guess my only remaining question is: is this a good approach for what we are trying to do, or have other people done it in a different, more efficient way?

Splunk Employee

I've had customers who have built full-blown duplicate environments (unnecessary, in my opinion), and I have a couple of customers who allow their admins to work from their laptops (local Splunk, pointing to prod indexers). While I question the wild west approach of something like that, it's their preference. Most important is that you help people understand "how" to work and adopt developer best practices: have good naming conventions, separate the work out on dev with apps (folders), and use something for version control (a GitHub repository is a great tool for that if Splunk sits in a place where people don't usually do that stuff). You've got a very good plan. Don't worry so much about it; just implement lightly and you'll figure out if it works. Splunk is a living, breathing thing. Evangelize best practices to your dev people and they won't need much.

SplunkTrust

If you intend dev to be identical to prod in terms of Splunk configuration, I'd recommend a (small) distributed setup there as well. This way you will get the split between index-time and search-time configuration right before deploying them into prod.

Revered Legend

If the staging environment is just for validating searches, alerts, and dashboards before rolling them out to production, you can simply use a dedicated search head that accesses the production indexers as your staging environment. Please note that since it uses the production indexers, any usage of the staging search head will have a performance impact on them.
