I want to run Splunk as a Shared Service, where different teams would have their share of the license and their own index. I believe there are two ways to set it up:
Have one Splunk server installation with Splunk forwarders all indexing data into a single pool. With this approach, I can have a scheduled search that checks how much data was consumed by each index, send out alerts, and possibly shut down the port that receives data from forwarders that belong to the team that has violated (or is close to violating) their capacity.
Install a new Splunk server for each team and thus create a license pool for each team. This will make license usage clearly visible on the License Page. And if one team violates continuously, only their pool will be affected, while others will be safe. However, the downside of this approach is running (and maintaining) several Splunk servers. What if I have 10 teams using my Splunk? Is it really a good idea to have 10 Splunk servers running?
So I am looking for some suggestions on the best way to design this setup. Is there some recommended standard? Is one way clearly better than another? Is there a third option that I am not aware of?
First off, you are very smart for asking for advice! Your sales guy can give you a lot of documentation for setting up Splunk as a shared service, including best practices and such that will get you on your way. Ask for it and treat it as gospel. That said, here are my thoughts and experiences.
I manage the Splunk instance for my company, and we have a shared service model. We have 62 indexers and 8 search heads (clustered), all physical BL460s and 660s, indexing about 3.5TB/day. This is a single deployment, servicing more than 1500 end users at 165+ facilities.
To be honest, neither option is optimal if you plan on growing your deployment as a service model at your company. First things first: make sure you have plenty of excess license available for new initiatives, so that you aren't reaching out to your sales guy every couple of weeks for a temp key to prove out POCs for departments desperately trying to adopt Splunk. Bottom line, there is nothing we have found that can touch Splunk for what it does. That being said, if you set up your shared service model correctly, Splunk will be adopted like wildfire and you will be a hero. Pre-purchase the extra license. Set up a chargeback model with some extra added in, so that once you have used about 75% of your license, your company will have money in the cookie jar to purchase at least the amount of license and infrastructure used since the last purchase, and then some. In the very likely event you go over your license 5 times in a 30 day period, Splunk will not abandon you. Keep in regular contact with your Splunk Sales Engineer (SE) and sales guy, and keep them apprised of your initiatives and struggles; they will do everything in their power to help you. It is in Splunk's best interest that your company's deployment be successful.
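To make the chargeback idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names, the 20% markup, and the per-GB cost are made up, and only the 75% trigger comes from the advice above. It is not how Splunk bills anything; it just shows the arithmetic of charging teams a bit more than cost and flagging when it is time to start the next purchase.

```python
from dataclasses import dataclass


@dataclass
class TeamCharge:
    team: str
    usage_gb: float
    charge: float


def compute_chargeback(usage_gb_by_team, cost_per_gb, markup=0.20):
    """Bill each team for its usage at cost plus a markup.

    cost_per_gb and markup are hypothetical inputs; the markup is the
    "some extra added in" that funds the next license purchase.
    """
    rate = cost_per_gb * (1 + markup)
    return [
        TeamCharge(team, gb, round(gb * rate, 2))
        for team, gb in sorted(usage_gb_by_team.items())
    ]


def should_start_purchase(total_usage_gb, license_gb, threshold=0.75):
    """True once total daily usage reaches the threshold share of the license."""
    return total_usage_gb >= threshold * license_gb


if __name__ == "__main__":
    # Made-up daily usage figures, in GB.
    usage = {"network": 100, "appdev": 50}
    for charge in compute_chargeback(usage, cost_per_gb=10.0):
        print(charge)
    print(should_start_purchase(sum(usage.values()), license_gb=200))
```

The markup and threshold are the two knobs your COE would want to agree on up front, since they decide how much money is in the cookie jar when you hit the trigger.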
Option #1 has the best start, IMO. If you do this, you will need a dedicated team of at least 2 people doing only Splunk administration tasks. When I say administration, I mean folks that are literally combing through the data, constantly looking for "junk". It will also be their responsibility to validate every input that is set up and confirm that it follows the best practices established by your Splunk Center of Excellence (COE). Once a department/team creates a dashboard set (application), they will also need to validate all the searches created and optimize them, so that a single department doesn't do something like create a search that is actually 15-20 sub-searches over all time (I have seen this first hand, and it will bring Splunk to its knees). Having a couple of really good engineers (get them proper training) is not an option, it is a necessity. Even if you are paying these folks $100K+ per year, you will quickly find they are worth their weight in gold. Once properly trained, they will be able to take a TB of data and reduce it to 1/4 of that without sacrificing potentially "crucial" data, saving you at least $750K in licensing, a minimum of three 8-core/48GB RAM physicals, and around 400GB/day of disk space. Assuming CAPEX and OPEX costs at my company with a 365 day retention policy, this single project, done with trained (skilled) staff, will save you $4.6M after their salary. In time, these folks will eliminate any threat of exceeding license, which turns an application or department sending too much data into an alert-able/actionable event: there is something wrong with their system (this is the data you want in Splunk, as it will lead you to a root cause).
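The "too much data becomes an alert-able event" idea can be sketched roughly like this. This is only an illustration under stated assumptions: the per-index quotas, the 80% warning ratio, and the function names are all hypothetical, and the usage numbers are assumed to have already been pulled from Splunk by a scheduled search (for example, one over the license usage log in the `_internal` index). The Python below does not talk to Splunk at all.

```python
def classify_usage(used_gb, quota_gb, warn_ratio=0.8):
    """Classify one index's daily usage against its agreed quota."""
    if quota_gb <= 0:
        raise ValueError("quota_gb must be positive")
    ratio = used_gb / quota_gb
    if ratio >= 1.0:
        return "over"
    if ratio >= warn_ratio:
        return "warning"
    return "ok"


def check_quotas(usage_gb_by_index, quota_gb_by_index, warn_ratio=0.8):
    """Return a status per index; an index with no reported usage counts as 0 GB."""
    return {
        index: classify_usage(usage_gb_by_index.get(index, 0.0), quota, warn_ratio)
        for index, quota in quota_gb_by_index.items()
    }


if __name__ == "__main__":
    # Hypothetical per-team indexes and daily figures, in GB.
    quotas = {"team_a": 50, "team_b": 100, "team_c": 50}
    usage = {"team_a": 45, "team_b": 120, "team_c": 10}
    for index, status in check_quotas(usage, quotas).items():
        if status != "ok":
            print(f"ALERT {index}: {status}")
```

Note that this only raises a flag. Following the reasoning above, a team going over is treated as a sign that something is wrong on their side and worth investigating, rather than as a reason to cut off their forwarders.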
On to another necessity, the COE. At our company, the COE is made up of directors, managers, and key stakeholders for the initiatives that have a vested interest in the health of the Splunk deployment. We meet at least once a month to discuss all new initiatives to be started within the next 2 months and any issues that might be creeping up from rapid adoption. Having a centralized deployment to support a shared service for an entire enterprise is a lot of work, and it will take a few months to get everyone on the same page. I can almost guarantee it will initially be painful and you will run into several growth issues, such as folks exceeding their license. Whatever pain you incur during the initial few months, it is worth it in the end. You will find, as we have, that the once "single glass of pain" can truly be a single pane of glass, empowering you, your leadership, and key stakeholders to make data-driven decisions in minutes as opposed to hours or days.