We want to share our Splunk with another team, but we need to make sure that the other team does not exceed the indexing license volume allocated to them.
Presently, they are just doing a proof of concept, so when I created the new index I specified "Max size (MB) of the entire index" to be 200MB, but they were able to go over that volume. Also, when I look at the Indexes page, "Current size (in MB)" shows that they used up only 112 MB. However, when I search the internal index to get per-host and per-index todaysBytesIndexed, the numbers are significantly larger. And we have our first violation because of it.
So why didn't setting the max size work? And what are the steps to configure this correctly?
I think you're mixing things up... Let's see if I can clear it up for you. First, limiting the "Max Size" of an index only defines how much of your disk it will use. If you have an index that's full or almost full and you keep sending data to it, Splunk will still accept and index the data, but it will drop/purge the oldest events to keep the index within the "Max Size" limit.
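As a sketch (the index name `poc` and the paths are just examples, not your actual setup), that disk cap is the `maxTotalDataSizeMB` setting in indexes.conf, and it controls retention, not license usage:

```ini
# indexes.conf -- hypothetical "poc" index; names/paths are examples
[poc]
homePath   = $SPLUNK_DB/poc/db
coldPath   = $SPLUNK_DB/poc/colddb
thawedPath = $SPLUNK_DB/poc/thaweddb
# Cap the total on-disk size of this index at 200 MB. When the cap
# is hit, the oldest buckets are frozen (deleted by default) --
# incoming data is still indexed and still counts against the license.
maxTotalDataSizeMB = 200
```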
Now, the license works in another way: it is measured on the amount of data an indexer processes, regardless of what is later purged from disk. For example, if you have an index with a "Max Size" of 100MB and someone sends a 1GB log file, the end result will be roughly 1GB counted against your license for that day, but only about 100MB kept on disk (the oldest events having been purged).
*Please note the figures are just examples to show how it works; there are more variables in the real numbers (for instance, data is stored compressed inside the index), but it should be good enough to understand License Usage vs. Max Index Size.
Now, the solution Splunk offers is to create multiple license pools. Let's say you create a license pool of 200MB and assign it to a specific indexer - note that you can't assign a license pool to an index; you assign it to an indexer. In other words, if you want to use that feature, you'll need to install another Splunk instance, make it a license slave of your master Splunk license server, and assign the 200MB pool to it. All the data the proof-of-concept team produces needs to be sent to this "slave" Splunk.
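As a hedged sketch, a pool can also be defined in server.conf on the license master (pools are normally created through Settings > Licensing in the UI; the stanza name and the slave GUID below are made-up placeholders):

```ini
# server.conf on the license master -- example stanza, not verbatim
[lmpool:poc_pool]
description = 200MB pool for the proof-of-concept team
# Quota is expressed in bytes: 200 * 1024 * 1024
quota = 209715200
# GUID of the license slave (the PoC indexer) allowed to draw from
# this pool -- placeholder value, use your indexer's real GUID
slaves = 12345678-ABCD-ABCD-ABCD-123456789012
stack_id = enterprise
```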
This Splunk doc might help: http://docs.splunk.com/Documentation/Splunk/6.1.4/Admin/Groups,stacks,pools,andotherterminology
Hope it helps.
Thank you very much for the clarification. I set this up the way you suggested, but I still have one follow-up question. Now I have a new pool with 200MB allocated to it. Is there any way to stop indexing when this limit is reached? Or is there another way to ensure that the team doing the proof of concept does not over-consume what is allocated to them? Maybe I should set up monitoring and, when their pool is close to capacity, shut down the Splunk indexer associated with their pool? Any other recommendations?
There is an app called S.o.S: https://apps.splunk.com/app/748/ - it helps you monitor Splunk itself. I think the problem is that Splunk is designed not to block data indexing; that's why it raises violation alerts instead... In a production environment, blocking indexing could prevent you from collecting important information.
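As a sketch of the alerting side, you could save a search like this on the license master as a scheduled alert (the field names follow the standard license_usage.log format; the index name `poc` and the 180MB threshold are assumptions for a 200MB pool):

```
index=_internal source=*license_usage.log type="Usage" idx="poc" earliest=@d
| stats sum(b) AS bytes
| eval mb = round(bytes/1024/1024, 2)
| where mb > 180
```

If the search returns a result, the team has crossed 90% of their daily allocation and the alert fires before the violation does.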
Anyway, have a look at this app and set up alerts! A good strategy to stop data from coming in is to use multiple TCP inputs: give, for example, TCP port 10000 to a team, and if they start to abuse it, just temporarily disable that input. You could even automate this process using alert scripts.
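For example (a sketch in inputs.conf; the port, index name, and sourcetype are assumptions), dedicating a TCP input to the team lets you cut off their data without touching anyone else's:

```ini
# inputs.conf on the PoC indexer -- example only
[tcp://10000]
index = poc
sourcetype = poc_data
# Flip this to true (and reload the input) to stop accepting the
# team's data when they approach their pool quota.
disabled = false
```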