Hello all Splunk users, architects, support staff, etc.
I would like to build a license sizing calculator based on your experience.
It would be a spreadsheet containing all the needed calculations: you enter each device type and role and its quantity, and it produces the license requirement.
For example, I have the following table that I found somewhere on the Internet.
Quantity  Type                                 Avg. EPS
1000      Employee Endpoints                   5
5         Network Switches                     150
5         Network Gateway/Router               5
5         Windows Domain Server                35
2         Windows Application Server           4
2         Linux Server                         4
6         Exchange Server                      3
3         Web Servers (IIS, Apache, Tomcat)    1.5
2         Windows DNS Server                   1
4         Database Server                      2
2         Firewall                             10
2         Firewall                             60
7         IPS/IDS                              70
1         VPN                                  2
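As a minimal sketch of the arithmetic such a spreadsheet could perform, the Python below turns rows like the ones in the table above into an estimated daily volume. Several things here are assumptions and not from the table: the 300-byte average event size is a placeholder, the table does not say whether "Avg. EPS" is per device or per group (the flag lets you try both), and the function names are purely illustrative.

```python
SECONDS_PER_DAY = 86_400

# (quantity, device type, average EPS), as read from the table above.
DEVICES = [
    (1000, "Employee Endpoints", 5),
    (5, "Network Switches", 150),
    (5, "Network Gateway/Router", 5),
    (5, "Windows Domain Server", 35),
    (2, "Windows Application Server", 4),
    (2, "Linux Server", 4),
    (6, "Exchange Server", 3),
    (3, "Web Servers (IIS, Apache, Tomcat)", 1.5),
    (2, "Windows DNS Server", 1),
    (4, "Database Server", 2),
    (2, "Firewall", 10),
    (2, "Firewall", 60),
    (7, "IPS/IDS", 70),
    (1, "VPN", 2),
]


def daily_gb(quantity, avg_eps, avg_event_bytes=300, eps_is_per_device=True):
    """Estimate daily ingest in GB (10^9 bytes) for one table row.

    avg_event_bytes is an ASSUMED placeholder; real event sizes vary widely
    by source. eps_is_per_device is also an assumption about the table.
    """
    device_count = quantity if eps_is_per_device else 1
    events_per_day = avg_eps * device_count * SECONDS_PER_DAY
    return events_per_day * avg_event_bytes / 1e9


def total_daily_gb(devices, **kwargs):
    """Sum the per-row estimates for a list of (quantity, type, eps) rows."""
    return sum(daily_gb(qty, eps, **kwargs) for qty, _type, eps in devices)


if __name__ == "__main__":
    for qty, dev_type, eps in DEVICES:
        print(f"{dev_type}: {daily_gb(qty, eps):.2f} GB/day")
    print(f"Total: {total_daily_gb(DEVICES):.2f} GB/day")
```

Changing avg_event_bytes or eps_is_per_device shifts the total considerably, which is exactly why real per-device statistics are needed.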
Please post your statistics. The information needed is:
device type, quantity, average data volume per day, EPS (if you want), and enterprise size in number of employees.
The information could be anonymous.
The final result will be posted in Splunk's wiki.
We have some spreadsheets internally based on samples we've collected, as well as an app based upon them. Drainy's answer above is an excellent post on why you might want to be wary of samples. I also published a (long!) blog post just now on all the approaches you can take to estimate license sizes.
From previous experience, this is an impossible problem to solve. The best way to calculate it is to set up Splunk and turn on the taps for a week, although it's noble of you to try 🙂
Some things you can't estimate (or could but the calculator would be so complex it would be quicker to do a PoC);
I'm not trying to be negative; it's a great idea to try something like this. But as I said, I've tried to build estimates before. To build one you need to see the natural flow of data over a week so you can account for normal peaks and lows, including the number of times someone may enable debug on a switch (not to mention that you may have 20 switches/routers, all with different log levels). In the end I just stuck Splunk in and worked it out in real time. This also gives you the advantage that you can enable some nullQueues and edit your inputs to trim the fat.
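To illustrate why a week of real data matters, here is a rough Python sketch that summarizes seven measured daily volumes. The sample figures are made up, the 20% headroom is an arbitrary assumption, and sizing to the peak day is just one conservative choice, not a Splunk rule.

```python
def summarize_week(daily_gb, headroom=0.2):
    """Summarize measured daily ingest volumes (in GB).

    headroom is an ASSUMED safety margin added on top of the peak day.
    """
    if not daily_gb:
        raise ValueError("need at least one day of measurements")
    average = sum(daily_gb) / len(daily_gb)
    peak = max(daily_gb)
    return {
        "average_gb": average,
        "peak_gb": peak,
        "suggested_license_gb": peak * (1 + headroom),
    }


if __name__ == "__main__":
    # Hypothetical week: one day spikes because debug logging was left on.
    week = [10, 12, 11, 30, 12, 4, 3]
    print(summarize_week(week))
```

An estimate built from average rates would miss the spike day entirely, which is the gap a week-long trial closes.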
I agree that it would be easier to set up a test instance of Splunk and know what the data flow looks like, rather than trying to figure out all of the variables. Great description.
That's an awful lot of work that you are asking other people to do. Are you sure that it will be generally useful to everyone? Plus I am having some problems making sense of the table that you posted.