How much stored data can a Splunk indexer comfortably manage? I know that the answer depends on the indexer hardware, its workloads, etc. Assume an indexer based on the Splunk "reference server" with 800-1000 IOPS and a good index design. Disks can be assumed to be modern (2014+) SSDs, priced consistently with the rest of the reference hardware. Also assume "typical light searching" by 2-3 users.
Just to be clear: this is not a question about inbound data per day. This is about how quickly Splunk can search the on-disk indexed data, and therefore how much data should be stored per indexer.
I can't find anything about this in the docs or online.
What is your opinion? What is your experience?
As Rich7177 says, this is a hard question to answer. We love to say it depends on use cases because, well, it does.
So there are a few things to think about when talking about local disk (and search) requirements. First, and typically before we even consider search types: retention requirements and ingestion volumes.
This will most likely provide the real answer to your question about how much data should be stored on the local indexers. If the requirement is only 1 week to 30/90 days for all indexes, then typically lower-volume RAID 10 arrays with spindle-based storage are more than sufficient, even considering different search loads. When you start discussing warm retention requirements of 1/3/5 years, then it gets more complicated. Most of the indexer hardware out there in the wild will typically have 6 to 12 TB of locally attached disk in RAID 10. Frozen data usually rolls to NAS or is simply deleted, and cold buckets sit on NAS or cheaper storage somewhere.
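To make the retention-driven sizing concrete, here is a rough sketch. The 50% on-disk compression figure (rawdata plus index files, relative to raw ingest) is an assumed rule of thumb, not an official number; plug in your own measured ratio.

```python
# Back-of-the-envelope retention sizing (illustrative assumptions,
# not official Splunk guidance).
def storage_needed_gb(daily_ingest_gb: float,
                      retention_days: int,
                      compression_ratio: float = 0.5) -> float:
    """Estimate on-disk storage for one copy of the indexed data.

    compression_ratio: assumed on-disk size as a fraction of raw
    ingest (rawdata + index files); ~0.5 is a common rule of thumb.
    """
    return daily_ingest_gb * retention_days * compression_ratio

# 100 GB/day raw with 90-day retention -> roughly 4.5 TB of local disk
print(storage_needed_gb(100, 90))  # 4500.0
```

Run the same math with a 3-year retention and you quickly see why the 6-12 TB of local disk mentioned above stops being enough.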
Regarding SSD, http://blogs.splunk.com/2012/05/10/quantifying-the-benefits-of-splunk-with-ssds/ is a good article to read. I'm not yet encountering many deployments that use SSD for indexing volumes; mostly it's just OS volumes.
After this comes ingestion rate. With the reference hardware, a pure indexer can do upwards of 250-300 GB/day of indexing. As you add additional workloads (parsing, searching), that drops. So we scale horizontally: when in doubt, add more indexers!
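The "scale horizontally" rule of thumb can be sketched as a quick capacity calculation. The per-indexer figure comes from the 250-300 GB/day mentioned above; the derate factor for parsing and search load is an assumption you should tune for your environment.

```python
import math

def indexers_needed(total_daily_ingest_gb: float,
                    per_indexer_gb: float = 250,
                    workload_derate: float = 0.5) -> int:
    """Rough count of indexers for a given total daily ingest.

    per_indexer_gb: pure-indexing capacity of a reference-spec
    indexer (250-300 GB/day per the discussion above).
    workload_derate: fraction of that capacity left once parsing
    and search load are added; 0.5 is an assumed, conservative pick.
    """
    effective = per_indexer_gb * workload_derate
    return math.ceil(total_daily_ingest_gb / effective)

# 1 TB/day of raw ingest at an effective 125 GB/day per indexer
print(indexers_needed(1000))  # 8
```

The point is not the exact numbers but the shape: halving effective capacity doubles the indexer count, which is why adding indexers is the usual answer.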
After this comes the discussion of search types: whether to use data model (DM) acceleration, summary indexing (SI), etc.
There's no real one-for-one answer on this, unfortunately, which can be quite frustrating. I recommend following our reference hardware as a framework for building the platform and going from there!
I think there's no one answer to this because there are so many variables.
For instance, exactly what you are searching — how dense or sparse the results are, how much transformation you apply to them — can vary the number of events per second (eps) achieved from less than 10k (sometimes even a lot less!) to up near 1,000,000 eps on the same hardware. I know, because I've seen both cases on my own hardware. 🙂
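To put that spread in perspective, a quick back-of-the-envelope (the event count here is purely hypothetical):

```python
# How long it takes to scan the same set of events at the two eps
# extremes mentioned above (hypothetical 1-billion-event corpus).
def scan_hours(events: int, eps: float) -> float:
    """Wall-clock hours to scan `events` at a rate of `eps`."""
    return events / eps / 3600

events = 1_000_000_000
for eps in (10_000, 1_000_000):
    print(f"{eps:>9,} eps -> {scan_hours(events, eps):.1f} hours")
```

The same data is either a ~28-hour slog or a sub-20-minute search depending purely on achieved eps, which is why search density and transformation matter so much for "how much data per indexer."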
Honestly, too, a big variable is how much the users know and how patient they are. Users can bring ANY system to its knees, but they can also be taught the basics of keeping their searches fairly quick, or at least as quick as is possible. If they understand that sometimes they'll have an inefficient search and that it'll be a bit slow, then they can be quite tolerant of the occasional search taking a bit of time.
Really, I've never heard much sizing information except on ingest rates and retention, because those are load and capacity respectively, and everything else is relative.