I'm curious: given two similar hardware configurations for an index cluster, is there a point at which more index nodes will make up for a slower disk subsystem?
For example, if I had to decide between designing a cluster with 7200 RPM drives or with 15k RPM drives, would there be a point where fewer cluster nodes with 15k drives would equal a cluster with more nodes at 7200?
Issues of MTBF aside, it gets more interesting when the servers in question use SSDs. On the low end there are 3 Gbps SATA SSDs at a reasonable price, and then 6 Gbps SAS drives at roughly two to four times the price. When you start talking about 10 drives per server, the costs add up quickly.
Of course, there is always the concern that the RAID controller can't keep up with even the slower SATA SSDs, or whether the SATA architecture is as well suited to this type of I/O as SAS.
So, is it worth considering fewer nodes with more reliable, faster, more expensive drives vs. more nodes with less expensive drives? In the end, the dollar cost would be the same, although total storage would be less with fewer machines.
Yes, to some extent more nodes can make up for slower hardware. For many or most types of queries, Splunk will scale close to linearly with the number of nodes, especially for longer searches. But there is always going to be some overhead, and there is going to be some increase in latency.
Whether that trade-off is okay depends entirely on the particulars: exactly how many nodes, how much slower the disks are, what the cost difference is, what you're querying, what your tolerance and threshold for slowness is, whether total time or latency matters more to you, etc. But it can certainly be considered.
The default recommendations are based on "typical" usage of recent commodity hardware at "typical" relative market costs. But different people are in different situations on all of those factors, so those are merely rules of thumb.
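One way to reason about the "more slow nodes vs. fewer fast nodes" question is a simple Amdahl-style model: a search has a fixed serial overhead (dispatch, merge) plus a scan phase that parallelizes across indexers. This is only a back-of-envelope sketch; the overhead and per-node scan rates below are hypothetical numbers, not Splunk benchmarks.

```python
# Amdahl-style sketch: fixed per-search overhead plus scan work that
# splits across indexer nodes. All rates/overheads here are hypothetical.

def search_time(total_gb, nodes, scan_gbps_per_node, overhead_s=2.0):
    """Estimated search time: serial dispatch/merge overhead plus the
    scan phase, which parallelizes across the indexer nodes."""
    return overhead_s + total_gb / (nodes * scan_gbps_per_node)

# Hypothetical: 15k SAS scans ~0.15 GB/s of index data per node,
# 7200 RPM ~0.08 GB/s. How many slow nodes match 4 fast nodes
# on a 100 GB scan?
fast = search_time(100, nodes=4, scan_gbps_per_node=0.15)
for n in range(4, 16):
    slow = search_time(100, nodes=n, scan_gbps_per_node=0.08)
    if slow <= fast:
        print(f"{n} slow nodes ~= 4 fast nodes ({slow:.1f}s vs {fast:.1f}s)")
        break
```

The model also shows where linearity breaks down: as the scan phase shrinks, the fixed overhead dominates, so extra nodes stop reducing latency even though they still help total throughput.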
In our experience with storage, we started with 10-16x 15K RPM SAS drives in our indexers, which worked pretty well depending on indexing volume. However, we moved our clustered indexers' hot DBs over to storage on our SSD SAN arrays and saw a pretty significant performance increase (2x-5x). It comes down to how Splunk searches across buckets: it generates random I/O against the disks while looking through the indexes. With SSD storage you start to see big improvements from the sub-millisecond latency and high random I/O that SSDs deliver, since data is returned much more quickly, even with a high write rate to the indexes and a lot of searches running on the system.
One other option would be to put a couple of SSDs into the system to handle your hot buckets, and some larger 10K RPM drives to handle your cold buckets. Just make sure you have enough SSD-backed storage to cover the majority of your searches, so that you don't dip into the cold buckets on the slower disks too often (this is how we are configured).
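That split maps directly onto Splunk's per-index path settings in indexes.conf: hot and warm buckets live under `homePath`, and buckets roll to `coldPath` when the home volume fills. A minimal sketch, assuming hypothetical mount points `/ssd` and `/hdd` and an illustrative size cap (tune to your own volumes and retention):

```ini
# indexes.conf (sketch; paths and the size cap are hypothetical)
[main]
# Hot/warm buckets on SSD: recent data gets the fast random I/O
homePath   = /ssd/splunk/main/db
# Cold buckets roll to the larger, slower 10K RPM volume
coldPath   = /hdd/splunk/main/colddb
thawedPath = /hdd/splunk/main/thaweddb
# Cap hot/warm usage so the SSD volume doesn't fill;
# when the cap is reached, the oldest warm buckets roll to cold
homePath.maxDataSizeMB = 400000
```

Sizing `homePath.maxDataSizeMB` so that your typical search time range stays within hot/warm is what keeps searches off the slower cold disks.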