Some architecture Q's around Splunk.
My customer is buying Splunk. They have a global presence with a few data centers, and they want centralized syslog with Splunk. We are planning to put an indexer in each of two data centers and use a single search head (typical distributed architecture). The indexers will sit on top of syslog-ng. Using metrics from another logging system, we currently have approximately 432 kB/sec of syslog data. That said, I need to know more about Splunk in virtual environments and about storage.
The good news is that my customer is already buying Splunk; I just need to properly spec out what's needed to make it work.
I have a number of customers running 100% in virtual environments, and if you adhere to the Splunk recommendations for VM sizing (and their nuances) you should be fine. Your assessment for storage is correct (minimum 800 IOPS); however, in shared storage environments those IOPS need to be meticulously "locked in", i.e. guaranteed to the indexers rather than shared with other workloads.
As for search heads needing "big storage": typically not, unless you will be creating lots of summary indexes.
You can deploy Splunk in VMware, but there are some considerations, of course. There is a tech brief here that should help.
Previous answer:
As for storage, the main consideration for the indexers is IOPS. You really need fast storage so make sure you baseline this. Our minimum recommendation is 800 IOPS and this previous answer should be helpful.
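Alongside IOPS, it may help to turn the 432 kB/sec figure from the question into a daily volume and a rough disk estimate. The sketch below is only back-of-the-envelope arithmetic: it assumes decimal units (1 kB = 1000 bytes) and a constant rate around the clock, and the retention period and on-disk ratio are placeholder assumptions to be replaced with your own measurements, not Splunk-published numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_gb(rate_kb_per_sec: float) -> float:
    """Convert a sustained rate in kB/sec to GB/day (decimal units)."""
    return rate_kb_per_sec * 1000 * SECONDS_PER_DAY / 1e9


def disk_needed_gb(daily_gb: float, retention_days: int, disk_ratio: float) -> float:
    """Estimate disk usage for a retention window.

    disk_ratio is the assumed fraction of raw volume that ends up on disk
    after compression and indexing; measure it on your own data.
    """
    return daily_gb * retention_days * disk_ratio


if __name__ == "__main__":
    daily = daily_volume_gb(432)  # rate taken from the question
    print(f"{daily:.1f} GB/day")  # 37.3 GB/day

    # Hypothetical values: 90 days of retention, 0.5 on-disk ratio.
    total = disk_needed_gb(daily, retention_days=90, disk_ratio=0.5)
    print(f"{total:.1f} GB across all indexers")
```

If syslog traffic is bursty rather than constant, size against the peak rate instead of the average, and split the total according to how much each data center actually sends to its indexer.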