We have seen the reference on hardware spec for performance and scaling, how about this below:
What is the difference between (lets say):
3 x servers with spec:
12 physical cores, 32GB RAM, 800 IOPS per server
ONE BIG giant virtual machine with spec:
36 physical cores, 96GB RAM, 2400 IOPS
Unless, you are dedicating resources on the virtual machine, it is unlikely that will perform as well as 3 smaller physical machines. In fact, you will incur a 10% performance penalty for indexing by simply running virtually (worse if there is contention for CPU, RAM, or disk).
Also, a single indexing pipeline would use 4 CPUs per machine, thus 3 servers would have 12 CPU worth of indexing horsepower. As of 6.3 you can have up to 2 (max recommended) indexing pipelines which would give you 8 CPU worth of indexing horsepower on the "big server", which would be less than the 3 smaller servers, but would leave more CPUs available for handling search. If that meets your needs, it's probably fine.
Personally, if I was setting this up, I would want the 3 servers with known dedicated resources, and the inherent redundancy that is associated with it. But if you are going for ease of use and simplified maintenance/administration, 1 server does fit the bill.
Info on indexing parallelization:
I should add, if you go with the larger system, your expansion options are to add another identical system. While with 3 smaller servers, you could add a single identical smaller system.
Where can I get that reference doc/link: a single indexing pipeline would use 4 CPUs per machine?
I never heard before if one indexer has maximum usability of indexing processing, or perhaps also for searching.
I'm not sure there is a reference in the docs to the 4 processor usage for indexing. However, there are four distinct queues and this process for the indexing pipeline. Each of those tends to leverage the better part of a CPU. That is the reason we say that 4 CPUs are used per pipeline.