I assume you are already using a search head cluster (SHC), which should help with high availability (HA). But in terms of search head (SH) saturation, there's a lot to say. A peer of mine pointed out:
Skipped searches or hitting concurrency limits are proxies for search tier saturation, but they are often weak indications of it at the deployment level. Just like with any other software, the first thing you need to look at is system metric utilization: if any of the SH system metrics, especially CPU and RAM, are consistently exhausted, you need to consider adding a new search head. The other half of the equation is inspecting the indexers' workload. If they too are consistently pegged, add indexers as well.
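To make "consistently exhausted" a bit more concrete, here is a minimal sketch of that decision rule. It assumes you have already collected utilization samples (as fractions from 0 to 1) for the SH CPU, SH memory, and indexer CPU; the 90% threshold and the "80% of samples" definition of "consistently" are hypothetical values chosen for illustration, not Splunk guidance.

```python
from typing import List, Sequence


def is_consistently_exhausted(
    samples: Sequence[float],
    threshold: float = 0.9,           # hypothetical "exhausted" level
    sustained_fraction: float = 0.8,  # hypothetical meaning of "consistently"
) -> bool:
    """True if at least `sustained_fraction` of the samples are >= `threshold`."""
    if not samples:
        return False
    hot = sum(1 for s in samples if s >= threshold)
    return hot / len(samples) >= sustained_fraction


def recommend(
    sh_cpu: Sequence[float],
    sh_mem: Sequence[float],
    idx_cpu: Sequence[float],
    threshold: float = 0.9,
    sustained_fraction: float = 0.8,
) -> List[str]:
    """Apply the rule of thumb: any exhausted SH metric -> more SH capacity;
    pegged indexers -> more indexers (the two checks are independent)."""
    actions = []
    if (is_consistently_exhausted(sh_cpu, threshold, sustained_fraction)
            or is_consistently_exhausted(sh_mem, threshold, sustained_fraction)):
        actions.append("consider adding a search head")
    if is_consistently_exhausted(idx_cpu, threshold, sustained_fraction):
        actions.append("consider adding indexers")
    return actions


if __name__ == "__main__":
    # SH CPU pegged, SH memory and indexers fine.
    print(recommend([0.95, 0.97, 0.92, 0.99, 0.96], [0.5] * 5, [0.4] * 5))
```

The two checks are deliberately independent, mirroring the point that the search tier and the indexing tier are two halves of the same equation.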
Furthermore, we should remember that although the new scheduling feature in Splunk should help mitigate skipped searches, continued saturation can often be addressed simply by reviewing usage, moving schedule times, eliminating search jobs that are no longer needed, improving the performance of knowledge objects, and so on, all in addition to what my peer highlighted.
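As one illustration of "moving schedule times": if many hourly searches all fire at minute 0, spreading them across the hour smooths the load. The sketch below assumes a list of hypothetical search names and generates staggered cron expressions of the kind you would put in a saved search's `cron_schedule` setting; it is only a helper for choosing offsets, not a Splunk tool.

```python
from typing import Dict, List


def stagger_hourly(search_names: List[str], spread_minutes: int = 60) -> Dict[str, str]:
    """Give each hourly search a distinct minute offset within the first
    `spread_minutes` minutes of the hour, returned as a cron expression."""
    if not 1 <= spread_minutes <= 60:
        raise ValueError("spread_minutes must be between 1 and 60")
    n = len(search_names)
    schedule = {}
    for i, name in enumerate(search_names):
        # Evenly spaced offsets; (n - 1) * spread // n < spread, so minute <= 59.
        minute = (i * spread_minutes) // n
        schedule[name] = f"{minute} * * * *"
    return schedule


if __name__ == "__main__":
    # Hypothetical search names that were all scheduled at "0 * * * *".
    for name, cron in stagger_hourly(["errors_by_host", "login_failures",
                                      "license_usage", "disk_alerts"]).items():
        print(name, cron)
```

Even spacing is the simplest choice; in practice you might weight the offsets by how expensive each search is.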
There is a very good discussion at Should I increase search head specs, add a new search head, or migrate to search head clustering for...
From what rich7177 said there:
-- As it is, Splunk's "recommended" specs call for, even as virtual machines, 2x 6-core processors and 12 GB of RAM. I think those are fine minimum specs. You can often get by on less in a very small environment, but until you hit at least that level (probably double or triple that much RAM), I wouldn't even think of adding additional SHs for load. For isolation? Maybe. For redundancy? Maybe. For load? No.
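That rule of thumb can be written as a simple gate. This is a minimal sketch, assuming the figures quoted above (2x 6 cores, 12 GB RAM) and reading "probably double or triple that much RAM" as a hypothetical `ram_multiplier` defaulting to 2; Splunk's current reference hardware may differ from what was quoted in that discussion.

```python
RECOMMENDED_CORES = 2 * 6   # "2x 6-core processors", as quoted above
RECOMMENDED_RAM_GB = 12     # "12 GB of RAM", as quoted above


def ready_to_scale_out_for_load(total_cores: int, ram_gb: float,
                                ram_multiplier: float = 2.0) -> bool:
    """True only once the existing SH is at least at the quoted minimum cores
    and at `ram_multiplier` times the quoted RAM. Below that, scale the
    existing SH up before adding SHs for load."""
    return (total_cores >= RECOMMENDED_CORES
            and ram_gb >= RECOMMENDED_RAM_GB * ram_multiplier)


if __name__ == "__main__":
    print(ready_to_scale_out_for_load(12, 12))  # at minimum spec, RAM not yet doubled
    print(ready_to_scale_out_for_load(12, 24))  # cores at minimum, RAM doubled
```

Note that this gate only covers adding SHs for load; isolation and redundancy are separate reasons that it does not model.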
-- If/when you deploy Enterprise Security (ES), you will add a separate SH dedicated solely to ES. Splunk Professional Services will highly, HIGHLY recommend that and may even require it. ES is very snobby and likes to be isolated and put on its own little island where it won't have conflicts with other things. And it doesn't play well with clustering/pooling on the ES SH side of things. (Indexers, sure; SHs, no.)
*I know of people running 96-core, 512 GB RAM SHs.*
I personally like to monitor the platform's OS very closely and understand the bottlenecks well before making any scalability decision.