We have a fairly secure environment in which no servers can access the internet or route traffic to Splunk Cloud. The large majority of the data we will be indexing is OS (*nix, Windows, etc.) and application logs. In addition, we'll have some HFs running DB Connect, some API inputs, some HEC, and some syslog.
To get through the firewall and out of the restricted zones, we will need intermediate forwarders (or, in security terms, gateway forwarders). I am trying to size and scale them accordingly, but I cannot find anything that gives rule-of-thumb sizing for throughput.
Assuming a Universal Forwarder acting as a gateway forwarder, with no function other than receiving data from the internal UFs and HFs:
1. Would 12 cpu/12gb RAM and 800 IOPS be overkill?
2. Are there diminishing returns as resources are increased (e.g. a 4cpu 4gb UF can push 100GB/day but a 12/12 can only push 200GB/day)?
3. Are there limits to throughput (e.g. a UF can only do 4Mbps)?
4. I am assuming that horizontal scale is better than monolithic, but how do I know how many forwarders I need and at what spec (assuming 1 TB/day of Splunk Cloud indexing)?
You don't really need any IOPS unless you're using persistent queues, since otherwise the disk is only used for the forwarder's own logs. You'd be better off with several systems than with one (as you said, horizontal vs. monolithic).
Adding more cores or RAM in this situation won't do much for performance. The system won't be doing anything but taking data in and sending it back out, which doesn't require much CPU.
I've seen a single UF push mid-100's of GB/day on a small system (it was either a 2 or 4 core VM). You don't need a lot of horsepower for these systems.
Make sure you disable the throughput limit on the UF. By default it is only 256 KBps (maxKBps in the [thruput] stanza of limits.conf). Set it to 0 to push data as fast as the system can go.
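As a minimal sketch of that change, assuming the setting is deployed to each intermediate forwarder in a local limits.conf (the file location is an assumption; adjust it to however you manage apps). For scale, 256 KB/s sustained works out to roughly 22 GB/day per forwarder, far below a 1 TB/day target.

```ini
# limits.conf on the intermediate forwarder
# (e.g. $SPLUNK_HOME/etc/system/local/ or a deployed app; location is an assumption)
[thruput]
# 0 removes the forwarder's throughput cap (the UF default is 256 KB/s)
maxKBps = 0
```

A restart of the forwarder is needed for the new value to take effect.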
One thing to look at is giving the various queues more memory, since the default is only about 500 KB. Memory on the system will not be used for much else, so making the queues 10 MB or more won't strain system resources. Check your metrics.log for "blocked=true" and also look at "group=tcpout_connections" to see whether there are issues. I'm not sure whether you need to increase the number of file descriptors (max_fd in limits.conf), but that might be something to keep track of.
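A hedged sketch of what larger queues might look like. The 10MB value is the figure suggested above, not a tuned recommendation, and raising both the pipeline queues and the output queue is an assumption: which queue actually reports "blocked=true" in metrics.log should decide what you raise.

```ini
# server.conf on the intermediate forwarder
[queue]
# Default size for the in-memory pipeline queues
maxSize = 10MB

# outputs.conf on the intermediate forwarder
[tcpout]
# In-memory output queue in front of the connection to the indexers
maxQueueSize = 10MB
```

Individual queues can also be sized separately with a stanza such as [queue=parsingQueue] if only one of them shows blocking.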
You can also add a second pipeline (parallelIngestionPipelines = 2 in server.conf).
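For reference, a minimal sketch of that setting. Each pipeline set gets its own queues and its own output connection, so memory use and connections to the indexers scale with the value; 2 is simply the number mentioned above.

```ini
# server.conf on the intermediate forwarder
[general]
# Run two independent ingestion pipeline sets instead of one
parallelIngestionPipelines = 2
```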
You will want more than one system since you're pushing to an unknown number of indexers on the other end. If you have just one forwarder, you'll have your data all being sent over a single connection and to only one indexer at a time. If you have, say, 10 intermediate forwarders, and say 5,000 internal hosts, then you'll have (on average) 500 hosts sending data to each forwarder. My rule of thumb has been at least one forwarder per indexer.
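To spread the internal hosts across several intermediate forwarders, each internal UF or HF can list all of them in one output group and let auto load balancing rotate between them. This is only a sketch: the group name, host names, and port below are hypothetical placeholders, not values from this thread.

```ini
# outputs.conf on each internal UF/HF
[tcpout]
defaultGroup = intermediate_forwarders

[tcpout:intermediate_forwarders]
# Hypothetical host names and receiving port; list every intermediate forwarder
server = if01.example.internal:9997, if02.example.internal:9997, if03.example.internal:9997
# Seconds between switching to another forwarder in the list (30 is the default)
autoLBFrequency = 30
```

With 10 forwarders in the list and 5,000 hosts, this gives the roughly 500 hosts per forwarder described above.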
Note that with this configuration you are also limiting how evenly data is distributed across your indexers. This may not be critical to you, but be aware that it can cause issues with things like Enterprise Security.