Out of curiosity, could folks give an estimate of the maximum sustained throughput they have observed from a single forwarder when the forwarder's maxKBps limit is uncapped and the forwarder is configured with useACK? If possible, please also include estimated system specs for the source and destination, plus any other observations, such as whether CPU was maxed out on either side or whether there were any network constraints.
In our situation, we're seeing a maximum of roughly 'group=thruput, name=thruput, instantaneous_kbps=5979.775845'. Source and destination are connected on a local 1 Gbps network, with no excessive CPU use and no interface packet errors or loss on either side. I'm trying to get a feel for whether 100 Mbit/s+ speeds on a single forwarder are known to be possible with useACK, in which case I need to begin troubleshooting elsewhere.
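To compare that metrics value against a "100 Mbit/s" target, it helps to put both in the same unit. The following is a minimal sketch, assuming that instantaneous_kbps is reported in kilobytes per second (not kilobits) and that 1 KB is 1024 bytes. The function name and the bytes_per_kb parameter are made up for illustration; adjust the assumptions if your metrics use a different unit.

```python
import re

# Fragment of the metrics line quoted in the question above.
SAMPLE = "group=thruput, name=thruput, instantaneous_kbps=5979.775845"


def kbps_field_to_mbit(line: str,
                       field: str = "instantaneous_kbps",
                       bytes_per_kb: int = 1024) -> float:
    """Extract `field` from a key=value metrics line and return it in Mbit/s.

    ASSUMPTION: the value is kilobytes per second, and one KB is
    `bytes_per_kb` bytes. Neither is confirmed by the thread itself.
    """
    match = re.search(rf"\b{re.escape(field)}=([0-9.]+)", line)
    if match is None:
        raise ValueError(f"field {field!r} not found in line")
    kilobytes_per_sec = float(match.group(1))
    bits_per_sec = kilobytes_per_sec * bytes_per_kb * 8
    return bits_per_sec / 1_000_000  # decimal megabits, as network links are rated


if __name__ == "__main__":
    print(f"{kbps_field_to_mbit(SAMPLE):.1f} Mbit/s")
```

Under those assumptions the observed figure works out to a little under 50 Mbit/s, so roughly half of the 100 Mbit/s target and far below what a 1 Gbps link can carry.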
In my testing with Cisco ASA data, I don't see much of a difference with useACK on or off. I get about 12.5 MB/s on modest hardware with useACK enabled, and maybe 5% better with useACK disabled. That's with a Universal Forwarder and an indexer both running 6.4.0 on separate RHEL7 VMs. Those VMs each have 8 2GHz cores from a Xeon E7-4850, and I'm almost certain that I would need a faster CPU to get more throughput out of them. The indexer is using between 4 and 5 cores in this configuration, which I believe is essentially its maximum with a single indexing pipeline. The forwarder uses a tiny fraction of a core.
You mentioned that the maxKBps limit is uncapped which implies that you're using a Universal Forwarder. That's good; there might be a way to tune a heavy forwarder to match the throughput of a UF, but all I've seen is slower forwarding and higher CPU cost on the indexer and forwarder. I've also observed a big speed penalty when using SSL on a heavy forwarder.
All of that said, if there's no simple optimization available and you're on at least 6.3, maybe enabling multiple pipeline sets (http://docs.splunk.com/Documentation/Splunk/6.4.3/Indexer/Pipelinesets) will get you what you need. The catch is that if you're currently reading a single file, you'll need to find a way to break it up into at least 2 separate files to boost your performance. You'll also end up using more cores, but it sounds like you have them.