Operating System: Oracle Linux 3.8.13-55.1.5, 64 bit
2 CPUs, 40 cores total, 128 GB memory, 1 Gb network, six 300 GB 15K SAS drives
The only thing running on this system is Splunk Enterprise. Indexing throughput is about 250 KB/s, approximately 20 GB per day. According to the capacity planning documentation, this system should easily handle 250 GB per day. The files being indexed are JSON.
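As a sanity check on those numbers, here is a minimal Python sketch of the rate arithmetic. It assumes decimal units (1 GB = 1,000,000 KB) and a perfectly steady rate around the clock. The function names are hypothetical helpers, not anything in Splunk.

```python
SECONDS_PER_DAY = 86_400


def kb_per_sec_to_gb_per_day(kb_per_sec: float) -> float:
    """Daily volume produced by a steady KB/s rate (decimal units)."""
    return kb_per_sec * SECONDS_PER_DAY / 1_000_000


def gb_per_day_to_kb_per_sec(gb_per_day: float) -> float:
    """Steady KB/s rate needed to ingest a given daily volume (decimal units)."""
    return gb_per_day * 1_000_000 / SECONDS_PER_DAY


if __name__ == "__main__":
    observed = kb_per_sec_to_gb_per_day(250)  # 21.6 GB/day
    needed = gb_per_day_to_kb_per_sec(250)    # ~2893.5 KB/s
    print(f"250 KB/s sustained = {observed:.1f} GB/day")
    print(f"250 GB/day needs ~{needed / 1000:.1f} MB/s, {needed / 250:.1f}x the observed rate")
```

So 250 KB/s is consistent with the ~20 GB/day reported, and it is less than a tenth of the rate a 250 GB/day load would need.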
We tried something interesting: we moved the JSON files to a directory that the indexing Splunk instance can read directly, and rates went to over 20KB/s. This tells me the issue is with the universal forwarder. We are using batch mode, which indexes the files and then deletes them.
The first thing I would look at is the state of the indexing queues:
index=_internal host=YOUR_INDEXER sourcetype=splunkd component=Metrics group=queue (name=aggqueue OR name=splunktcpin OR name=parsingqueue OR name=typingqueue OR name=indexqueue)
| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size)
| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size)
| eval fill_perc=round((curr/max)*100,2)
| timechart p90(fill_perc) by name
Swap the aggregation (median, max, p90) to see where each queue's fill level usually sits. If any queue is consistently high, you've got a bottleneck in the indexing pipeline at that point. For details of what each pipeline stage does, check the community wiki; that can help you dig further into the root cause.
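To make the search logic concrete, here is a Python sketch of the same calculation over raw metrics.log queue lines. It assumes the comma-separated key=value layout those events use, and it mirrors the two eval fallbacks (the *_kb fields first, then the unsuffixed ones). The 80% "consistently high" threshold and the function names are hypothetical choices for illustration. The p90 here is nearest-rank, which may differ slightly from how Splunk computes percentiles.

```python
import re
import statistics
from collections import defaultdict
from typing import Optional

# Queue events look like "... group=queue, name=..., max_size_kb=..., current_size_kb=..."
KV_RE = re.compile(r"(\w+)=([^,\s]+)")
HIGH_FILL_THRESHOLD = 80.0  # hypothetical cut-off for "consistently high"; tune to taste


def fill_percent(fields: dict[str, str]) -> Optional[float]:
    """Mirror the SPL evals: prefer the *_kb fields, fall back to the unsuffixed ones."""
    max_raw = fields.get("max_size_kb", fields.get("max_size"))
    curr_raw = fields.get("current_size_kb", fields.get("current_size"))
    if max_raw is None or curr_raw is None or float(max_raw) == 0:
        return None  # no usable sizes: yield nothing rather than divide by zero
    return round(float(curr_raw) / float(max_raw) * 100, 2)


def p90(values: list[float]) -> float:
    """Nearest-rank 90th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) with integer arithmetic
    return ordered[rank - 1]


def summarise(lines: list[str]) -> dict[str, dict[str, float]]:
    """Per-queue median / max / p90 fill percentage from raw metrics.log lines."""
    by_queue: dict[str, list[float]] = defaultdict(list)
    for line in lines:
        fields = dict(KV_RE.findall(line))
        if fields.get("group") != "queue":
            continue
        fill = fill_percent(fields)
        if fill is not None:
            by_queue[fields.get("name", "unknown")].append(fill)
    return {
        name: {"median": statistics.median(v), "max": max(v), "p90": p90(v)}
        for name, v in by_queue.items()
    }


def suspected_bottlenecks(summary: dict[str, dict[str, float]]) -> list[str]:
    # A high *median* means the queue is full most of the time, not just during bursts.
    return sorted(n for n, s in summary.items() if s["median"] >= HIGH_FILL_THRESHOLD)


if __name__ == "__main__":
    # Illustrative lines only; real events carry more fields and a timestamp prefix.
    sample = [
        "group=queue, name=parsingqueue, max_size_kb=512, current_size_kb=0",
        "group=queue, name=parsingqueue, max_size_kb=512, current_size_kb=256",
        "group=queue, name=indexqueue, max_size_kb=500, current_size_kb=490",
        "group=queue, name=indexqueue, max_size_kb=500, current_size_kb=495",
    ]
    summary = summarise(sample)
    print(summary)
    print("suspected bottlenecks:", suspected_bottlenecks(summary))
```

Keying on the median rather than the max is a design choice: a queue that spikes briefly under a burst is normal, while one whose typical fill is near 100% is holding up everything behind it.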
Splunk also flags blocked queues in those same events (blocked=true), so you can simply search for that to see whether any of your queues have blocked.
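The blocked=true check boils down to counting flagged queue events per queue name. Here is a minimal Python sketch over raw metrics.log lines, assuming the same comma-separated key=value layout; blocked_counts is a hypothetical helper, and in practice the Splunk search does this for you.

```python
import re
from collections import Counter

# metrics.log queue events use a "key=value, key=value" layout.
KV_RE = re.compile(r"(\w+)=([^,\s]+)")


def blocked_counts(lines: list[str]) -> Counter:
    """Count queue events flagged blocked=true, per queue name."""
    counts: Counter = Counter()
    for line in lines:
        fields = dict(KV_RE.findall(line))
        if fields.get("group") == "queue" and fields.get("blocked") == "true":
            counts[fields.get("name", "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Illustrative lines, not captured from a real indexer.
    sample = [
        "group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499",
        "group=queue, name=indexqueue, max_size_kb=500, current_size_kb=12",
        "group=queue, name=typingqueue, blocked=true, max_size_kb=500, current_size_kb=500",
    ]
    for name, n in blocked_counts(sample).most_common():
        print(f"{name}: blocked in {n} sample(s)")
```

A queue that turns up here again and again is probably a better lead than one that blocked once during a burst.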
Since you're talking about JSON files, I'd wonder how big they are, and whether the bottleneck is actually on the forwarders. Parsing configs can make a BIG difference in indexing performance, especially for structured data. If props.conf on your forwarder sets INDEXED_EXTRACTIONS=JSON, then most of the work of indexing that data happens on the forwarder, not the indexer, so the forwarder could be the bottleneck. (If you're forwarding _internal data from your forwarders, you can check their queues with a search similar to the one above.)
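To check whether a forwarder is doing that structured parsing, you can look for INDEXED_EXTRACTIONS in its props.conf. Below is a deliberately naive Python sketch that scans the text of a single props.conf file. indexed_extraction_stanzas and the sample sourcetype names are hypothetical, and `splunk btool props list --debug` on the forwarder remains the authoritative view of the merged configuration.

```python
def indexed_extraction_stanzas(conf_text: str) -> dict[str, str]:
    """Return {stanza: value} for each stanza that sets INDEXED_EXTRACTIONS.

    Naive on purpose: ignores line continuations and the layered merge of
    default/local directories that Splunk applies at runtime.
    """
    found: dict[str, str] = {}
    stanza = "default"  # settings before any [stanza] header belong to [default]
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            stanza = line[1:-1].strip()
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "INDEXED_EXTRACTIONS":
            found[stanza] = value.strip()
    return found


if __name__ == "__main__":
    # Hypothetical forwarder props.conf; both sourcetype names are made up.
    props = """
[my_json_feed]
INDEXED_EXTRACTIONS = json

[plain_text_feed]
SHOULD_LINEMERGE = false
"""
    for stanza, value in indexed_extraction_stanzas(props).items():
        print(f"[{stanza}] INDEXED_EXTRACTIONS = {value}: parsed on the forwarder")
```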
There is a way to use more cores by adding parallel indexing pipelines (the parallelIngestionPipelines setting in server.conf), but if your queues are empty, that won't make any difference. I typically see an indexer saturate 5 cores when fully loaded on indexing (processing about 20MB/sec). If the problem were with your indexer, you'd see one of those queues backing up and bottlenecking the rest of the pipeline. I suspect something is going wrong at the input layer. Check the queues and look for ERROR/WARN messages on your forwarders. (The queues you care about there are parsingqueue and tcpout_*.)
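The core-count point can be put in rough numbers. Here is a Python sketch assuming the ~20MB/sec figure above describes one fully loaded indexer, with decimal units throughout; indexer_utilisation is a hypothetical helper, not a Splunk tool.

```python
SECONDS_PER_DAY = 86_400
# Rough throughput of one fully loaded indexer, from the experience quoted above;
# an assumption for illustration, not a published Splunk figure.
LOADED_INDEXER_MB_PER_SEC = 20.0


def indexer_utilisation(gb_per_day: float,
                        loaded_mb_per_sec: float = LOADED_INDEXER_MB_PER_SEC) -> float:
    """Fraction of a fully loaded indexer that a steady daily volume would use."""
    mb_per_sec = gb_per_day * 1000 / SECONDS_PER_DAY
    return mb_per_sec / loaded_mb_per_sec


if __name__ == "__main__":
    for label, volume in [("observed (~21.6 GB/day)", 21.6), ("capacity target", 250.0)]:
        print(f"{label}: {indexer_utilisation(volume):.1%} of a loaded indexer")
```

Even the 250 GB/day target comes out at roughly 14% of one loaded indexer under these assumptions, which fits the advice that extra pipelines only pay off once a queue is actually backing up.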