Getting Data In

How do I troubleshoot why indexing performance is slow?

cdevoe57
Explorer

Operating System: Oracle Linux 3.8.13-55.1.5, 64 bit
2 CPUs, total of 40 cores, 128 GB memory, 1 Gb network, six 300 GB 15K SAS drives

The only thing running on this system is Splunk Enterprise. Indexing performance is about 250 KB/s, approximately 20 GB per day. According to the capacity planning documentation, this system should easily handle 250 GB per day. The files are JSON files.

Memory usage and CPU usage are WAY low.

What could be slowing down the indexing?

cdevoe57
Explorer

We did an interesting thing. We moved the JSON files to a directory that is directly accessible by the Splunk instance doing the indexing. Rates went to over 20KB/s. This tells me it is an issue with the universal forwarder. We are running these in batch mode, which indexes the files and then deletes them.

emiller42
Motivator

The first thing I would look at is the state of the indexing queues.

index=_internal host=YOUR_INDEXER sourcetype=splunkd component=Metrics group=queue (name=aggqueue OR name=splunktcpin OR name=parsingqueue OR name=typingqueue OR name=indexqueue)
| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size)
| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size)
| eval fill_perc=round((curr/max)*100,2)
| timechart p90(fill_perc) by name

Swap the aggregation (median, max, p90) to see where each queue sits. If any of them are consistently high, you've got a bottleneck in the indexing pipeline. For details of what each pipeline does, check the community wiki. That can help you dig further into root cause.

Splunk also reports if a queue is blocked in those events (blocked=true) so you can just search for that to see if you have any.

index=_internal host=YOUR_INDEXER sourcetype=splunkd component=Metrics blocked=true
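
To see which queues are blocking and when, the blocked search can be broken out by queue name. This is a minimal sketch that reuses the YOUR_INDEXER placeholder; the group=queue filter and the timechart are additions for illustration, not part of the search above.

```spl
index=_internal host=YOUR_INDEXER sourcetype=splunkd component=Metrics group=queue blocked=true
| timechart count by name
```

An empty result over a busy time range suggests the indexer's queues are not the bottleneck.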

Since you're talking about JSON files, I'd wonder how big they are, and whether the bottleneck is actually on the forwarders. Parsing configs can make a BIG difference in indexing performance, especially for structured data. If props.conf on your forwarder has INDEXED_EXTRACTIONS=JSON set, then most of the legwork to index that data is actually happening on the forwarder, not the indexer, meaning the forwarder could be the bottleneck. (If you're forwarding _internal data from your forwarders, you can check their queues using a search similar to the above.)
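
A forwarder-side variant of the queue search might look like the following. It is only a sketch: YOUR_FORWARDER is a hypothetical placeholder for the forwarder's host name, and it assumes the forwarder is sending its _internal logs to the indexer. The queue name filter is dropped so that every queue the forwarder reports shows up in the chart.

```spl
index=_internal host=YOUR_FORWARDER sourcetype=splunkd component=Metrics group=queue
| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size)
| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size)
| eval fill_perc=round((curr/max)*100,2)
| timechart p90(fill_perc) by name
```

A queue that sits near 100 on the forwarder while the indexer's queues stay near 0 would point at the forwarder as the slow stage.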

cdevoe57
Explorer

When I run that query, every value comes back as 0.0. There are no blocked events.

Is there a way to force Splunk to use more cores?

I just believe there is a configuration setting somewhere slowing things down.

emiller42
Motivator

There is a way to use more cores by adding parallel indexing pipelines. But if your queues are empty, that won't make any difference. I typically see indexers saturate 5 cores when fully loaded on indexing (processing about 20 MB/sec). If the problem were with your indexer, you'd be seeing one of those queues as a bottleneck for the rest of the pipeline. I would suspect something going wrong at the input layer. Check queues and look for ERROR/WARN messages on your forwarders. (The queues you care about there are parsingqueue and tcpout_*.)
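
One possible way to surface ERROR/WARN messages from a forwarder is sketched below. YOUR_FORWARDER is a hypothetical placeholder. The search assumes the forwarder's _internal logs reach the indexer, and that the log_level and component fields are extracted for sourcetype=splunkd, which is the usual default.

```spl
index=_internal host=YOUR_FORWARDER sourcetype=splunkd (log_level=ERROR OR log_level=WARN)
| stats count by component, log_level
| sort - count
```

The components with the highest counts are a reasonable place to start reading the raw messages.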
