Getting Data In

Why does the indexing rate oscillate between a few KB/s and a few MB/s?

Explorer

I deployed Splunk Enterprise 7.2.3 and gave it 1 TB of data for indexing. The data is available locally. Initially, when the queues (parsing, merging, typing, and indexing) are empty, I get an indexing rate of ~2 MB/s. As the queues fill up, the rate drops to a few KB/s, and from then on it keeps rising and falling. Sometimes indexing stops entirely when the parsing and merging queues are full. [Queue snapshot]

  1. Is this behavior expected?
  2. How do we achieve a consistent indexing rate?
  3. Is increasing the parsing queue size a solution? It seems that would also fill up soon.

Note: a 20 GB license is installed.


Ultra Champion

No, this is not expected.

If your queues are filling up, you have serious problems.
I suspect you have issues ingesting the data, probably due to event breaking and timestamp extraction.

Have you looked in your internal logs to see if anything is reported?

index=_internal sourcetype=splunkd source=*splunkd.log (log_level=WARN OR log_level=ERROR)
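To see which components and sources produce the most warnings, a tally like the following can help (a sketch; the grouping fields are a suggestion, not the only option):

```
index=_internal sourcetype=splunkd source=*splunkd.log (log_level=WARN OR log_level=ERROR)
| stats count BY component, source
| sort - count
```

The top rows usually point straight at the misbehaving input.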


Explorer

Oh, okay. But won't the queues fill up anyway if the incoming rate exceeds the indexing rate?

I am sharing screenshots of the search query and the health error. Let me know if you need anything else.

Error: https://ibb.co/ZmNtRX7
Search query: https://ibb.co/wW6dZB5


Ultra Champion

In an ideal world all the queues should be at 0% - if any of them sit above a few percent, it indicates a problem.

Because it's the early queues that are full, it suggests the bottleneck is not the actual indexing/writing to disk.


Explorer

Okay. But in what scenarios would the initial queues be full while the indexing queue is empty?


Ultra Champion

For the reasons I mentioned above - probably timestamp extraction or line breaking.

Have you checked in your internal logs?


Explorer

I think you are right. I checked splunkd.log.
It is scattered with messages like "WARN AggregatorMiningProcessor - Breaking event because limit of 256 has been exceeded". Found a possible reason here: https://answers.splunk.com/answers/141721/error-in-splunkd-log-breaking-event-because-limit-of-256-h....

Correct me if I am wrong here. I think the data itself is causing the issue. The parser is spending so much time breaking events that it cannot pass data on to the indexing queue, which also leads to the parsing queue filling up.

As suggested in one of the comments in the link above, will setting MAX_EVENTS = 10000 resolve the issue?


Ultra Champion

That is exactly the cause, but raising the limit will make your problem even worse!
Don't do this!

You need to fix the breaking issue by applying the correct settings in props.conf for your log format.
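As a sketch of what "correct settings" means here: a props.conf stanza that breaks events on the timestamp at the start of each line and tells Splunk exactly where and how to read that timestamp. The sourcetype name, the date pattern, and the TIME_FORMAT below are assumptions - they must be adapted to your actual log format:

```ini
# props.conf - illustrative stanza; [my_custom_log] and the regexes
# are placeholders that must match your real sourcetype and events
[my_custom_log]
# Break on newlines followed by a leading date, instead of merging lines
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
# Timestamp sits at the start of the event; parse it explicitly
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
```

With SHOULD_LINEMERGE = false and an explicit LINE_BREAKER, the aggregator never has to guess event boundaries, so the 256-line merge limit stops being hit and the parsing pipeline speeds up considerably.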


Explorer

Also, from the logs it seems those warnings crop up only for .gz files. My guess is that Splunk is not decompressing them and is treating the compressed content as one single event. But shouldn't Splunk take care of decompressing the data before indexing it? Do I need to specify this in the data input as well? The input data is a mixture of compressed and plain-text log files.
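For what it's worth, Splunk normally decompresses monitored .gz files before parsing, so the warnings more likely mean the events inside the archives aren't matching your line-breaking rules (for example, because they were assigned a different sourcetype). If the compressed logs need the same settings as the plain-text ones, a source-based props.conf override can force the sourcetype; the path pattern and sourcetype name here are hypothetical:

```ini
# props.conf - hypothetical source-based override; adjust the path
# pattern and sourcetype to your environment
[source::/path/to/logs/*.gz]
sourcetype = my_custom_log
```

You can confirm which sourcetype the .gz events actually received with a quick search over the affected index.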
