Getting Data In

Heavy forwarder - Could not send data to output queue (parsingQueue)

nareshinsvu
Builder

Below is my use-case (Heavy Forwarders -> Indexers). Need expert assessment.

1) I have very large log files.
2) So I have used heavy forwarders to cut down the data at the source using REGEX transforms.
3) During a peak load, Splunk stopped ingesting any data.
splunkd.log on forwarders - Could not send data to output queue (parsingQueue)
metrics.log on forwarders - Metrics - group=queue, name=aggqueue, blocked=true, max_size_kb=1024, current_size_kb=1023, current_size=2324, largest_size=2336, smallest_size=2169
4) Restarting Splunk and deleting metrics.log* didn't help.

Are there any specific config changes that can fix this without changing my architecture? Otherwise, I will have to move to Universal Forwarder -> Heavy Forwarders -> Indexers. Are there any issues you foresee with that architecture?

Might changing the interval in inputs.conf help? Currently it is set to 10. I suspect the log is filling very fast and Splunk can't keep up. Should I increase it to, say, 50 and see?
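For reference, this is the kind of inputs.conf change I mean (the script path below is just a placeholder for my actual input):

[script://./bin/collect_logs.sh]
interval = 50

Note that interval only applies to scripted/modular inputs; plain [monitor://] stanzas tail files continuously and have no polling interval.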

1 Solution

nareshinsvu
Builder

My logfiles are complex, with mixed data (JSON without timestamps and non-JSON). Splunk was parsing the timestamp for each JSON line and spending extra time there.

Performance got better only after removing the custom time format and using DATETIME_CONFIG = CURRENT, and separating the JSON data using INDEXED_EXTRACTIONS.
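For anyone else hitting this, here is a sketch of the props.conf settings that worked for me (the sourcetype name is just a placeholder):

[my_json_sourcetype]
INDEXED_EXTRACTIONS = json
DATETIME_CONFIG = CURRENT

INDEXED_EXTRACTIONS has to be set on the first full Splunk instance that parses the data (the HF in my setup), and DATETIME_CONFIG = CURRENT stamps events with the current index time instead of trying to parse a timestamp out of every line.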

Thanks for your effort in this.



harsmarvania57
SplunkTrust

Hi,

Can you please let us know how many queues are blocked? If the indexing queue blocks, then due to back pressure the typing, aggregation, and parsing queues will also block. Is the queue blocking happening on the Heavy Forwarders only, or on the Indexers as well?


nareshinsvu
Builder

Hi,

It's blocking only on the HF. How can I find the number of queues getting blocked? I can see these continuous blocked messages in metrics.log.


harsmarvania57
SplunkTrust

When you check the blocked messages in metrics.log on the HF, the same event shows the name of the queue that has been blocked. If more than one queue was blocked during the same period, you'll see multiple events with blocked=true, each with its own queue name.

Alternatively, you can check the Monitoring Console if you have the MC present and configured in your environment.
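If you don't have the MC handy, a quick search over the HF's internal logs can show which queues are blocking (run wherever the HF's _internal data lands):

index=_internal source=*metrics.log* group=queue blocked=true
| stats count by host, name

Each name value in the result is a blocked queue (e.g. parsingqueue, aggqueue, typingqueue, indexqueue).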


nareshinsvu
Builder

Only aggqueue or parsingqueue at any given instant.

Strangely, this is happening only in the clustered environment (where I modified parsingQueue to 30MB and aggQueue to 10MB).

When I load the same large file onto a single-node Splunk instance (with the default parsingQueue of 6MB and aggQueue of 1MB), it is faster.

Really blowing my mind, and I'm unsure how to fix this in my clustered setup. Any environment variables to tweak?
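For context, the queue sizes I mentioned are what I set in server.conf on the HF:

[queue=parsingQueue]
maxSize = 30MB

[queue=aggQueue]
maxSize = 10MB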


harsmarvania57
SplunkTrust

How large are your files? Do you know the average event size in those files?

You mentioned that only the parsing and aggregation queues are blocked and NOT any others. Based on https://wiki.splunk.com/Community:HowIndexingWorks, those queues handle line breaking, line merging, and timestamp extraction and assignment. In your question you mentioned that you are using REGEX to cut down data; that work happens in the typing queue. Can you please double-check whether the typing queue is blocked in your clustered environment? If your REGEX is not well written and requires many steps, it will really impact Splunk performance in large environments or with big data sources.
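As an aside, the typical shape of that kind of REGEX filtering on a HF looks like this (the stanza and pattern names here are made up for illustration):

props.conf:

[my_sourcetype]
TRANSFORMS-filter = drop_debug_lines

transforms.conf:

[drop_debug_lines]
REGEX = ^DEBUG
DEST_KEY = queue
FORMAT = nullQueue

The REGEX runs against every event, so an expensive pattern (heavy backtracking, many alternations) multiplies across your whole data volume.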

To learn more about regex performance, you can enable regex profiling on the Heavy Forwarder:

limits.conf

regex_cpu_profiling = <boolean>
* Enable CPU time metrics for RegexProcessor. Output will be in the
  metrics.log file.
* Entries in metrics.log will appear as per_host_regex_cpu, per_source_regex_cpu,
  per_sourcetype_regex_cpu, per_index_regex_cpu.
* Default: false
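For example, on the HF this would look like the following (I believe this setting belongs in the [default] stanza of limits.conf):

[default]
regex_cpu_profiling = true

Then restart splunkd and watch metrics.log for the per_*_regex_cpu entries to see which host, source, sourcetype, or index is burning CPU in regex processing.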

nareshinsvu
Builder

I'm not able to catch the root cause. One page suggests increasing parsingQueue to 10MB, but that will impact memory. Has anyone tried that solution?

Currently Splunk is falling way behind and not capturing my log content, with the same messages in the forwarder's splunkd.log.


nareshinsvu
Builder

I increased the parsingQueue size and I am getting the data now. BUT the HF's CPU spiked to 30%. Could someone confirm the best solution for my use case?

[queue=parsingQueue]
maxSize = 10MB