So we have an internal load balancer that distributes HEC requests between 2 heavy forwarders. HEC is working fine and all but a small fraction of the requests are not making it to the heavy forwarders. The sender of the events get the 503 error below:
upstream connect error or disconnect/reset before headers. reset reason: connection termination
while the internal load balancer get this error:
backend_connection_closed_before_data_sent_to_client
What really baffles me is that I couldn't find any error logs in Splunk that might be connected to this issue. There's also no indication that our heavy forwarders are hitting their queue limits. I even tried increasing the max queue size of certain queues including that of the HEC input in question but even that didn't help at all. Is there any other stuff that I can check to help me pin point the cause of this problem?