Getting Data In

With 8.1.2 or later, should we still use an HEC tier on HWFs?

twinspop
Influencer

I read that in 8.1.2 it's less painful to update HEC configs: CRUD operations no longer require a restart. Should I keep HEC on the HWFs, or move it directly to the indexers?

1 Solution

twinspop
Influencer

We've always had a physical load balancer (an F5) handling our HEC load-balancing duties. But behind that we've tried two scenarios.
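(For context, the LB just needs something to health-check. A probe like the sketch below, against HEC's built-in health endpoint, is roughly what an LB monitor can poll; the hostname and port are placeholders, not our real setup.)

    # HEC's health endpoint needs no token and returns HTTP 200 when HEC
    # can accept events -- a natural target for an LB health monitor.
    curl -sk https://hec-node.example.com:8088/services/collector/health
    # Expected on recent versions: {"text":"HEC is healthy","code":17}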

Originally, we went direct to the indexers back in the 6.x and 7.x days. When a user would "overlog," this caused problems on the indexers. The fix was to disable the token in play, but doing that through config files required a restart of the cluster. Likewise, any CRUD on HEC tokens required a cluster restart.
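To make that concrete, here's a sketch of what one of those token stanzas looks like in inputs.conf (the stanza name, GUID, index, and sourcetype are made-up examples):

    [http://app_team_token]
    token = 11111111-2222-3333-4444-555555555555
    index = app_logs
    sourcetype = app:json
    # Pre-8.1.2, flipping this on a clustered indexer meant pushing a new
    # bundle and eating a rolling restart:
    disabled = 1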

To address this major issue, we installed a set of VMs in front of the indexing tier acting as Heavyweight Forwarders (HWFs). All HEC deliveries land there, and the HWFs forward to the indexing tier. This seemed like a win-win: I could manage tokens easily without interrupting the much more vital indexing services, and the indexers were isolated from aggressive loggers.
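The HWF side of that is plain outputs.conf forwarding; a minimal sketch (hostnames are placeholders) looks like:

    # outputs.conf on each HEC/HWF node -- auto load balancing across the indexers
    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    server = idx01.example.com:9997, idx02.example.com:9997, idx03.example.com:9997
    useACK = true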

However, there were drawbacks lurking under the surface. Performance under heavy load was still not great. Due to the way HWFs "bake" the data (parsing it into cooked events), it's actually MORE data being delivered to the indexers. And it bypasses part of the normal queuing process, jumping ahead, which can cause perf issues not seen in the optimized, normal stream processing of data. In higher-volume scenarios we'd see the indexer queues backing up, eventually to the point of impacting the HEC queues as well. You also have the "lensing" effect to account for: the HEC nodes act as concentrators, each delivering to only 1 or 2 indexers at a time (depending on your pipeline count). If many of them happen to land on one indexer, they can Real Genius-style laser-beam it out of existence. Finally, managing another set of .conf files was a not-insignificant layer of additional complexity I really didn't need.
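If you keep an HWF tier anyway, the knobs behind that lensing behavior are the pipeline count and the load-balancing cadence. A sketch of the relevant settings (values are examples, not recommendations for your environment):

    # server.conf on the HWF -- each ingestion pipeline holds its own
    # connection to one indexer, so 2 pipelines = at most 2 indexers
    # receiving from this node at a time
    [general]
    parallelIngestionPipelines = 2

    # outputs.conf -- rotate targets on a timer so the concentrated HEC
    # streams spread across indexers instead of dwelling on one
    [tcpout:primary_indexers]
    autoLBFrequency = 30
    forceTimebasedAutoLB = true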

8.1.2 brought good news: CRUD operations on HEC inputs no longer trigger a rolling restart on the indexers. Since that was one of our primary reasons for the HEC tier, it seemed obvious to question whether we still needed it. After some testing, we removed the HEC tier from one of our bigger environments and had the LB deliver directly to the indexers. I've seen fewer queue spikes, no negative impact on indexer load, and even hints of exactly the opposite. The environment footprint shrank by 20-some instances, and complexity is down. Now *that* is a win-win. 🙂
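That's what makes direct-to-indexer token management tolerable now. As a sketch (token name carried over from the example above; the REST data-inputs endpoint is shown for a standalone instance):

    # Disable a noisy token via REST -- no restart needed on 8.1.2+:
    curl -sk -u admin -X POST \
        https://idx01.example.com:8089/services/data/inputs/http/app_team_token/disable

    # Or, on an indexer cluster: edit inputs.conf under the manager node's
    # manager-apps/_cluster/local and push; a HEC-only change now reloads
    # instead of triggering a rolling restart:
    splunk apply cluster-bundle --answer-yes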

(FWIW, for both the HEC farm and the indexing tier, we have persistent queues enabled on every token.)
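Per token, that's one more line in the stanza; the size below is an example, not a recommendation:

    [http://app_team_token]
    token = 11111111-2222-3333-4444-555555555555
    index = app_logs
    # Spill incoming events to disk instead of dropping them when
    # downstream queues block:
    persistentQueueSize = 10GB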

 

EDIT: typos

