Solved: Does HEC provide guaranteed delivery, and if not, ...

hrottenberg_spl · ‎10-09-2019

I want to ensure that the messages sent to HEC make it to Splunk. What are my options?

hrottenberg_spl · ‎10-09-2019

HEC is, at the end of the day, a web server. For the most part, it acts as a simple listener for HTTP(S) POST messages, and as such it does not guarantee that a sent message was received. The sender is expected to use HTTP protocol conventions, such as the status code, to determine whether a message was received and, if not, to do whatever the sender wants to do about it. All of this is by design, and it works exactly as a web server is expected to work.
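
To illustrate the sender's side of that contract, here is a minimal Python sketch that posts one event and retries until HEC answers with HTTP 200. The host, token, retry counts, and the helper names `post_event` and `send_with_retry` are placeholders made up for this example, not part of any Splunk library. The `/services/collector/event` path and the `Authorization: Splunk <token>` header are the usual HEC conventions.

```python
import json
import time
import urllib.error
import urllib.request

HEC_URL = "https://splunk.example.com:8088/services/collector/event"  # placeholder host
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder token


def post_event(event, url=HEC_URL, token=HEC_TOKEN, timeout=5):
    """POST one event to HEC and return the HTTP status code (0 if no HTTP answer)."""
    body = json.dumps({"event": event}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": "Splunk " + token},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # HEC answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, ...


def send_with_retry(event, post=post_event, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Return True once HEC answers 200; otherwise retry with exponential backoff."""
    for attempt in range(max_attempts):
        if post(event) == 200:
            return True
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return False  # the caller decides what happens next: drop, spool to disk, alert
```

This sketch retries on every non-200 answer for simplicity. A real client would probably stop early on responses that will never succeed on retry, such as an invalid token.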
One exception to the above is the HEC Indexer Acknowledgement feature, which lets the client confirm that the events it sent were actually indexed, but requires the developer to implement client-side code. More details are here: https://docs.splunk.com/Documentation/Splunk/7.3.2/Data/AboutHECIDXAck
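
As a rough idea of what that client-side code involves: the client sends events on a channel (a GUID passed in the `X-Splunk-Request-Channel` header), keeps the `ackId` that HEC returns for each request, and later asks `/services/collector/ack` which of those IDs have been indexed. The sketch below assumes a hypothetical `post_json(url, payload, headers)` helper, supplied by the caller, that adds the token header, POSTs JSON, and returns the decoded JSON reply. The `AckTracker` class is likewise an illustration, not Splunk code.

```python
import uuid


class AckTracker:
    """Tracks events sent on one HEC channel until the indexer acknowledges them."""

    def __init__(self, post_json, base_url):
        # post_json(url, payload, headers) -> dict is a hypothetical transport
        # helper; it is expected to handle authentication and JSON decoding.
        self.post_json = post_json
        self.base_url = base_url.rstrip("/")
        self.channel = str(uuid.uuid4())  # one GUID per client channel
        self.pending = {}  # ackId -> event, kept until acknowledged

    def _headers(self):
        return {"X-Splunk-Request-Channel": self.channel}

    def send(self, event):
        """Send one event and remember it under the ackId that HEC returns."""
        reply = self.post_json(
            self.base_url + "/services/collector/event",
            {"event": event},
            self._headers(),
        )
        ack_id = reply["ackId"]
        self.pending[ack_id] = event
        return ack_id

    def poll(self):
        """Ask HEC which pending ackIds are indexed; return the events now confirmed."""
        if not self.pending:
            return []
        reply = self.post_json(
            self.base_url + "/services/collector/ack",
            {"acks": sorted(self.pending)},
            self._headers(),
        )
        confirmed = []
        for ack_id, done in reply["acks"].items():  # JSON object keys arrive as strings
            if done and int(ack_id) in self.pending:
                confirmed.append(self.pending.pop(int(ack_id)))
        return confirmed
```

Anything still left in `pending` after a timeout of your choosing has not been confirmed and is a candidate for resending.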

If “guaranteed messaging” is a requirement, I suggest looking at one of these options:

(In AWS) Kinesis Data Firehose, which has a reliable pipeline to Splunk Enterprise or Splunk Cloud (behind the scenes, the nozzle uses HEC with the indexer acknowledgement feature enabled).
(Not AWS) Use any queueing system that provides a guaranteed delivery feature. You would then need another process to dequeue those messages and send them to Splunk.
Use a logging library (e.g. log4j) that supports multiple destinations, and add a disk buffer as a fallback.
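
The disk-buffer fallback in the last option can be sketched roughly as follows. This is plain Python rather than a log4j configuration, and `DiskSpool`, `deliver`, and the JSON-lines file format are choices made for this example only. The `send` argument stands for whatever function delivers one event to Splunk and returns True on success.

```python
import json
import os


class DiskSpool:
    """Append-only JSON-lines file holding events that could not be delivered."""

    def __init__(self, path):
        self.path = path

    def append(self, event):
        with open(self.path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(event) + "\n")
            handle.flush()
            os.fsync(handle.fileno())  # push to disk before returning, at some cost in speed

    def drain(self, send):
        """Replay spooled events through send(event) -> bool; keep those that still fail."""
        if not os.path.exists(self.path):
            return 0
        with open(self.path, "r", encoding="utf-8") as handle:
            events = [json.loads(line) for line in handle if line.strip()]
        remaining = [event for event in events if not send(event)]
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as handle:
            for event in remaining:
                handle.write(json.dumps(event) + "\n")
        os.replace(tmp_path, self.path)  # swap in the rewritten spool in one step
        return len(events) - len(remaining)


def deliver(event, send, spool):
    """Try the primary destination first; fall back to the disk spool on failure."""
    if send(event):
        return True
    spool.append(event)
    return False
```

Note that this gives at-least-once behaviour at best: if the process dies between a successful send and the spool rewrite, the event is sent again on the next drain. It also assumes a single writer, so appending and draining from different processes at the same time would need locking.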

To address momentary connection issues, Splunk has added logic to a handful of client libraries and other tools to handle things like batching, buffering, and retrying. Those resources are published at http://dev.splunk.com/view/SP-CAAAFD2, and are available as open source. Be aware that these clients typically use a small (configurable) memory buffer, which can fill quickly depending on the size of the buffer and the length of the outage.
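
To get a feel for how quickly such a buffer fills, divide its capacity by your event rate. The sketch below does that arithmetic and shows one possible overflow policy, dropping the oldest event. Both the numbers and the drop-oldest policy are assumptions for illustration; check each client's documentation for what it actually does when its buffer is full.

```python
from collections import deque


def seconds_until_full(buffer_capacity, events_per_second):
    """How long an outage a buffer of this size can absorb before it overflows."""
    return buffer_capacity / events_per_second


class BoundedBuffer:
    """Fixed-size in-memory buffer that drops the oldest event when full."""

    def __init__(self, capacity):
        self.events = deque(maxlen=capacity)
        self.dropped = 0

    def add(self, event):
        if len(self.events) == self.events.maxlen:
            self.dropped += 1  # deque(maxlen=...) silently discards the oldest item
        self.events.append(event)
```

For example, `seconds_until_full(10000, 500)` gives 20.0: a 10,000-event buffer fed at 500 events per second covers only a 20-second outage before events start being lost.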

The available logging libraries are:

Java
JavaScript
.NET

We also provide full open source SDKs which include logic for many tasks outside of logging, as well as code samples for these languages:

C#
Java
JavaScript
Python

Lastly, the Docker Logging Driver also includes batch, buffer, and retry logic. By default, it will buffer 10,000 lines, as documented here: https://github.com/splunk/docker-logging-plugin#advanced-options---environment-variables
