Getting Data In

What are the advantages of using the Splunk HEC JSON endpoint versus the HEC raw endpoint?

Graham_Hanningt
Builder

More specifically: when the incoming events are already in JSON format, just not in the HEC-specific JSON structure?

In my case, each event is represented by a JSON object with a "flat" structure (no nesting): just a collection of sibling key/value pairs. This "generic" JSON can be ingested by numerous analytics platforms with minimal configuration.

I've configured a sourcetype in props.conf and transforms.conf to ingest events in this JSON structure, including timestamp recognition and per-event sourcetype mapping (that is, dynamically mapping each event to a more specific sourcetype based on two values in the event).
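As a rough sketch of the per-event sourcetype mapping described above (stanza names and the two driving fields, here assumed to be `system` and `type`, are hypothetical):

```ini
# props.conf -- hypothetical generic sourcetype
[my:generic:json]
TRANSFORMS-map_sourcetype = map_sourcetype

# transforms.conf -- rewrite the sourcetype based on two values in the event
[map_sourcetype]
REGEX = "system":"(\w+)".*"type":"(\w+)"
FORMAT = sourcetype::my:$1:$2
DEST_KEY = MetaData:Sourcetype
```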

I use that sourcetype configuration for the following Splunk inputs:

  • TCP
  • HEC raw endpoint (services/collector/raw)

I could modify this JSON to meet the HEC-specific structure required by the HEC JSON endpoint (services/collector). I understand the HEC-specific structure and the changes that I need to make.

However, before I do that, I thought I'd ask: what are the advantages of using the HEC JSON endpoint versus the HEC raw endpoint?

I anticipate that answers will make the point that Splunk ingestion is more streamlined, because you don't need to configure, for example:

  • Timestamp recognition: you specify time as a metadata key
  • Per-event sourcetype mapping: you can specify sourcetype as a metadata key
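For illustration, wrapping a flat event in the HEC JSON envelope might look like the following minimal Python sketch (the event fields, sourcetype name, and timestamp are hypothetical):

```python
import json

# Hypothetical flat event: sibling key/value pairs, no nesting
flat_event = {"system": "PROD1", "type": "CPU", "usage": 87.5}

# HEC JSON envelope: metadata keys sit beside the "event" payload,
# so Splunk needs no timestamp recognition or sourcetype transforms
hec_payload = {
    "time": 1634126413.999,        # event time in UNIX epoch seconds
    "sourcetype": "my:prod1:cpu",  # hypothetical per-event sourcetype
    "event": flat_event,
}
print(json.dumps(hec_payload))
```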

However, from my perspective, this is simply shifting compute costs upstream. That is, I would have to perform additional upstream processing to modify the existing "generic" JSON.

Given this context, what do I gain by using the HEC JSON endpoint?

I understand that HEC indexer acknowledgment is available via both endpoints. Am I missing something?


Graham_Hanningt
Builder

In brief

The answer seems to be "None".

In detail

If, given the context I describe in my question, there were advantages to using the Splunk HEC JSON endpoint over the HEC raw endpoint, I'd have expected experienced Splunk users, or Splunk's developers, to reply.

While I acknowledge that it would be wrong (unsafe) to assume that the absence of such replies reflects the absence of advantages, it does inform my thoughts about whether to invest effort in supporting the HEC JSON endpoint.

If someone asks me about supporting that endpoint, I'll ask them the question I've asked here ("What are the advantages... ?") and point them to this page.


PickleRick
SplunkTrust

Have you considered that maybe people simply missed your question? Or didn't find the time to answer it at this point? People spend their own spare time here, so it's their choice which questions they answer and when. Drawing any conclusion from the single fact of the existence or not of an answer is a bit... unwise.

Anyway. While the /raw endpoint doesn't really differ from pushing the events via a tcp:// input, the /event endpoint lets you specify metadata field values explicitly. It means that you can skip time parsing (no, that's not just moving processing upstream; it's also making sure that you don't have to fool around with badly formatted timestamps). You can also set specific source/sourcetype/index values explicitly, and you can add more indexed fields to the data.


Graham_Hanningt
Builder

Hi @PickleRick,

Thanks very much for your reply.

You wrote:

Have you considered that maybe people simply missed your question?

Yes.

After waiting a few days for a reply, I thought I'd keep this question alive by adding my own, deliberately attempting to encourage (goad? 😉) others.

People spend their own spare time here so it's their choice which questions they answer and when.

Yes, understood.

Drawing any conclusion from the single fact of the existence or not of an answer is a bit... unwise.

Yes, I agree. As I wrote: "wrong, unsafe".

the /event endpoint lets you specify metadata field values explicitly.

Thank you!

I'd completely overlooked the HEC services/collector/event endpoint; I'd only looked at the "raw" (services/collector/raw) and "JSON" (services/collector) endpoints.

From the Splunk docs (not news to you):

Requests containing the "fields" property must be sent to the /collector/event endpoint, or else they aren't indexed.

Interesting. I need to do more reading, think about this.
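To make the "fields" property concrete, a hypothetical payload (field names and values are mine) using indexed fields, which per the docs quoted above must be sent to /services/collector/event, might look like:

```python
import json

# Hypothetical /collector/event payload with indexed fields
payload = {
    "time": 1634126413.999,
    "sourcetype": "my:prod1:cpu",  # hypothetical sourcetype
    "event": {"system": "PROD1", "type": "CPU", "usage": 87.5},
    # "fields" values are indexed at ingest; only the /event endpoint accepts them
    "fields": {"datacenter": "east", "env": "prod"},
}
print(json.dumps(payload))
```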

It means that you can skip time parsing; no, that's not just moving processing upstream

Yes, you have a point.

As you say, it means I can skip this (props.conf stanza from my current configuration):

TIME_PREFIX = \"time\":\"
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%6N%:z

Some more detail. In my case, the current JSON contains timestamps as strings in this ISO 8601 format:

"2021-10-13T08:00:13.999999001-04:00"

I acknowledge this is my problem, but I'd need to do work to represent these timestamps in "UNIX time format" (e.g. 1634126413.999). That's what I meant here by "upstream processing": yes, I can skip time parsing in Splunk, but I'd need to do that work upstream.
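As a sketch of that upstream work (the helper name is mine, and this assumes Python is available upstream), converting the ISO 8601 string above to the epoch-seconds form HEC expects:

```python
import re
from datetime import datetime

def iso_to_hec_time(ts: str) -> str:
    """Convert an ISO 8601 timestamp (possibly with nanoseconds) to
    epoch seconds with millisecond precision for HEC's "time" key."""
    # datetime.fromisoformat() handles at most microseconds, so trim
    # the fractional part to 6 digits before parsing
    m = re.match(r"(.+T\d\d:\d\d:\d\d)\.(\d+)(.*)$", ts)
    if m:
        ts = f"{m.group(1)}.{m.group(2)[:6]}{m.group(3)}"
    dt = datetime.fromisoformat(ts)  # offset-aware via the -04:00 suffix
    # Truncate (not round) to milliseconds
    return f"{int(dt.timestamp())}.{dt.microsecond // 1000:03d}"

print(iso_to_hec_time("2021-10-13T08:00:13.999999001-04:00"))
# → 1634126413.999
```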

Thanks again for your reply, much appreciated.
