I'm ingesting data via HEC and I know there is data about it in _introspection, but I don't know what I'm looking at when I search for it. Here is what I know so far.
I have a HEC token named testing multiple events.
The token itself looks like f2584364-976f-4a68-ac3b-4a4d481ec8cd.
I'm searching for introspection data about HEC via index=_introspection sourcetype="http_event_collector_metrics".
Some data has been sent to it for testing.
Can someone explain what I'm seeing when looking at the entry below?
{ [-]
component: HttpEventCollector
data: { [-]
format: json
num_of_errors: 0
num_of_events: 3
num_of_parser_errors: 0
num_of_requests: 1
num_of_requests_in_mint_format: 0
num_of_requests_to_disabled_token: 0
series: http_event_collector_token
token_name: testing multiple events
total_bytes_indexed: 72
total_bytes_received: 111
transport: http
}
datetime: 05-11-2020 10:52:15.827 -0400
log_level: INFO
}
There are a few things to unpack here.
HEC data is reported in introspection every 1 minute. All discussions below assume 1 min intervals unless otherwise stated.
There are two types of introspection logs for HEC: one type that summarizes all HEC activity on the host, and one type that provides a summary for each unique token received. What's the difference?
An example of the first type is below. Note that it does not have a token_name field in it.
{ [-]
component: HttpEventCollector
data: { [-]
format: json
num_of_ack_requests: 0
num_of_auth_failures: 0
num_of_errors: 0
num_of_events: 3
num_of_parser_errors: 0
num_of_requests: 1
num_of_requests_acked: 0
num_of_requests_in_mint_format: 0
num_of_requests_to_disabled_token: 0
num_of_requests_to_incorrect_url: 0
num_of_requests_waiting_ack: 0
series: http_event_collector
total_bytes_indexed: 72
total_bytes_received: 111
transport: http
}
datetime: 05-11-2020 10:52:15.826 -0400
log_level: INFO
}
The second type does have a token_name in it.
{ [-]
component: HttpEventCollector
data: { [-]
format: json
num_of_errors: 0
num_of_events: 12
num_of_parser_errors: 0
num_of_requests: 4
num_of_requests_in_mint_format: 0
num_of_requests_to_disabled_token: 0
series: http_event_collector_token
token_name: testing multiple events
total_bytes_indexed: 288
total_bytes_received: 444
transport: http
}
datetime: 05-11-2020 11:02:15.775 -0400
log_level: INFO
}
And my token name now shows up. Other than the token_name, there are no other differences. Each token seen by the indexer will generate its own unique introspection log entry. So, if there are 10 unique tokens sent to the indexer, expect to see 11 introspection events (1 for each token + 1 summary). If you are summing up HEC usage data, be careful not to count the same data more than once.
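For example, one way to sum per-token usage without double counting is to restrict the search to the per-token series. This is a sketch; the data.* field names come from the JSON events shown above, so verify the extractions against your own data:

```
index=_introspection sourcetype="http_event_collector_metrics" data.series="http_event_collector_token"
| stats sum(data.num_of_events) AS events
        sum(data.total_bytes_received) AS bytes_received
        BY data.token_name
```

Filtering on data.series="http_event_collector" instead would give you the host-wide rollup, which already includes every token's traffic.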
format is always json.
num_of_events is a sum of all Splunk events received by the indexer.
num_of_requests is how many individual HEC requests the indexer received.
total_bytes_indexed is how much data was ingested via HEC.
total_bytes_received is how much data was received by HEC, including headers, other non-event data, and data with parsing errors in it. This number should always be larger than total_bytes_indexed, and may be significantly larger if there are parsing errors in JSON data that would stop it from being indexed.
average size of a request = total_bytes_received / num_of_requests. In this example, that's 444/4 = 111 bytes per request. This can become important if data is being sent to a Splunk Cloud HEC listener, as the service details specify 1MB as a max size (https://docs.splunk.com/Documentation/SplunkCloud/8.0.2003/Service/SplunkCloudservice#Service_limits...).
num_of_parser_errors indicates that malformed JSON data was encountered. There will be corresponding errors logged in _internal splunkd.log.
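Putting that arithmetic into a search, here is a sketch that computes the average request size per token from the per-token series (it assumes the data.* field extractions shown in the events above; the single quotes around field names are needed in eval because the names contain dots):

```
index=_introspection sourcetype="http_event_collector_metrics" data.series="http_event_collector_token"
| where 'data.num_of_requests' > 0
| eval avg_request_bytes = 'data.total_bytes_received' / 'data.num_of_requests'
| table _time, data.token_name, data.num_of_requests, data.total_bytes_received, avg_request_bytes
```

For the example event above, this would show 444 / 4 = 111 bytes per request.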
@davidpaper A couple of additional notes I had from working with support 1-2 years ago on the limitations of _introspection logs for HEC:
The way to differentiate the _introspection events is to filter on data.series="http_event_collector" for the rolled-up summary of all tokens, or data.series="http_event_collector_token" for the per-token summaries. This is useful for making sure you're doing your math against the proper numbers, as you allude to above 🙂
At times you may have num_of_requests=0 but total_bytes_received>0. It was explained to me that "this channel writes every minute, so if the channel gets the data but has not processed it just yet, you can generally get the details in the next event from the same token/indexer".
Multiple requests can make up a single _introspection log, so take the math of total_bytes_received / num_of_requests with a grain of salt. If you have multiple clients all using the same token, you could have situations where one client is making big requests while another is making small ones, skewing this average.
There is no way to more granularly understand the relationship of requests to the introspection logs. Say you have num_of_requests=3 and the next _introspection event from the same indexer shows num_of_requests=2: there is no way of knowing whether 1 request was processed in that time frame, or whether 1 new request came in and 2 requests were processed. That makes tracking the status of requests within the _introspection log extremely painful, and when you need to trace issues back to the client side, all you have to go on is correlating based on timing.
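Given those caveats, one way to reduce the per-minute skew is to aggregate over a longer window before computing the average. A sketch, using the same assumed data.* field extractions as above:

```
index=_introspection sourcetype="http_event_collector_metrics" data.series="http_event_collector_token"
| bin _time span=1h
| stats sum(data.total_bytes_received) AS bytes
        sum(data.num_of_requests) AS requests
        BY _time, data.token_name
| eval avg_request_bytes = if(requests > 0, bytes / requests, null())
```

This still averages across all clients sharing a token, so it won't surface one client making unusually large requests, but it does smooth out the minutes where bytes were received before the requests were counted.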
Good points @joshd . For #3, this can definitely be a problem, as averages hide the best and the worst of that value quite nicely. This is another good reason to avoid re-using the same HEC token for more than 1 data source or sourcetype. Ingestion metrics aren't the only thing that can cause problems when reusing HEC tokens. Error detection in the ingestion pipeline (think data quality view in MC) only gets as granular as the HEC token for some things, so reusing HEC tokens makes those issues more difficult to track down.