I'm ingesting data via HEC and I know there is data about it in _introspection, but I don't know what I'm looking at when I search for it. Here is what I know so far.
I have a HEC token named testing multiple events.
The token itself looks like f2584364-976f-4a68-ac3b-4a4d481ec8cd.
I'm searching for introspection data about HEC via index=_introspection sourcetype="http_event_collector_metrics".
Some data has been sent to it for testing.
Can someone explain what I'm seeing when looking at the entry below?
{ [-]
component: HttpEventCollector
data: { [-]
format: json
num_of_errors: 0
num_of_events: 3
num_of_parser_errors: 0
num_of_requests: 1
num_of_requests_in_mint_format: 0
num_of_requests_to_disabled_token: 0
series: http_event_collector_token
token_name: testing multiple events
total_bytes_indexed: 72
total_bytes_received: 111
transport: http
}
datetime: 05-11-2020 10:52:15.827 -0400
log_level: INFO
}
There are a few things to unpack here.
HEC data is reported in introspection every 1 minute. All discussions below assume 1 min intervals unless otherwise stated.
There are two types of introspection logs for HEC: one type that summarizes all HEC activity on the host, and one type that provides a summary for each unique token received. What's the difference?
An example of the first type is below. Note that it does not have a token_name field in it.
{ [-]
component: HttpEventCollector
data: { [-]
format: json
num_of_ack_requests: 0
num_of_auth_failures: 0
num_of_errors: 0
num_of_events: 3
num_of_parser_errors: 0
num_of_requests: 1
num_of_requests_acked: 0
num_of_requests_in_mint_format: 0
num_of_requests_to_disabled_token: 0
num_of_requests_to_incorrect_url: 0
num_of_requests_waiting_ack: 0
series: http_event_collector
total_bytes_indexed: 72
total_bytes_received: 111
transport: http
}
datetime: 05-11-2020 10:52:15.826 -0400
log_level: INFO
}
The second type does have a token_name in it.
{ [-]
component: HttpEventCollector
data: { [-]
format: json
num_of_errors: 0
num_of_events: 12
num_of_parser_errors: 0
num_of_requests: 4
num_of_requests_in_mint_format: 0
num_of_requests_to_disabled_token: 0
series: http_event_collector_token
token_name: testing multiple events
total_bytes_indexed: 288
total_bytes_received: 444
transport: http
}
datetime: 05-11-2020 11:02:15.775 -0400
log_level: INFO
}
And my token name now shows up. Other than the token_name, there are no other differences. Each token seen by the indexer will generate its own unique introspection log entry. So, if there are 10 unique tokens sent to the indexer, expect to see 11 introspection events (1 for each token + 1 summary). If you are summing up HEC usage data, be careful not to count the same data more than once.
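For example, one way to sum per-token usage without double counting is to restrict the search to the per-token series. This is a sketch; the data.* field names come from the JSON events shown above, so verify the extractions against your own data:

```
index=_introspection sourcetype="http_event_collector_metrics" data.series="http_event_collector_token"
| stats sum(data.num_of_events) AS events
        sum(data.total_bytes_received) AS bytes_received
        BY data.token_name
```

Filtering on data.series="http_event_collector" instead would give you the host-wide rollup, which already includes every token's traffic.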
format is always json.
num_of_events is a sum of all Splunk events received by the indexer.
num_of_requests is how many individual HEC requests the indexer received.
total_bytes_indexed is how much data was ingested via HEC.
total_bytes_received is how much data was received by HEC, including headers, other non-event data, and data with parsing errors in it. This number should always be larger than total_bytes_indexed, and may be significantly larger if there are parsing errors in JSON data that would stop it from being indexed.
average size of a request = total_bytes_received / num_of_requests. In this example, that's 444/4 = 111 bytes per request. This can become important if data is being sent to a Splunk Cloud HEC listener, as the service details specify 1MB as a max size (https://docs.splunk.com/Documentation/SplunkCloud/8.0.2003/Service/SplunkCloudservice#Service_limits...).
num_of_parser_errors indicates that malformed JSON data was encountered. There will be corresponding errors logged in _internal splunkd.log.
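Putting that arithmetic into a search, here is a sketch that computes the average request size per token from the per-token series (it assumes the data.* field extractions shown in the events above; the single quotes around field names are needed in eval because the names contain dots):

```
index=_introspection sourcetype="http_event_collector_metrics" data.series="http_event_collector_token"
| where 'data.num_of_requests' > 0
| eval avg_request_bytes = 'data.total_bytes_received' / 'data.num_of_requests'
| table _time, data.token_name, data.num_of_requests, data.total_bytes_received, avg_request_bytes
```

For the example event above, this would show 444 / 4 = 111 bytes per request.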
@davidpaper A couple of additional notes I had from working with support 1-2 years ago on the limitations of _introspection logs for HEC:
The way to differentiate the _introspection events is to filter on data.series="http_event_collector" for the rolled-up summary of all tokens, or data.series="http_event_collector_token" for the per-token summaries. This is useful for making sure you're doing your math against the proper numbers, as you allude to above 🙂
At times you may have num_of_requests=0 but total_bytes_received>0. It was explained to me that "this channel writes every minute, so if the channel gets the data but has not processed it just yet, you can generally get the details in the next event from the same token/indexer".
Multiple requests can make up a single _introspection log, so take the math of total_bytes_received / num_of_requests with a grain of salt. If you have multiple clients all using the same token, you could have situations where one client is making big requests while another is making small ones, skewing this average.
There is no way to more granularly understand the relationship of requests to the introspection logs. Say you have num_of_requests=3 and the next _introspection event from the same indexer shows num_of_requests=2: there is no way of knowing whether 1 request was processed in that time frame, or whether 1 new request came in and 2 requests were processed. That makes tracking the status of requests within the _introspection log extremely painful, and when you need to trace issues back to the client side, all you have to go on is correlating based on timing.
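Given those caveats, one way to reduce the per-minute skew is to aggregate over a longer window before computing the average. A sketch, using the same assumed data.* field extractions as above:

```
index=_introspection sourcetype="http_event_collector_metrics" data.series="http_event_collector_token"
| bin _time span=1h
| stats sum(data.total_bytes_received) AS bytes
        sum(data.num_of_requests) AS requests
        BY _time, data.token_name
| eval avg_request_bytes = if(requests > 0, bytes / requests, null())
```

This still averages across all clients sharing a token, so it won't surface one client making unusually large requests, but it does smooth out the minutes where bytes were received before the requests were counted.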
Good points @joshd . For #3, this can definitely be a problem, as averages hide the best and the worst of that value quite nicely. This is another good reason to avoid re-using the same HEC token for more than 1 data source or sourcetype. Ingestion metrics aren't the only thing that can cause problems when reusing HEC tokens. Error detection in the ingestion pipeline (think data quality view in MC) only gets as granular as the HEC token for some things, so reusing HEC tokens makes those issues more difficult to track down.