Analyzing HEC response times on idle

afx · 02-10-2020

Hi,
thanks to the wonderful website_monitoring app, I see some interesting but unexplained tidbits.
We have two indexers with HEC configured. Because of project delays, those HEC inputs are idle.
I use
https://splunk-index1:8088/services/collector/health
for the query in website_monitoring.
At least once a day I get a 5-second response time on one of the indexers, but not on the other. Usually the response time is less than 20 ms.
Checking _index/_audit for anything happening in parallel, I found nothing so far that would explain this monster increase.
It is not linked to specific times.
If I only query the port (without the endpoint path), the peaks are just up to 60 ms in the worst case. But that gives me an ugly 404 error, so I figured I might as well use a decent endpoint.
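A single response-time figure does not show where the 5 seconds are spent. As a rough sketch for narrowing this down outside of website_monitoring, the script below times name resolution, TCP connect, TLS handshake and the HTTP response separately for one GET request. The function names `probe`, `slowest_phase` and `watch` are made up for this illustration, and disabling certificate verification is an assumption (it only makes sense if the indexer uses a self-signed certificate).

```python
import socket
import ssl
import time
from urllib.parse import urlparse


def probe(url, timeout=10.0):
    """Return per-phase timings in milliseconds for one GET request."""
    parts = urlparse(url)
    host = parts.hostname
    use_tls = parts.scheme == "https"
    port = parts.port or (443 if use_tls else 80)
    path = parts.path or "/"
    timings = {}

    # Phase 1: name resolution (a slow resolver would show up here).
    start = time.perf_counter()
    info = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0]
    timings["dns_ms"] = (time.perf_counter() - start) * 1000

    # Phase 2: TCP connect to the first resolved address.
    start = time.perf_counter()
    sock = socket.create_connection(info[4][:2], timeout=timeout)
    timings["connect_ms"] = (time.perf_counter() - start) * 1000

    try:
        if use_tls:
            # Phase 3: TLS handshake.
            # Assumption: self-signed certificate, so verification is off.
            ctx = ssl.create_default_context()
            ctx.check_hostname = False
            ctx.verify_mode = ssl.CERT_NONE
            start = time.perf_counter()
            sock = ctx.wrap_socket(sock, server_hostname=host)
            timings["tls_ms"] = (time.perf_counter() - start) * 1000

        # Phase 4: send the request and wait for the first bytes of the reply.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "Connection: close\r\n\r\n"
        )
        start = time.perf_counter()
        sock.sendall(request.encode("ascii"))
        first = sock.recv(4096)
        timings["response_ms"] = (time.perf_counter() - start) * 1000
        timings["status_line"] = first.split(b"\r\n", 1)[0].decode("ascii", "replace")
    finally:
        sock.close()
    return timings


def slowest_phase(timings):
    """Return the name of the timed phase that took the longest."""
    phases = {k: v for k, v in timings.items() if k.endswith("_ms")}
    return max(phases, key=phases.get)


def watch(url, interval=60.0, threshold_ms=1000.0, rounds=1440):
    """Probe repeatedly and print only the slow samples with their breakdown."""
    for _ in range(rounds):
        timings = probe(url)
        total = sum(v for k, v in timings.items() if k.endswith("_ms"))
        if total >= threshold_ms:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), slowest_phase(timings), timings)
        time.sleep(interval)
```

Calling `watch("https://splunk-index1:8088/services/collector/health")` from the machine that runs website_monitoring would probe once a minute for about a day and log only the outliers. If the slow phase turns out to be `dns_ms` or `connect_ms` rather than `response_ms`, the delay is probably not caused by HEC itself.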

Any ideas?

thx
afx

nickhills · 02-10-2020

Not a direct answer to your question, however:

It's best practice NOT to run HEC on indexers.
Ideally you would install heavy forwarders and run the HEC collection endpoints from there.

Whilst it does not directly answer your question, it would mitigate the impact of a slow-responding indexer (if indeed that is the problem) by separating the real-time collection (HEC) response times from the ingestion lag (indexers).

If my comment helps, please give it a thumbs up!

afx · 02-20-2020

Currently our infrastructure is small, so I am trying not to involve yet another box.
The funny thing is, the machine is pretty much idle when this happens.

cheers
afx
