Splunk ITSI

ITSI - Service Health Score not related to underlying KPI

sail4lot
Path Finder

Hi. I am having trouble fully grasping the Service Health Score aspect of a service.

My service has 1 KPI. That KPI takes the average of a value over the past 60 minutes (using earliest=-60m, since there is no "last hour" option in the calculation window). The value is consistently well above the threshold, so the KPI is in the Normal state. The Service Health Score shows 50 for "now" while the KPI shows 100/Normal.

Is there a 'recovery' time for the service health score? I've looked at it from all perspectives (max, avg, min) and all the indications in the underlying KPI are that the Service Health Score should be 100, yet it stays at 50 from a previous Critical reading for what appears to be an arbitrary period of time.

Note: this was a recently introduced service so perhaps it has something to do with it not running a full 24 hours yet?

Any ideas?


skoelpin
SplunkTrust

Are you sure the KPI you're looking at is linked to that service? Go to the Services tab and open the service to see which KPIs are attached to it. You may also have additional services in there. Are you using adaptive or static thresholding?


sail4lot
Path Finder

Thanks for the response!

I am definitely certain that there is only one KPI in the service and I'm using static thresholding.

I took your advice from a previous post and put the values I was looking to monitor on a glass table. That seemed to more accurately reflect the values I expected when the Service Analyzer didn't. Is it possible that the Service Analyzer has some lag in there or that it somehow takes the average of previous Service Health Scores and displays the average (for example, for a 12 hour period, was 0 for 6 hours, turned 100 at 6 hours, now the value reflects 50)?
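To make that averaging hypothesis concrete, here is a minimal Python sketch of the arithmetic. It only illustrates the guess above; it is not a description of how ITSI actually computes or displays the Service Health Score, and the hourly sampling and the `hourly_scores` name are assumptions made up for the example.

```python
from statistics import mean

# Hypothetical 12-hour history of health scores, one sample per hour:
# 6 hours at 0 (Critical) followed by 6 hours at 100 (Normal).
hourly_scores = [0] * 6 + [100] * 6

# What a "latest value" display would show.
latest_score = hourly_scores[-1]

# What a display that averages the whole window would show.
averaged_score = mean(hourly_scores)

print(latest_score)    # 100
print(averaged_score)  # 50
```

If the Service Analyzer were averaging over its selected time range like this, a score of 50 would be expected even though the most recent reading is 100.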


skoelpin
SplunkTrust

If I recall correctly, the Service Analyzer computes its values from the raw data and not from the itsi_summary index. The data in the itsi_summary index will always lag behind the raw data because the summary index is not populated instantaneously the way the raw data is. So you may see different values depending on how frequently your KPIs update and on their lookback period.

How far off is the value from the service analyzer view? Have you attempted to duplicate the service and KPI to test if you were getting differing values?
