Issue with Service Health Score in ITSI

Path Finder

We are experiencing an issue where our services' health scores alternate between 0 and 100 in the Service Analyzer in ITSI.
The health score shows 0 even though all the underlying KPIs are OK. This happens for all of our defined services. The simplest case is a service, "Azure Status", with only one defined KPI: "AzureStatus".

We recently updated to 3.0.0, but experienced the same issue before the upgrade (version 2.4.0).

Does anyone have an idea what could be causing this?

Re: Issue with Service Health Score in ITSI

SplunkTrust

Is that KPI running a base search or an ad hoc search?

Re: Issue with Service Health Score in ITSI

Path Finder

The KPI is running an ad hoc search.

Re: Issue with Service Health Score in ITSI

SplunkTrust

Are you running on a single search head or in a cluster?

Re: Issue with Service Health Score in ITSI

Path Finder

Single search head.

Re: Issue with Service Health Score in ITSI

SplunkTrust

Can you add your "Azure Status" service to a glass table and see if you're still getting zero? This will tell us whether it's a Service Analyzer issue or a wider ITSI issue.

Re: Issue with Service Health Score in ITSI

Path Finder

It is alternating there as well. The KPI's value is constant, but the health score switches between 100 and 0 at random intervals.

Interestingly, when I added some of the other services' health scores to the same glass table, all of them alternated between 0 and 100 at exactly the same time, even though there are no dependencies defined between them.

Re: Issue with Service Health Score in ITSI

SplunkTrust

Can you share your ad hoc search?

Re: Issue with Service Health Score in ITSI

Path Finder

Yes! Search:
index=azure host=azurerss sourcetype=azurestatus
| eval value=if(StatusMessage="An issue has been discovered",0,1)

Threshold field: value
Split by entity: No
Calculating Average of aggregate over the last 15 minute(s) every 5 minutes.

Re: Issue with Service Health Score in ITSI

SplunkTrust

I see the issue. Your eval returns 0 when an issue has been discovered and 1 when it hasn't, and the KPI then averages those 0s and 1s. The average is just the fraction of events in the window without an issue, so it will not reliably reflect the service's actual state against your thresholds.

A better approach is to sum the results instead of averaging them over the KPI's calculation window (the last 15 minutes in your setup). If the sum, i.e. the number of issue events, goes over a specified threshold, the KPI changes color.

If you take this approach, flip the values so that an issue counts as 1. Your eval should then look like this:
| eval value=if(StatusMessage="An issue has been discovered",1,0)
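If you want to compare the two aggregations before changing the KPI, you could run roughly the same calculation by hand in Search over the same 15-minute window. This is only a sketch: it assumes the KPI's Average and Sum aggregates behave like a plain `stats avg` and `stats sum` over the window, and the field names `issue`, `events`, `avg_value` and `issue_count` are made up for illustration.

```
index=azure host=azurerss sourcetype=azurestatus earliest=-15m
| eval value=if(StatusMessage="An issue has been discovered",0,1)
| eval issue=1-value
| stats count AS events avg(value) AS avg_value sum(issue) AS issue_count
```

`avg_value` approximates what the current Average aggregate works from, while `issue_count` is what a Sum aggregate over the flipped eval would be compared against. For example, a threshold of 1 or more would flag any issue reported in the window.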
