Splunk IT Service Intelligence: How are service dependencies calculated into the health score of the service?

sdcalmes
Explorer

When reading this section of the Splunk IT Service Intelligence (ITSI) documentation: http://docs.splunk.com/Documentation/ITSI/2.4.1/Configure/HowtocreateKPIsearches#How_service_health_...

There is a sentence that just cuts off and doesn't finish ("Configure ITSI Services" -> "Step 7" -> "How service health scores work")

"Service health score values are impacted by dependent service health score and KPI values. For more information, see"
I'm currently creating a lot of services that depend on each other, and I have NO idea how the service dependencies are calculated into the health score of the service. Because my service has a lot of dependencies and KPIs, each one affects the health score only slightly, which is not what I want.

So I guess:

  1. How are the service dependencies taken into account/weighted?

  2. With this many KPIs/dependencies, even when a KPI is critical and at level 10, it doesn't bring my health score down low enough (and I don't want to use level 11). Has anyone else run into an issue like this?

Thanks

1 Solution

thejeffreystone
Path Finder

The service dependencies are weighted by that 0-10 slider, and I believe the weight works almost like a percentage of the total. So if you have 5 KPIs and weight them all at 10, each one will not move the dial much. From what I have found, it's hard to get that health score dialed in.

If you have a critical KPI, or one that impacts the overall health 1 to 1, you should really consider making it an 11.

But I agree, it's hard to get the weighting dialed in. Even if you have them all spread out on the weighting, each one affects the overall health score very little. You would need multiple KPIs going bad to really make the health score dip.

What I have done to get the health score dialed in is combine KPIs. So for example, if I have 6, I group 3 into a critical KPI that gets weighted high, and the other 3 into another group that gets a low weight. Then the health score is a bit closer to what I want it to show. That way, if any one of my critical KPIs is tripped, the health score reflects it, whereas if they were individual KPIs, all three would have to be tripped.

aaraneta_splunk
Splunk Employee

@sdcalmes - Did one of the answers below help provide a solution to your original question? If yes, please click “Accept” below the best answer to resolve this post and upvote anything that was helpful. If no, please leave a comment with more feedback. Thanks.

BMacher
Path Finder

Hello

By testing with 2 and 3 KPIs and different severities and importances, I arrived at the following formula:
Service Health Score = (K_1 ∗ G_1 + K_2 ∗ G_2 + ... + K_N ∗ G_N) / (G_1 + G_2 + ... + G_N)
N = number of KPIs; G_i = importance of KPI i; K_i = severity score of KPI i (normal=100, low=70, medium=50, high=30, critical=0)
(I got the scores for the different severities by testing.)

Here is an example with three KPIs: importance 10 at normal, importance 7 at low, and importance 5 at high:
Service Health Score = (100 ∗ 10/22)+(70 ∗ 7/22)+(30 ∗ 5/22) ≈ 45.45 + 22.27 + 6.82 ≈ 74.55
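
The same calculation can be sketched in Python to try other combinations. This only encodes the formula and severity scores found by testing in this post; the names `SEVERITY_SCORE` and `service_health_score` are made up for illustration, and none of this is taken from ITSI itself.

```python
# Severity scores found by testing in this post (not official values).
SEVERITY_SCORE = {"normal": 100, "low": 70, "medium": 50, "high": 30, "critical": 0}


def service_health_score(kpis):
    """Importance-weighted average of KPI severity scores.

    kpis is a list of (importance, severity_name) pairs.
    """
    total_importance = sum(importance for importance, _ in kpis)
    if total_importance == 0:
        raise ValueError("at least one KPI needs a non-zero importance")
    weighted = sum(importance * SEVERITY_SCORE[severity] for importance, severity in kpis)
    return weighted / total_importance


# The example from this post: importances 10, 7, and 5.
example = [(10, "normal"), (7, "low"), (5, "high")]
example_score = service_health_score(example)  # 1640 / 22
print(round(example_score, 2))
```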

So far so good, but I do not know how service dependencies are considered in this calculation, since you cannot adjust their importance directly. My guess is that the KPIs of the other services are handled as if they were the service's own. Does anyone have an idea?

Regards
Benjamin

sdcalmes
Explorer

Maybe I'm confused about the service dependencies...as I only have sliders for the actual KPIs in that service, not the dependencies.

Either way, I like your idea of grouping, I was thinking about that before but didn't go down that path. Is there a straightforward way of grouping them together, or will I have to do it manually?

E.g., if I have disk space, memory, and CPU usage as separate KPIs, is there a super simple way to group them that I don't know about, or will I have to make my base search include all of them and do the calculations?

Thanks a lot for your quick reply.

thejeffreystone
Path Finder

I think I kind of glossed over the dependencies / KPIs. You are right, you have no sliders for a service that is a dependency of another service. A dependency in that case is considered a minimum health score, so it is treated as if it is weighted at 11.
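
As a rough sketch of what "minimum health score" appears to mean in practice, here is one possible reading in Python: the service's own KPI-based score is capped at the lowest health score among its dependencies. This is an assumption based on the behavior described in this thread, not confirmed against the ITSI documentation, and the function name is hypothetical.

```python
def service_health(own_kpi_score, dependency_scores):
    """Cap the service's own KPI-based score at the lowest dependency score.

    Assumed model of the "minimum health score" behavior; not an ITSI API.
    """
    return min([own_kpi_score, *dependency_scores])


capped = service_health(100, [90])        # own KPIs all normal, dependency at 90
still_capped = service_health(95, [90])   # own score dips slightly, result unchanged
below = service_health(60, [90])          # own score falls below the dependency

print(capped, still_capped, below)
```

Under this reading, the service's own KPIs only show up in the health score once they pull it below the lowest dependency.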

sdcalmes
Explorer

Yep I figured that out...Kind of annoying how that works. It would be nice to have more power over that/have it calculated differently.

Thanks.

thejeffreystone
Path Finder

I grouped mine based on how I thought they would affect the service. The ones that directly impacted the service got grouped together, and the ones whose impact was indirect or conditional on other factors got grouped together. My use case was audio and video stream metrics, so it was a little different.

I had a base search with 7 KPI metrics in it. What I did was create two new metrics and add them to the base search; they determine whether the entity is in a critical state or a warning state. For example, the critical metric looks at three of my KPIs that I consider to have a major impact on my service and sets the critical KPI metric to 1 if any of those KPIs is "failing." I sum those up per service. So if all the entities in my service are failing then the sum would be 9.

Because I didn't want to lose visibility into the other KPIs, I left the other 7 KPIs in my base search. Then in my service I weighted the two new KPIs (critical and warning) normally (10 and 5) and set the other 7 to 1 so they were just informational. So now my health score is determined by the two new metrics. And when I see the service is degraded, I click on it in Service Analyzer and immediately see what my problem KPIs are, because they are included in the service.
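
The logic of the combined critical metric can be sketched outside of Splunk like this. It is plain Python rather than the actual base search, and the KPI names, entity names, and function names are all hypothetical placeholders.

```python
# Hypothetical names for the three KPIs treated as having major impact.
CRITICAL_KPIS = ("kpi_a", "kpi_b", "kpi_c")


def critical_flag(failing_by_kpi):
    """Return 1 if any of the critical KPIs is failing for one entity, else 0."""
    return int(any(failing_by_kpi.get(name, False) for name in CRITICAL_KPIS))


def service_critical_sum(entities):
    """Sum the per-entity critical flags across all entities in the service."""
    return sum(critical_flag(failing) for failing in entities.values())


# Made-up sample data: which KPIs are failing on each entity.
entities = {
    "entity-1": {"kpi_a": True, "kpi_b": False, "kpi_c": False},
    "entity-2": {"kpi_a": False, "kpi_b": False, "kpi_c": False},
    "entity-3": {"kpi_b": True, "kpi_c": True},
}

critical_sum = service_critical_sum(entities)
print(critical_sum)
```

An entity with several failing critical KPIs still counts once, so the sum is the number of entities in a critical state. The warning metric would follow the same pattern with its own list of KPIs.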

sdcalmes
Explorer

Thanks for the great answers.

I have one more question. Is there a good way (that you know of) to use the health score of one service as a dependency for another? Here's my situation:

  • Service 1
    KPI 1
    KPI 2
    Dependency 1 (Service 2 Health Score)

  • Service 2
    KPI 3
    KPI 4
    etc.

Ex: Service 1's KPI 1 and KPI 2 are both normal, but Service 2's health score is 90, so Service 1 automatically defaults to 90. If KPI 1 or KPI 2 changes, the health score for Service 1 doesn't change. Is there any way around this issue? Essentially, if Service 2's health score is 90 and KPI 1 is "low", then Service 1's health score should be less than 90.

I hope that makes sense.

thejeffreystone
Path Finder

Sorry for the late response. That is one I haven't quite solved myself. I have issues with how the services roll up, so if I have a service that depends on other services, I typically just write a new base search.
