I am currently working on ITSI service models and health scores. I want my service models to be lightweight and elegant, but they must support drill-down to the exact problematic component and, most importantly, reflect the real service state from the user's perspective. I have run many experiments, but I still cannot implement some ITSI services and KPIs that satisfy my requirements.
For example, I want to build a 3-node cluster service, containing
host3. I have a KPI query to get each host UP and DOWN state.
I want to achieve the following:
1 The cluster service KPI value must not decrease at all until at least two of the three nodes are in the DOWN state. The cluster ServiceHealthScore should decrease only once two of the three nodes are DOWN:
If only one node (any of them) goes DOWN, the cluster ServiceHealthScore should remain Normal (Green) with a value of 100. The broken host's own ServiceHealthScore should be High (Orange).
If two nodes go DOWN, the cluster ServiceHealthScore should be Low (Yellow). The broken hosts should be High (Orange).
If all three nodes go DOWN, the cluster ServiceHealthScore should be High (Orange). All three broken hosts should be High (Orange).
2 Visualize the state of each host in the cluster in the ITSI Service Analyzer.
3 Be able to alert on state changes of each individual host and of the whole cluster through ITSI Notable Event Aggregation Policies.
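To make requirement 1 unambiguous, the mapping I am after can be written down as a small Python sketch. This is only a statement of the desired behavior, not ITSI configuration or anything ITSI computes internally. The function names are hypothetical, the severity labels are the ones used above, and the host names `host1` and `host2` in the usage example are illustrative assumptions.

```python
from typing import Dict


def host_severity(is_up: bool) -> str:
    """Desired severity of a single host service: High when the host is DOWN."""
    return "Normal" if is_up else "High"


def cluster_severity(host_up: Dict[str, bool]) -> str:
    """Desired severity of the 3-node cluster service.

    0 or 1 hosts DOWN -> Normal (Green, health score 100)
    2 hosts DOWN      -> Low (Yellow)
    3 hosts DOWN      -> High (Orange)
    """
    if len(host_up) != 3:
        # The requirement is only defined for a 3-node cluster.
        raise ValueError("expected exactly three hosts")
    down = sum(1 for is_up in host_up.values() if not is_up)
    if down <= 1:
        return "Normal"
    if down == 2:
        return "Low"
    return "High"


# Illustrative host names; only the UP/DOWN flags matter.
one_down = {"host1": True, "host2": True, "host3": False}
two_down = {"host1": False, "host2": True, "host3": False}
all_down = {"host1": False, "host2": False, "host3": False}
```

With these inputs, `cluster_severity(one_down)` is Normal while `host_severity(False)` for the broken host is High, which is exactly the split between cluster and host state that I cannot reproduce in ITSI.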
Below is what I managed to achieve.
Test 1: A very simple service model with one KPI counting the number of alive nodes. I know how many hosts in the cluster are UP, and I can create events on service state changes. However, I can neither show the state of individual hosts (poor drill-down) nor alert on host state changes. The administrator must find the affected nodes outside the Service Analyzer. This is unacceptable.
Service health is calculated as I want it:
Test 2: The same simple service model, but the KPI is split by entity. I know how many hosts in the cluster are UP, and I know which hosts are UP and which are DOWN.
Unfortunately, ServiceHealthScore does not work as it should for the cluster:
Test 3: This is the model I like the most in the Service Analyzer. The UP/DOWN KPIs are linked to individual services representing the hosts, and the cluster state is calculated automatically.
I could not find any way to force the ServiceHealthScore KPI to behave with 'cluster' semantics:
I would really like to implement my KPIs with the Test 3 service model. Can anyone help me deal with ServiceHealthScore? Alternatively, is there any workaround?
Not sure what is doable.
but if you are going the service-dependencies road,
For service dependencies, the parent service can only look at the sub-services' scores from the previous minute (the current ones are not yet calculated), so there is a cascade effect and a one-minute delay per level before updates become visible.
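The cascade can be illustrated with a toy Python model. It is a sketch under simplifying assumptions, not how ITSI schedules or computes health scores: each parent simply copies its single child's score from the previous minute, and `propagate` and its parameters are hypothetical names.

```python
from typing import List


def propagate(leaf_scores: List[int], depth: int, initial: int = 100) -> List[List[int]]:
    """Return per-minute scores for each tier; tier 0 is the leaf service.

    Assumption: at minute t a parent only sees its child's score from
    minute t-1, so every dependency level adds one minute of delay.
    """
    tiers = [list(leaf_scores)]
    for _ in range(depth):
        below = tiers[-1]
        # Shift the child's series by one minute; before any data exists
        # the parent reports the assumed initial score.
        tiers.append([initial] + below[:-1])
    return tiers


# Leaf drops from 100 to 30 at minute 1; two dependency levels above it.
tiers = propagate([100, 30, 30, 30], depth=2)
for level, scores in enumerate(tiers):
    print(f"tier {level}: {scores}")
```

In this model the drop that the leaf shows at minute 1 reaches the first parent at minute 2 and the second parent at minute 3, so deep dependency trees react noticeably later than the hosts themselves.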