Hello guys!
I am currently working on ITSI service models and service health. I want my service models to be lightweight and elegant, but they must support drill-down to the exact problematic component and, most importantly, reflect the real service state from the user's perspective. I have run many experiments, but I still cannot build ITSI services and KPIs that satisfy my requirements.
For example, I want to build a 3-node cluster service containing host1, host2 and host3. I have a KPI query that returns the UP/DOWN state of each host.
I want to achieve the following:
1. The cluster service KPI value must not decrease at all until at least two of the three nodes are DOWN. Only then should the service's ServiceHealthScore decrease:
   - If any single node goes DOWN, the cluster service ServiceHealthScore KPI should remain Normal (green) with a value of 100. The ServiceHealthScore of the broken host itself should be High (orange).
   - If two nodes go DOWN, the cluster service ServiceHealthScore KPI should be Low (yellow). The broken hosts should be High (orange).
   - If all three nodes go DOWN, the cluster service ServiceHealthScore KPI should be High (orange). All three broken hosts should be High (orange).
2. Visualize the state of each host in the cluster in the ITSI Service Analyzer.
3. Be able to alert on state changes of each individual host and of the whole cluster using ITSI Notable Event Aggregation Policies.
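Requirement 1 boils down to a rule based on the number of DOWN nodes. As a minimal sketch in plain Python (not an ITSI API; the `Severity` enum and both function names are hypothetical), the desired per-host and cluster severities could be written down like this:

```python
from enum import IntEnum
from typing import Dict


class Severity(IntEnum):
    # Hypothetical labels mirroring the ITSI colors used in this post; higher value = worse
    NORMAL = 0  # green
    LOW = 1     # yellow
    HIGH = 2    # orange


def host_severity(state: str) -> Severity:
    """An individual host is High when DOWN and Normal otherwise."""
    return Severity.HIGH if state == "DOWN" else Severity.NORMAL


def cluster_severity(host_states: Dict[str, str]) -> Severity:
    """Desired severity of the 3-node cluster, driven only by how many hosts are DOWN."""
    down = sum(1 for state in host_states.values() if state == "DOWN")
    if down <= 1:
        return Severity.NORMAL  # zero or one node lost: cluster stays Normal
    if down == 2:
        return Severity.LOW     # two of three nodes DOWN
    return Severity.HIGH        # all three nodes DOWN


states = {"host1": "DOWN", "host2": "UP", "host3": "UP"}
print(cluster_severity(states).name)  # prints NORMAL
```

Writing the rule out this way makes it easy to check any ITSI configuration against the three cases: one DOWN host must leave the cluster Normal while that host alone shows High.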
Below is what I have managed to achieve so far.
Test 1: A very simple service model with one KPI counting the number of alive nodes. I know how many hosts in the cluster are UP, and I can create events on service state changes. However, I can neither show the state of individual hosts (poor drill-down) nor alert on host state changes. The administrator has to find the affected nodes outside the Service Analyzer. This is unacceptable.
Service health is calculated as I want it:
- If one host is DOWN, the service has Normal severity (correct).
- If two hosts are DOWN, the service has Low severity (correct).
- If all three hosts are DOWN, the service has High severity (correct).
Test 2: The same simple service model, but the KPI uses the split-by-entity setting. I know how many hosts in the cluster are UP, and I know which hosts are UP and which are DOWN.
Unfortunately, ServiceHealthScore does not behave as it should for the cluster:
- If one host is DOWN, the service has High severity. Wrong! (even though the aggregate KPI is Normal, which is correct)
- If two hosts are DOWN, the service has High severity. Wrong! (even though the aggregate KPI is Low, which is correct)
- If all three hosts are DOWN, the service has High severity (correct).
Test 3: This is the model I like the most in the Service Analyzer. The UP/DOWN KPIs are linked to individual services representing the hosts, and the cluster state is calculated automatically.
I could not find any way to make the ServiceHealthScore KPI behave like a 'cluster':
- If one host is DOWN, the service has High severity. Wrong!
- If two hosts are DOWN, the service has High severity. Wrong!
- If all three hosts are DOWN, the service has High severity (correct).
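The results of Tests 2 and 3 are consistent with a roll-up in which the cluster simply inherits the worst severity of any of its hosts (entities or dependent services). The toy model below is only an assumption that reproduces the outcomes reported above; it is not ITSI's documented health score calculation, and `make_states` and `worst_entity_severity` are hypothetical helpers. It contrasts that worst-of roll-up with the desired rule from requirement 1:

```python
from enum import IntEnum
from typing import Dict


class Severity(IntEnum):
    # Hypothetical labels mirroring the ITSI colors used in this post; higher value = worse
    NORMAL = 0
    LOW = 1
    HIGH = 2


# Desired cluster severity from requirement 1, keyed by the number of DOWN hosts
DESIRED = {0: Severity.NORMAL, 1: Severity.NORMAL, 2: Severity.LOW, 3: Severity.HIGH}


def make_states(down: int) -> Dict[str, str]:
    """Build host1..host3 states with the first `down` hosts marked DOWN."""
    return {f"host{i}": ("DOWN" if i <= down else "UP") for i in range(1, 4)}


def host_severity(state: str) -> Severity:
    """A DOWN host is High, an UP host is Normal."""
    return Severity.HIGH if state == "DOWN" else Severity.NORMAL


def worst_entity_severity(host_states: Dict[str, str]) -> Severity:
    """Assumed roll-up: the cluster takes the worst severity of any single host."""
    return max(host_severity(state) for state in host_states.values())


for down in (1, 2, 3):
    observed = worst_entity_severity(make_states(down))
    print(down, observed.name, DESIRED[down].name)
# 1 HIGH NORMAL
# 2 HIGH LOW
# 3 HIGH HIGH
```

Only the all-DOWN case agrees, which matches the "wrong, wrong, correct" pattern of Tests 2 and 3. Under this model, the cluster health would have to come from a count of DOWN members rather than from the worst member.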
Ideally, I want to implement my KPIs with the Test 3 service model, which I personally like the most. Can anyone help me deal with ServiceHealthScore? Alternatively, is there any workaround?