I have created a "Status over time" Multi-KPI alert and selected the ServiceHealthScore to configure a trigger. The trigger should fire if the ServiceHealthScore is critical for more than 50% of the time. My correlation search runs every 15 minutes over the previous 15-minute period.
As ServiceHealthScores are calculated every minute, I have 15 samples in my correlation search results. As soon as 8 or more of these samples find the ServiceHealthScore in a critical state, a notable event should be generated and an episode started or appended to. What happens instead is that a notable event is fired as soon as 1 or more samples return 'Critical' within the 15-minute period.
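To make the expected behaviour concrete, here is a minimal Python sketch of the arithmetic I expect the trigger to perform. It is not the ITSI implementation; the function names and the 'Normal' state label are my own assumptions, used only to illustrate the "more than 50% of 15 samples" rule.

```python
def critical_percentage(samples):
    """Return the percentage of samples that are in the 'Critical' state."""
    if not samples:
        return 0.0
    critical = sum(1 for s in samples if s == "Critical")
    return 100.0 * critical / len(samples)


def should_fire(samples, threshold_pct=50.0):
    """Fire only when strictly more than threshold_pct of the samples are critical."""
    return critical_percentage(samples) > threshold_pct


def window(n_critical, n_total=15):
    """Build a hypothetical 15-minute window with n_critical critical samples."""
    return ["Critical"] * n_critical + ["Normal"] * (n_total - n_critical)


if __name__ == "__main__":
    for n in (1, 7, 8, 15):
        w = window(n)
        print(n, round(critical_percentage(w), 1), should_fire(w))
```

With 15 samples, 7 critical samples give about 46.7% and should not fire, while 8 give about 53.3% and should. A single critical sample (about 6.7%) should never be enough.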
I have analyzed the correlation search and found what I believe to be an error in the bugfix for an earlier error. I found an earlier post by someone who also found this error and reported it to Splunk, but I could not find it in the list of fixed issues of later ITSI releases.
In ITSI 4.1.2 the correlation search read: ..... stats count as occurances ...... and: ... 'getPercentage(alert_period, occurence), which obviously did not work, as the occurence field did not correspond to the calculated occurances field.
In ITSI 4.3.0 the correlation search was corrected and reads: .... stats count as occurences ... and: .... 'getPercentage(alert_period, occurence), which is close, but still no cigar, as occurences is plural and occurence is not.
So much for testing, I suppose, as the correlation search is still not working: it returns 100% regardless of how many samples actually satisfy the specified 50% Critical trigger condition.
Fixing it is simple enough (what do I call a fix on a bugfix now? A bigfix?):
.... stats count as occurence .... and then .... 'getPercentage(alert_period, occurence) ... calculates the percentage correctly, and notable events are only generated when the trigger condition is really met. To implement this bigfix, you need to edit the search in the correlation search editor, and once you do that you cannot use the Multi-KPI Alert editor on that correlation search anymore. This is a one-way street, guys: there is no way back until Splunk fixes the search generator.
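If you want to check a generated search for this kind of mismatch before editing it, a rough Python sketch like the one below can help. It is a hypothetical helper, not a Splunk tool: it only looks for the two fragments quoted above (stats count as <field> and getPercentage(alert_period, <field>)) and reports any field that getPercentage consumes but stats never produces. The search strings are the elided fragments from this post, not the full generated searches.

```python
import re


def mismatched_fields(search):
    """Return fields passed to getPercentage that no 'stats count as' clause produces."""
    # Field names created by "stats count as <field>"
    produced = set(re.findall(r"stats\s+count\s+as\s+(\w+)", search))
    # Second argument of getPercentage(<first>, <field>)
    consumed = set(re.findall(r"getPercentage\(\s*\w+\s*,\s*(\w+)\s*\)", search))
    return sorted(consumed - produced)


if __name__ == "__main__":
    fragments = {
        "4.1.2": "... stats count as occurances ... getPercentage(alert_period, occurence) ...",
        "4.3.0": "... stats count as occurences ... getPercentage(alert_period, occurence) ...",
        "fixed": "... stats count as occurence ... getPercentage(alert_period, occurence) ...",
    }
    for label, fragment in fragments.items():
        print(label, mismatched_fields(fragment))
```

An empty list means the field names line up. Anything else is a field name that getPercentage references but the stats clause never creates.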
Hopefully this helps other Splunkers save some troubleshooting time and inspires Splunk to bigfix their bugfix 🙂
This issue was fixed in the 4.3.1 maintenance release: https://docs.splunk.com/Documentation/ITSI/4.3.1/ReleaseNotes/Fixedissues#Uncategorized_issues