Hi,
I am trying to assess the problem severity of a web service by weighting the failure fraction by the load, i.e., the total number of hits during the analysis period. This should eliminate false-positive alerts at night, when the load is considerably lower.
My baseline search is:
index=myidex | stats count(eval(STATUS_CODE=2)) AS OK count(eval(STATUS_CODE != 2)) AS NOK by _time | timechart span=60m avg(eval(NOK/(NOK+OK))) AS "Failure Fraction" avg(eval(OK/(NOK+OK))) AS "Success Fraction" avg(eval(max(2,NOK+OK))) AS Factor
This gives me two curves that mirror each other about the line y=0.5, which is OK. What is NOT OK is the straight line at y=2.0 labelled Factor. It is supposed to give me the number of hits during the time interval, but instead it is a constant 2.0, which suggests that it is evaluated separately for each and every log line.
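A likely explanation for the constant 2.0: `stats ... by _time` without a preceding `bin` groups events by their exact timestamp, so most groups hold only one or two events, max(2, 1 or 2) is always 2, and averaging 2s gives 2. The Python sketch below simulates that pipeline on a hypothetical list of (epoch seconds, STATUS_CODE) events; the event data and function names are illustrative assumptions, not Splunk internals.

```python
from collections import defaultdict

# Hypothetical events: (epoch seconds, STATUS_CODE); real data comes from the index.
events = [(0, 2), (7, 2), (65, 5), (130, 2), (3605, 2), (3700, 5), (3750, 2)]

def stats_by_time(events, span=None):
    """Mimic `stats count(eval(STATUS_CODE=2)) AS OK ... by _time`.

    With span=None every distinct timestamp is its own group (no `bin`).
    With a span, timestamps are first floored to the bucket start (like `bin`).
    """
    groups = defaultdict(lambda: {"OK": 0, "NOK": 0})
    for t, status in events:
        key = t if span is None else t - t % span
        groups[key]["OK" if status == 2 else "NOK"] += 1
    return dict(sorted(groups.items()))

def hourly_factor(rows):
    """Mimic `timechart span=60m avg(eval(max(2,NOK+OK)))` over the stats rows."""
    buckets = defaultdict(list)
    for t, r in rows.items():
        buckets[t - t % 3600].append(max(2, r["OK"] + r["NOK"]))
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

print(hourly_factor(stats_by_time(events)))             # one event per row: always 2.0
print(hourly_factor(stats_by_time(events, span=3600)))  # hits per hour (floored at 2)
```

Without binning, every hour averages to 2.0; with hourly binning, the Factor becomes the hit count of each hour.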
The "Factor" here is just a crude example to show that, for some reason, I cannot manipulate the statistics the way I expect. I also tried things like
avg(eval(NOK/max(100,(NOK+OK)))) AS "Failure Fraction"
simply to get a lower number for periods when the actual load is lower.
Could you help me out? Why can't I do the math where I want to do it? I suspect it is something fundamentally simple that I just can't see.
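The intended effect of NOK/max(100,(NOK+OK)) can be checked with a few numbers. This minimal Python sketch assumes NOK and OK are totals for a whole time bucket (not per-event values); the function name is hypothetical and the floor of 100 is the constant from the search above.

```python
def damped_failure_fraction(nok, ok, floor=100):
    """NOK / max(floor, NOK+OK).

    Equals the plain failure fraction once a bucket has at least `floor` hits,
    and is scaled down proportionally below that, so quiet periods score lower.
    """
    return nok / max(floor, nok + ok)

# Night bucket: 5 failures out of 10 hits -> plain fraction 0.5, damped 0.05
print(damped_failure_fraction(5, 5))
# Day bucket: 50 failures out of 1000 hits -> 0.05 either way
print(damped_failure_fraction(50, 950))
```

Because the floor is positive, the expression also never divides by zero, even for a bucket with no hits.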
Any help is appreciated!
I tried:
index=_internal |bin _time span=5m |stats count(eval(log_level="ERROR")) AS NOK count(eval(log_level!="ERROR")) AS OK by _time|timechart span=5m avg(eval(NOK/(NOK+OK))) AS "Failure Fraction" avg(eval(OK/(NOK+OK))) AS "Success Fraction" avg(eval(max(2,NOK+OK))) AS Factor
and this appears to work, although the Factor values are so large that they dwarf the other series in the time chart. The key was to bin _time into the same interval as the timechart (5 minutes) before running stats.
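To see why matching the bin span to the timechart span works, here is a Python sketch of the same pipeline on hypothetical (epoch seconds, log_level) events. It assumes `bin _time span=5m` floors each timestamp to the start of its 5-minute bucket and that timechart aligns its buckets the same way; after binning there is exactly one stats row per bucket, so timechart's avg() simply returns that row's values.

```python
from collections import Counter

SPAN = 300  # 5 minutes, matching `bin _time span=5m` and `timechart span=5m`

def bin_time(t, span=SPAN):
    """Floor an epoch timestamp to the start of its bucket, as `bin _time` does."""
    return t - t % span

def summarize(events, span=SPAN):
    """Return (bucket, failure fraction, success fraction, Factor) per bucket."""
    ok = Counter(bin_time(t, span) for t, lvl in events if lvl != "ERROR")
    nok = Counter(bin_time(t, span) for t, lvl in events if lvl == "ERROR")
    rows = []
    for bucket in sorted(set(ok) | set(nok)):
        total = ok[bucket] + nok[bucket]  # Counter returns 0 for missing keys
        rows.append((bucket, nok[bucket] / total, ok[bucket] / total, max(2, total)))
    return rows

# Hypothetical _internal events: (epoch seconds, log_level)
events = [(10, "INFO"), (20, "ERROR"), (100, "INFO"), (250, "INFO"),
          (301, "INFO"), (400, "INFO"), (599, "WARN")]

for row in summarize(events):
    print(row)
```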
Hi!
Thanks a lot for your help. Clearly, bin did the trick.
By the way, when I use:
index=_internal |timechart count(eval(log_level="ERROR")) AS NOK count(eval(log_level!="ERROR")) AS OK|eval Nodata=if(NOK+OK=0,1,0)
and choose the stacked area representation, I get a nice 100% stacked graph with an additional "Nodata" series that appears whenever there are no OK or NOK hits.
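The reason the Nodata series fills empty buckets can be sketched in Python. This assumes the 100% stacked mode normalizes each bucket by the sum of its series; the function name is hypothetical. Without Nodata, an empty bucket would have a total of zero and nothing to draw; with it, the bucket is filled entirely by Nodata.

```python
def stacked_shares(nok, ok):
    """Per-bucket shares in a 100% stacked chart, including the flag from
    `eval Nodata=if(NOK+OK=0,1,0)`."""
    nodata = 1 if nok + ok == 0 else 0
    total = nok + ok + nodata  # never zero, thanks to the Nodata flag
    return {"NOK": nok / total, "OK": ok / total, "Nodata": nodata / total}

print(stacked_shares(1, 3))  # normal bucket: Nodata share is 0
print(stacked_shares(0, 0))  # empty bucket: filled entirely by Nodata
```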