Splunk Search

How to edit my search for proper visual statistical analysis of problem severity for a web service?

Path Finder

Hi,

I am trying to analyze the problem severity of a web service by weighting the failure fraction of cases by the load i.e. the total number of hits during the analysis period. By this I would like to get rid of false positive alerts during night time when the load is considerably lower.

My baseline search is:

index=myidex | stats count(eval(STATUS_CODE=2)) AS OK count(eval(STATUS_CODE != 2)) AS NOK by _time | timechart span=60m avg(eval(NOK/(NOK+OK))) AS "Failure Fraction" avg(eval(OK/(NOK+OK))) AS "Success Fraction" avg(eval(max(2,NOK+OK))) AS Factor

This gives me two curves mirrored by the straight line at y=0.5, which is OK. What is NOT OK is the straight line at y=2.0 labelled as Factor. It is supposed to give me the number of hits during the timeinterval, but instead it gives me a constant 2.0 indicating that it is evaluated separately for each and every log file line.

The "Factor" here is a crude example showing that for some reason I fail to manipulate the statistics. I tried things like

avg(eval(NOK/max(100,(NOK+OK)))) AS "Failure Fraction"

simply to receive a lower number for times when the actual load is lower.

Could you help me out here, why can't I do the math where I want to do it? I guess it is something fundamentally simple I just can't see.

Any help is appreciated!

0 Karma
1 Solution

Path Finder

I tried:

index=_internal |bin _time span=5m |stats count(eval(log_level="ERROR")) AS NOK count(eval(log_level!="ERROR")) AS OK by _time|timechart span=5m avg(eval(NOK/(NOK+OK))) AS "Failure Fraction" avg(eval(OK/(NOK+OK))) AS "Success Fraction" avg(eval(max(2,NOK+OK))) AS Factor

and this appears to work (although the number factor completely blows the other variables out of the time chart) . I needed to bin the stats in the same time interval as the time chart (5 minutes)

View solution in original post

Path Finder

I tried:

index=_internal |bin _time span=5m |stats count(eval(log_level="ERROR")) AS NOK count(eval(log_level!="ERROR")) AS OK by _time|timechart span=5m avg(eval(NOK/(NOK+OK))) AS "Failure Fraction" avg(eval(OK/(NOK+OK))) AS "Success Fraction" avg(eval(max(2,NOK+OK))) AS Factor

and this appears to work (although the number factor completely blows the other variables out of the time chart) . I needed to bin the stats in the same time interval as the time chart (5 minutes)

View solution in original post

Path Finder

Hi!

Thanks a lot for your help. Clearly bin did the trick.

0 Karma

Path Finder

Btw when i use:

 index=_internal |timechart count(eval(log_level="ERROR")) AS NOK count(eval(log_level!="ERROR")) AS OK|eval Nodata=if(NOK+OK=0,1,0) 

and choose stacked area representation, I get a nice 100% stacked graph that has an additional "Nodata" variable when I receive no hits on OK or NOK