Splunk Search

How to edit my search for proper visual statistical analysis of problem severity for a web service?

kaurinko
Communicator

Hi,

I am trying to analyze the problem severity of a web service by weighting the failure fraction by the load, i.e. the total number of hits during the analysis period. This way I would like to get rid of false positive alerts at night, when the load is considerably lower.

My baseline search is:

index=myidex | stats count(eval(STATUS_CODE=2)) AS OK count(eval(STATUS_CODE != 2)) AS NOK by _time | timechart span=60m avg(eval(NOK/(NOK+OK))) AS "Failure Fraction" avg(eval(OK/(NOK+OK))) AS "Success Fraction" avg(eval(max(2,NOK+OK))) AS Factor

This gives me two curves mirrored about the horizontal line y=0.5, which is OK. What is NOT OK is the straight line at y=2.0 labelled Factor. It is supposed to give me the number of hits during the time interval, but instead it gives me a constant 2.0, indicating that it is evaluated separately for each and every log line.

The "Factor" here is a crude example showing that for some reason I fail to manipulate the statistics. I tried things like

avg(eval(NOK/max(100,(NOK+OK)))) AS "Failure Fraction"

simply to receive a lower number for times when the actual load is lower.
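For illustration, the intended effect of the floor in the denominator can be sketched outside Splunk. This is a minimal Python sketch, not SPL; the function name and the sample counts are made up, and the floor of 100 is taken from the expression above.

```python
def weighted_failure_fraction(nok, ok, floor=100):
    """Failure fraction with the denominator clamped to at least `floor` hits."""
    return nok / max(floor, nok + ok)

# Daytime (hypothetical): 50 failures out of 1000 hits, the floor has no effect.
day = weighted_failure_fraction(50, 950)

# Night (hypothetical): 2 failures out of 4 hits. The raw fraction would be 0.5,
# but the floor of 100 damps it to 2/100.
night = weighted_failure_fraction(2, 2)

print(day, night)
```

With these numbers the quiet period scores lower than the busy one, even though its raw failure fraction is ten times higher.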

Could you help me out here? Why can't I do the math where I want to do it? I guess it is something fundamentally simple that I just can't see.

Any help is appreciated!

1 Solution

baerts
Path Finder

I tried:

index=_internal | bin _time span=5m | stats count(eval(log_level="ERROR")) AS NOK count(eval(log_level!="ERROR")) AS OK by _time | timechart span=5m avg(eval(NOK/(NOK+OK))) AS "Failure Fraction" avg(eval(OK/(NOK+OK))) AS "Success Fraction" avg(eval(max(2,NOK+OK))) AS Factor

and this appears to work (although the Factor values completely dwarf the other series in the time chart). I needed to bin the events into the same time interval as the timechart (5 minutes) before running stats.
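Why binning matters can be sketched in a few lines of Python (not SPL). This is a rough model under stated assumptions: events carry distinct timestamps in whole seconds, `stats ... by _time` produces one row per distinct `_time` value, and `timechart` then averages those rows per span. The function name and the sample timestamps are made up.

```python
from collections import defaultdict

def factor_per_bucket(timestamps, span, prebin):
    """Rough model of: [bin _time] | stats count by _time | timechart avg(max(2, count)).

    timestamps: event times in epoch seconds (hypothetical sample data).
    prebin: if True, round each timestamp down to a multiple of `span`
            before counting, which is what binning _time first achieves.
    """
    # Step 1: count hits per distinct _time value.
    hits = defaultdict(int)
    for ts in timestamps:
        key = ts - ts % span if prebin else ts
        hits[key] += 1

    # Step 2: average max(2, hits) over all rows that fall into each span.
    rows = defaultdict(list)
    for key, n in hits.items():
        rows[key - key % span].append(max(2, n))
    return {bucket: sum(v) / len(v) for bucket, v in sorted(rows.items())}

# Six events at distinct seconds inside one 300-second window.
timestamps = [0, 10, 20, 30, 40, 50]
unbinned = factor_per_bucket(timestamps, span=300, prebin=False)
binned = factor_per_bucket(timestamps, span=300, prebin=True)
print(unbinned)  # {0: 2.0}
print(binned)    # {0: 6.0}
```

Without binning, every row holds a single hit, so max(2, 1) is always 2 and the average stays at 2.0. With binning, the row holds the full count for the interval.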

kaurinko
Communicator

Hi!

Thanks a lot for your help. Clearly bin did the trick.

baerts
Path Finder

By the way, when I use:

index=_internal | timechart count(eval(log_level="ERROR")) AS NOK count(eval(log_level!="ERROR")) AS OK | eval Nodata=if(NOK+OK=0,1,0)

and choose the stacked area representation, I get a nice 100% stacked graph with an additional "Nodata" series for intervals where there are no OK or NOK hits.
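The Nodata flag itself is simple arithmetic. As a minimal Python sketch (not SPL), with a made-up function name and made-up rows standing in for the timechart output:

```python
def add_nodata(rows):
    """Return copies of the rows with Nodata=1 where both counts are zero, else 0."""
    return [dict(r, Nodata=1 if r["NOK"] + r["OK"] == 0 else 0) for r in rows]

# Hypothetical per-interval counts: one interval with traffic, one without.
rows = [{"NOK": 1, "OK": 9}, {"NOK": 0, "OK": 0}]
flagged = add_nodata(rows)
print(flagged)
```

An empty interval then still contributes one unit to the stack, so a 100% stacked chart has something to draw there.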
