Hi,
Here is a query that is supposed to calculate the percentage of failed operations over a period of time (an 'end' message is sent with a status that can be 'failed'). Please excuse any incorrect or non-technical terminology; I'm very new to this. I am trying to make sure I understand what bin and span mean in this particular search. Does it mean that I'm putting all of my events into 1-hour chunks (so all events from 11am until noon are in one bucket, all events from noon to 1pm are in the next bucket, etc.)? Then, for each bucket, I calculate the total number of events (count as complete) and the number of events where status=failed (count(eval(status="failed")) as failed). Finally, in the timechart command, I add up these per-bucket totals over 1 day and calculate my percentage. Is that a correct understanding? Thank you!
For example, if my data is like this:
event 1:
timestamp: June 11, 2018 9am
message: end
status: success
event 2:
timestamp: June 11, 2018 9:15am
message: end
status: failed
event 3:
timestamp: June 11, 2018 10am
message: end
status: success
event 4:
timestamp: June 11, 2018 10:15am
message: end
status: success
Then my failure rate is (1+0)/(2+2)*100 = 25%. The query is:
index="index" "message=end"
| bin span=1h _time
| stats count as complete,
count(eval(status="failed")) as failed by _time
| timechart span=1d eval(100*sum(eval(failed))/sum(eval(complete))) as "Failed %"
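To double-check the arithmetic, here is a small Python sketch of how I think the three steps work on the sample events above. This is not Splunk code and only mirrors my understanding of bin, stats and timechart; the variable names (events, hourly, daily, failed_pct) are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# The four sample events from the post (all have message=end).
events = [
    (datetime(2018, 6, 11, 9, 0), "success"),
    (datetime(2018, 6, 11, 9, 15), "failed"),
    (datetime(2018, 6, 11, 10, 0), "success"),
    (datetime(2018, 6, 11, 10, 15), "success"),
]

# Assumed equivalent of "bin span=1h _time" followed by "stats ... by _time":
# round each timestamp down to the hour, then count per hourly bucket.
hourly = defaultdict(lambda: {"complete": 0, "failed": 0})
for ts, status in events:
    bucket = ts.replace(minute=0, second=0, microsecond=0)
    hourly[bucket]["complete"] += 1
    if status == "failed":
        hourly[bucket]["failed"] += 1

# Assumed equivalent of "timechart span=1d": sum the hourly totals per day,
# then compute the percentage from the summed totals.
daily = defaultdict(lambda: {"complete": 0, "failed": 0})
for bucket, totals in hourly.items():
    day = bucket.date()
    daily[day]["complete"] += totals["complete"]
    daily[day]["failed"] += totals["failed"]

failed_pct = {day: 100 * t["failed"] / t["complete"] for day, t in daily.items()}
print(failed_pct)  # one entry for 2018-06-11 with value 25.0
```

With this data the 9am bucket has 2 complete and 1 failed, the 10am bucket has 2 complete and 0 failed, and the daily result is 100*(1+0)/(2+2) = 25, which matches the number I expect from the search.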