Getting Data In

How to get a specific result with stats latest when two events have the same timestamp?

russell120
Communicator

Hi, please consider this example.csv:

device   result    _time
apollo1  passed  2019-06-28
apollo1  failed  2019-06-28
zeus8    passed  2019-06-23
zeus8    failed  2019-06-23
zeus8    error   2019-06-23
hermes3  failed  2019-06-23 //Edit #1
hermes3  failed  2019-06-23 //Edit #1
apollo1  passed  2019-06-20  //Edit #1

Note: This is just an example. My real CSV has thousands of devices.

I want to return the latest entry based on the device value. When a device appears more than once with a duplicated timestamp and different result value, the event where result = passed should appear. Else, show the event where result is failed or error. What query do I need to do that?

My current query ( |inputlookup example.csv |stats latest by device) does not guarantee the event where result = passed will appear when a device shows up multiple times with with the same latest timestamp and different result values.

//Edit #1 below
The query I need would produce the table below:

device   result    _time
apollo1  passed  2019-06-28
zeus8    passed  2019-06-23
hermes3  failed  2019-06-23

The query would 1) show only the latest device with the latest _time, 2) if a device is listed twice with the same _time and a different result, show just the device where result = passed. Else display the event with either failed or error, and 3) since conditions 1 and 2 are followed, each device should only appear once.

0 Karma

woodcock
Esteemed Legend

Like this:

| makeresults | eval raw="device=apollo1,result=passed,_time=2019-06-28 device=apollo1,result=failed,_time=2019-06-28 device=zeus8,result=passed,_time=2019-06-23 device=zeus8,result=failed,_time=2019-06-23 device=zeus8,result=error,_time=2019-06-23 device=hermes3,result=failed,_time=2019-06-23 device=hermes3,result=failed,_time=2019-06-23 device=apollo1,result=passed,_time=2019-06-20"
| makemv raw
| mvexpand raw
| rename raw AS _raw
| kv
| eval _time = strptime(time, "%Y-%m-%d")
| fields - _raw time

| rename COMMENT AS "Everything above generates sample event data; everything below is your solution"

| eval sortme = if(result="passed", 0, 1)
| sort 0 _time device sortme
| fields - sortme
| dedup _time device
0 Karma

FrankVl
Ultra Champion

Try this:

|inputlookup example.csv
| sort - device _time result
| dedup device

Note: this does assume error,failed and passed are the only options, where passed conveniently happens to be the furthest down in alphabetic order.

0 Karma

woodcock
Esteemed Legend

This solution does not prefer result=passed over all other values in a time-host tie.

0 Karma

FrankVl
Ultra Champion

Why not? result is taken along in the sort, so for any items with same device and _time, passed would end up on top.

0 Karma

woodcock
Esteemed Legend

Read what OP said. This is the key detail on his request.

0 Karma

FrankVl
Ultra Champion

Yes, and unless I'm completely missing something, my solution does just that. It will sort by device, within that by time and within that by result (all decending) and then takes the top row per device using dedup. Sorting result decending means if there was a "passed" on the last date, it will show on top.

0 Karma

woodcock
Esteemed Legend

He is clearly implying that the passed event might not be on top. Otherwise the solution is trivial.

0 Karma

FrankVl
Ultra Champion

That is why I'm including a sort.

0 Karma

woodcock
Esteemed Legend

He says also that they have the "same time".

0 Karma

FrankVl
Ultra Champion

Yes, again: that is why I include the result field into the sorting. Since passed comes after error and fail in the alphabet, if there are multiple entries with the same time for some device, the pass entry will end up on top.

0 Karma

woodcock
Esteemed Legend

Ah, you are assuming that no states are farther down the alphabet than p.

0 Karma

FrankVl
Ultra Champion

Exactly, as clearly mentioned in my answer 😉

0 Karma

Vijeta
Influencer

@russell120- Try below

|inputlookup example.csv |sort - _time device |streamstats count as i by device| dedup device
0 Karma

russell120
Communicator

@Vijeta Hi, that does not show the latest event that prioritizes where result = passed when a device has 2 (or more) events with the same timestamp and different result value. Please reference this condition from the original question:

When a device appears more than once with a duplicated timestamp and different result value, the event where result = passed should appear. Else, show the event where result is failed or error. What query do I need to do that?

0 Karma

Vijeta
Influencer

@russell120 Sorry for misinterpreting the question , can you try like below-

  |inputlookup example.csv|stats count(eval(result="passed")) as PS, count(eval(result!="passed")) as NP , values(result) as result by _time device|eval result=if(PS>=1 and NP>0,"passed",result)| fields _time device result
0 Karma

russell120
Communicator

It's okay. I edited my initial post and added "//Edit #1" to where I updated it to more accurately reflect my real CSV. What you most recently posted almost accomplished the solution but there are duplicated devices from previous dates.

0 Karma

Vijeta
Influencer

Hi @russell120 you can use stats latest at the end of above search like this-

 |inputlookup example.csv|stats count(eval(result="passed")) as PS, count(eval(result!="passed")) as NP , values(result) as result by _time device|eval result=if(PS>=1 and NP>0,"passed",result)| fields _time device result| stats latest by device
0 Karma
Register for .conf21 Now! Go Vegas or Go Virtual!

How will you .conf21? You decide! Go in-person in Las Vegas, 10/18-10/21, or go online with .conf21 Virtual, 10/19-10/20.