Hi at all,
I have a data structure like the following:
title1 title2 title3 title4 value
and I need to group by title1 and having title4 where value (numeric field) is max.
How can I use eval in stats to have this?
something like this:
| stats values(eval(title4 where value is max)) AS title4 BY title1
How can I do it?
Ciao.
Giuseppe
@gcusello a couple of ways with eventstats
| makeresults count=300
| fields - _time
| eval title1="Title".mvindex(split("ABC",""), random() % 3)
| eval value=random() % 100
| eval title4="Title4-".mvindex(split("ZYXWVUTSRQ",""), random() % 10)
``` Data creation above ```
| eventstats max(value) as max_val by title1
| stats values(eval(if(value=max_val, title4, null()))) as title4 max(max_val) as max_val by title1
Or depending on your title4 data you can put in another stats, i.e. after the data set up above, do
``` Reduce the data first before the eventstats ```
| stats max(value) as max_val by title1 title4
| eventstats max(max_val) as max by title1
| stats values(eval(if(max_val=max, title4, null()))) as title4 max(max) as max by title1
This way the eventstats works on a far smaller dataset, depending on your cardinality
Lol we are all secretly trying to decipher the sentence😀 (I thought @bowesmana had both methods covered when I read this last night.) OK, I think I cranked the code. Using the same strategy (but deterministic for easy validation) I constructed this mock dataset:
title1 | title4 | value |
Title1:B | Title4-Y | 1 |
Title1:C | Title4-X | 2 |
Title1:A | Title4-W | 3 |
Title1:B | Title4-V | 4 |
Title1:C | Title4-U | 0 |
Title1:A | Title4-T | 1 |
Title1:B | Title4-S | 2 |
Title1:C | Title4-R | 3 |
Title1:A | Title4-Q | 4 |
Title1:B | Title4-Z | 0 |
Title1:C | Title4-Y | 1 |
Title1:A | Title4-X | 2 |
Title1:B | Title4-W | 3 |
Title1:C | Title4-V | 4 |
Title1:A | Title4-U | 0 |
Title1:B | Title4-T | 1 |
Title1:C | Title4-S | 2 |
Title1:A | Title4-R | 3 |
Title1:B | Title4-Q | 4 |
Title1:C | Title4-Z | 0 |
Title1:A | Title4-Y | 1 |
Title1:B | Title4-X | 2 |
Title1:C | Title4-W | 3 |
Title1:A | Title4-V | 4 |
Title1:B | Title4-U | 0 |
I think the semantics is: Find the Title4 that corresponds to the maximum value in the whole set - in this case, Title4-Q and Title4-V, as it corresponds to value 4; then, find all rows with these Title4 group them by Title1. I.e.,
| eventstats max(value) as max_val
| where value == max_val
| stats values(title4) as title4 by title1
The output for the mock data is
title1 | title4 |
Title1:A | Title4-Q Title4-V |
Title1:B | Title4-Q Title4-V |
Title1:C | Title4-V |
Here is the emulation
| makeresults count=25
| streamstats count
| eval value = count % 5
| eval title1="Title1:".mvindex(split("ABCDE",""), count % 3)
| eval title4="Title4-".mvindex(split("ZYXWVUTSRQ",""), count % 10)
| fields - _time count
``` data emulation above ```
Similarly a double-stats strategy can be construed.
Hi @yuanliu ,
I used all your solutions to have this:
| eventstats max(alert_level) as max_val BY title1
| stats
values(eval(if(alert_level=max_val,title4,""))) AS title4
max(alert_level) AS alert_level
BY title1
Thank you for you all support.
Ciao.
Giuseppe
Ahhhh... You had yet another field _called_ value. I suppose we all missed that and assumed "value" meant the value of one of the title* fields, not a separate field. *facepalm*
In this case, you can still avoid using eventstats
| sort - alert_level title1
| streamstats current=t dc(alert_level) as selector by title1
| where selector=1
| stats values(title4) as title4s by title1
Don't get me wrong - eventstats is a powerful and useful command but with some bigger datasets you might consider alternatives.
Ahh... ok. If it is suppossed to mean all results for the max value of field1, it's also a relatively easy to use sort and streamstats.
Your typical
| sort - field1
will give you your data sorted in descending order. That means that you have your max values first. This in turn means that you don't have to eventstats over whole resukt set. Just use streamstats to copy over the first value which must be the maximum value.
| streamstats current=t first(field1) as field1max
Now all that's left is to filter
| where field1=field1max
Since we're operating on our initial result we've retained all original fields.
Of course for for additional performance boost you can remove unnecessary fields prior to sorting so you don't needlessly drag them around just to get rid of them immediately after if you have a big data set to sort.(Same goes for limiting your processed data volume in with eventstats-based solution)
@gcusello a couple of ways with eventstats
| makeresults count=300
| fields - _time
| eval title1="Title".mvindex(split("ABC",""), random() % 3)
| eval value=random() % 100
| eval title4="Title4-".mvindex(split("ZYXWVUTSRQ",""), random() % 10)
``` Data creation above ```
| eventstats max(value) as max_val by title1
| stats values(eval(if(value=max_val, title4, null()))) as title4 max(max_val) as max_val by title1
Or depending on your title4 data you can put in another stats, i.e. after the data set up above, do
``` Reduce the data first before the eventstats ```
| stats max(value) as max_val by title1 title4
| eventstats max(max_val) as max by title1
| stats values(eval(if(max_val=max, title4, null()))) as title4 max(max) as max by title1
This way the eventstats works on a far smaller dataset, depending on your cardinality
Hi @all,
Thank you for all your hints, but my issue is that I must find the title4, for each title1 where value is max, with this solution I find the max value for each title1, not the title4 where value is max and relative value for each title1.
Have you any other hint?
Ciao.
Giuseppe
I too don't quite get your statement "where value is max" - you said you have
title1 title2 title3 title4 value
so I assumed titles are text elements and the value is numeric. Does the table below model your data or is it different?
title1 title4 value
TitleC | Title4-X | 16 | |
TitleA | Title4-X | 69 | |
TitleA | Title4-X | 83 | |
TitleC | Title4-X | 92 | |
TitleB | Title4-X | 45 | |
TitleA | Title4-Y | 90 | |
TitleA | Title4-Y | 87 | |
TitleB | Title4-Y | 97 | |
TitleB | Title4-Y | 7 | |
TitleB | Title4-Y | 54 | |
TitleB | Title4-Y | 85 | |
TitleC | Title4-Y | 58 | |
TitleC | Title4-Y | 18 | |
TitleA | Title4-Z | 10 | |
TitleC | Title4-Z | 31 | |
TitleA | Title4-Z | 38 | |
TitleA | Title4-Z | 46 | |
TitleB | Title4-Z | 57 | |
TitleA | Title4-Z | 27 | |
TitleB | Title4-Z | 71 |
What does max in your description represent? I understood you want all the values of title4 "where value is max". Can you define what max is.
For title4-X, Y and Z the max of values by title 4 are 92, 97 and 71.
For title1-A, B and C the max of values by title1 are 90, 97 and 92.
Do either of these describe your 'max'.
An example would be useful?
Ok. Honestly, I'm a bit confused. I don't understand what you mean by "where value is max".
As I understand it if you have
title1 | title4 |
1 | 3 |
2 | 5 |
3 | 7 |
1 | 2 |
2 | 3 |
3 | 5 |
1 | 1 |
You want
title1 | title4 |
1 | 3 |
2 | 5 |
3 | 7 |
as a result because for each value of title1 you want the max value of title4, no?
Maybe we just misunderstand each other...
Hi
Maybe eventstats to add additional field where is title4’s values based on max value? I know that this is not an efficient way, but it’s first which comes into my mind. Probably there is better ways @ITWhisperer, @PickleRick, @richgalloway ?
r. Ismo
Yup. +1 on eventstats. Stats will aggregate all data leaving you with just max value. Appendpipe will append stats at the end but you'll still have them as a separate entity. You could use subsearch but it would be ugly and ineffective (you'd have to run the main search twice effectively). Eventstats it is.
But since eventstats has limitations, you can cheat a little.
| sort - title1 title4
| dedup title1
It doesn't replace eventstats in a general case but for max or min value it might be a bit quicker than eventstats and will almost surely have lower memory footprint.
I agree. I would try eventstats as well.