Solved: eval in stats with max

gcusello

Hi at all,

I have a data structure like the following:

title1 title2 title3 title4 value

and I need to group by title1 and having title4 where value (numeric field) is max.

How can I use eval in stats to have this?

something like this:

| stats values(eval(title4 where value is max)) AS title4 BY title1

How can I do it?

Ciao.

Giuseppe

bowesmana

@gcusello a couple of ways with eventstats

| makeresults count=300
| fields - _time
| eval title1="Title".mvindex(split("ABC",""), random() % 3)
| eval value=random() % 100
| eval title4="Title4-".mvindex(split("ZYXWVUTSRQ",""), random() % 10)
``` Data creation above ```
| eventstats max(value) as max_val by title1
| stats values(eval(if(value=max_val, title4, null()))) as title4 max(max_val) as max_val by title1

Or depending on your title4 data you can put in another stats, i.e. after the data set up above, do

``` Reduce the data first before the eventstats ```
| stats max(value) as max_val by title1 title4
| eventstats max(max_val) as max by title1
| stats values(eval(if(max_val=max, title4, null()))) as title4 max(max) as max by title1

This way the eventstats works on a far smaller dataset, depending on your cardinality

View solution in original post

yuanliu

Lol we are all secretly trying to decipher the sentence😀 (I thought @bowesmana had both methods covered when I read this last night.) OK, I think I cranked the code. Using the same strategy (but deterministic for easy validation) I constructed this mock dataset:

title1	title4	value
Title1:B	Title4-Y	1
Title1:C	Title4-X	2
Title1:A	Title4-W	3
Title1:B	Title4-V	4
Title1:C	Title4-U	0
Title1:A	Title4-T	1
Title1:B	Title4-S	2
Title1:C	Title4-R	3
Title1:A	Title4-Q	4
Title1:B	Title4-Z	0
Title1:C	Title4-Y	1
Title1:A	Title4-X	2
Title1:B	Title4-W	3
Title1:C	Title4-V	4
Title1:A	Title4-U	0
Title1:B	Title4-T	1
Title1:C	Title4-S	2
Title1:A	Title4-R	3
Title1:B	Title4-Q	4
Title1:C	Title4-Z	0
Title1:A	Title4-Y	1
Title1:B	Title4-X	2
Title1:C	Title4-W	3
Title1:A	Title4-V	4
Title1:B	Title4-U	0

I think the semantics is: Find the Title4 that corresponds to the maximum value in the whole set - in this case, Title4-Q and Title4-V, as it corresponds to value 4; then, find all rows with these Title4 group them by Title1. I.e.,

| eventstats max(value) as max_val
| where value == max_val
| stats values(title4) as title4 by title1

The output for the mock data is

title1	title4
Title1:A	Title4-Q Title4-V
Title1:B	Title4-Q Title4-V
Title1:C	Title4-V

Here is the emulation

| makeresults count=25
| streamstats count
| eval value = count % 5
| eval title1="Title1:".mvindex(split("ABCDE",""), count % 3)
| eval title4="Title4-".mvindex(split("ZYXWVUTSRQ",""), count % 10)
| fields - _time count
``` data emulation above ```

Similarly a double-stats strategy can be construed.

gcusello

Hi @yuanliu ,

I used all your solutions to have this:

| eventstats max(alert_level) as max_val BY title1
            | stats 
               values(eval(if(alert_level=max_val,title4,""))) AS title4
               max(alert_level) AS alert_level 
               BY title1

Thank you for you all support.

Ciao.

Giuseppe

PickleRick

Ahhhh... You had yet another field _called_ value. I suppose we all missed that and assumed "value" meant the value of one of the title* fields, not a separate field. *facepalm*

In this case, you can still avoid using eventstats

| sort - alert_level title1
| streamstats current=t dc(alert_level) as selector by title1
| where selector=1
| stats values(title4) as title4s by title1

Don't get me wrong - eventstats is a powerful and useful command but with some bigger datasets you might consider alternatives.

PickleRick

Ahh... ok. If it is suppossed to mean all results for the max value of field1, it's also a relatively easy to use sort and streamstats.

Your typical

| sort - field1

will give you your data sorted in descending order. That means that you have your max values first. This in turn means that you don't have to eventstats over whole resukt set. Just use streamstats to copy over the first value which must be the maximum value.

| streamstats current=t first(field1) as field1max

Now all that's left is to filter

| where field1=field1max

Since we're operating on our initial result we've retained all original fields.

Of course for for additional performance boost you can remove unnecessary fields prior to sorting so you don't needlessly drag them around just to get rid of them immediately after if you have a big data set to sort.(Same goes for limiting your processed data volume in with eventstats-based solution)

bowesmana

@gcusello a couple of ways with eventstats

| makeresults count=300
| fields - _time
| eval title1="Title".mvindex(split("ABC",""), random() % 3)
| eval value=random() % 100
| eval title4="Title4-".mvindex(split("ZYXWVUTSRQ",""), random() % 10)
``` Data creation above ```
| eventstats max(value) as max_val by title1
| stats values(eval(if(value=max_val, title4, null()))) as title4 max(max_val) as max_val by title1

Or depending on your title4 data you can put in another stats, i.e. after the data set up above, do

``` Reduce the data first before the eventstats ```
| stats max(value) as max_val by title1 title4
| eventstats max(max_val) as max by title1
| stats values(eval(if(max_val=max, title4, null()))) as title4 max(max) as max by title1

This way the eventstats works on a far smaller dataset, depending on your cardinality

gcusello

Hi @all,

Thank you for all your hints, but my issue is that I must find the title4, for each title1 where value is max, with this solution I find the max value for each title1, not the title4 where value is max and relative value for each title1.

Have you any other hint?

Ciao.

Giuseppe

bowesmana

I too don't quite get your statement "where value is max" - you said you have

title1 title2 title3 title4 value

so I assumed titles are text elements and the value is numeric. Does the table below model your data or is it different?

title1 title4 value

TitleC	Title4-X	16
TitleA	Title4-X	69
TitleA	Title4-X	83
TitleC	Title4-X	92
TitleB	Title4-X	45
TitleA	Title4-Y	90
TitleA	Title4-Y	87
TitleB	Title4-Y	97
TitleB	Title4-Y	7
TitleB	Title4-Y	54
TitleB	Title4-Y	85
TitleC	Title4-Y	58
TitleC	Title4-Y	18
TitleA	Title4-Z	10
TitleC	Title4-Z	31
TitleA	Title4-Z	38
TitleA	Title4-Z	46
TitleB	Title4-Z	57
TitleA	Title4-Z	27
TitleB	Title4-Z	71

What does max in your description represent? I understood you want all the values of title4 "where value is max". Can you define what max is.

For title4-X, Y and Z the max of values by title 4 are 92, 97 and 71.

For title1-A, B and C the max of values by title1 are 90, 97 and 92.

Do either of these describe your 'max'.

An example would be useful?

PickleRick

Ok. Honestly, I'm a bit confused. I don't understand what you mean by "where value is max".

As I understand it if you have

title1	title4
1	3
2	5
3	7
1	2
2	3
3	5
1	1

You want

title1	title4
1	3
2	5
3	7

as a result because for each value of title1 you want the max value of title4, no?

Maybe we just misunderstand each other...

isoutamo

Hi
Maybe eventstats to add additional field where is title4’s values based on max value? I know that this is not an efficient way, but it’s first which comes into my mind. Probably there is better ways @ITWhisperer, @PickleRick, @richgalloway ?
r. Ismo

PickleRick

Yup. +1 on eventstats. Stats will aggregate all data leaving you with just max value. Appendpipe will append stats at the end but you'll still have them as a separate entity. You could use subsearch but it would be ugly and ineffective (you'd have to run the main search twice effectively). Eventstats it is.

But since eventstats has limitations, you can cheat a little.

| sort - title1 title4
| dedup title1

It doesn't replace eventstats in a general case but for max or min value it might be a bit quicker than eventstats and will almost surely have lower memory footprint.

richgalloway

I agree. I would try eventstats as well.

---
If this reply helps you, Karma would be appreciated.

eval in stats with max

stats

Infographic provides the TL;DR for the 2024 Splunk Career Impact Report

Enterprise Security Content Update (ESCU) | New Releases

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?