Splunk Search

eval in stats with max

gcusello
SplunkTrust
SplunkTrust

Hi at all,

I have a data structure like the following:

 

 

title1 title2 title3 title4 value

 

 

and I need to group by title1 and having title4 where value (numeric field) is max.

How can I use eval in stats to have this?

something like this:

 

 

| stats values(eval(title4 where value is max)) AS title4 BY title1

 

 

How can I do it?

Ciao.

Giuseppe

Labels (1)
Tags (2)
0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

@gcusello a couple of ways with eventstats

| makeresults count=300
| fields - _time
| eval title1="Title".mvindex(split("ABC",""), random() % 3)
| eval value=random() % 100
| eval title4="Title4-".mvindex(split("ZYXWVUTSRQ",""), random() % 10)
``` Data creation above ```
| eventstats max(value) as max_val by title1
| stats values(eval(if(value=max_val, title4, null()))) as title4 max(max_val) as max_val by title1

Or depending on your title4 data you can put in another stats, i.e. after the data set up above, do

``` Reduce the data first before the eventstats ```
| stats max(value) as max_val by title1 title4
| eventstats max(max_val) as max by title1
| stats values(eval(if(max_val=max, title4, null()))) as title4 max(max) as max by title1

 This way the eventstats works on a far smaller dataset, depending on your cardinality

View solution in original post

yuanliu
SplunkTrust
SplunkTrust

Lol we are all secretly trying to decipher the sentence😀 (I thought @bowesmana had both methods covered when I read this last night.)  OK, I think I cranked the code.  Using the same strategy (but deterministic for easy validation) I constructed this mock dataset:

title1title4value
Title1:BTitle4-Y1
Title1:CTitle4-X2
Title1:ATitle4-W3
Title1:BTitle4-V4
Title1:CTitle4-U0
Title1:ATitle4-T1
Title1:BTitle4-S2
Title1:CTitle4-R3
Title1:ATitle4-Q4
Title1:BTitle4-Z0
Title1:CTitle4-Y1
Title1:ATitle4-X2
Title1:BTitle4-W3
Title1:CTitle4-V4
Title1:ATitle4-U0
Title1:BTitle4-T1
Title1:CTitle4-S2
Title1:ATitle4-R3
Title1:BTitle4-Q4
Title1:CTitle4-Z0
Title1:ATitle4-Y1
Title1:BTitle4-X2
Title1:CTitle4-W3
Title1:ATitle4-V4
Title1:BTitle4-U0

I think the semantics is: Find the Title4 that corresponds to the maximum value in the whole set - in this case, Title4-Q and Title4-V, as it corresponds to value 4; then, find all rows with these Title4 group them by Title1. I.e.,

 

| eventstats max(value) as max_val
| where value == max_val
| stats values(title4) as title4 by title1

 

The output for the mock data is

title1
title4
Title1:A
Title4-Q
Title4-V
Title1:B
Title4-Q
Title4-V
Title1:CTitle4-V

Here is the emulation

 

| makeresults count=25
| streamstats count
| eval value = count % 5
| eval title1="Title1:".mvindex(split("ABCDE",""), count % 3)
| eval title4="Title4-".mvindex(split("ZYXWVUTSRQ",""), count % 10)
| fields - _time count
``` data emulation above ```

 

Similarly a double-stats strategy can be construed.

gcusello
SplunkTrust
SplunkTrust

Hi @yuanliu ,

I used all your solutions to have this:

| eventstats max(alert_level) as max_val BY title1
            | stats 
               values(eval(if(alert_level=max_val,title4,""))) AS title4
               max(alert_level) AS alert_level 
               BY title1

Thank you for you all support.

Ciao.

Giuseppe

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Ahhhh... You had yet another field _called_ value. I suppose we all missed that and assumed "value" meant the value of one of the title* fields, not a separate field. *facepalm*

In this case, you can still avoid using eventstats

| sort - alert_level title1
| streamstats current=t dc(alert_level) as selector by title1
| where selector=1
| stats values(title4) as title4s by title1

Don't get me wrong - eventstats is a powerful and useful command but with some bigger datasets you might consider alternatives.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Ahh... ok. If it is suppossed to mean all results for the max value of field1, it's also a relatively easy to use sort and streamstats.

Your typical

| sort - field1

will give you your data sorted in descending order. That means that you have your max values first. This in turn means that you don't have to eventstats over whole resukt set. Just use streamstats to copy over the first value which must be the maximum value.

| streamstats current=t first(field1) as field1max

Now all that's left is to filter

| where field1=field1max

Since we're operating on our initial result we've retained all original fields.

Of course for for additional performance boost you can remove unnecessary fields prior to sorting so you don't needlessly drag them around just to get rid of them immediately after if you have a big data set to sort.(Same goes for limiting your processed data volume in with eventstats-based solution)

bowesmana
SplunkTrust
SplunkTrust

@gcusello a couple of ways with eventstats

| makeresults count=300
| fields - _time
| eval title1="Title".mvindex(split("ABC",""), random() % 3)
| eval value=random() % 100
| eval title4="Title4-".mvindex(split("ZYXWVUTSRQ",""), random() % 10)
``` Data creation above ```
| eventstats max(value) as max_val by title1
| stats values(eval(if(value=max_val, title4, null()))) as title4 max(max_val) as max_val by title1

Or depending on your title4 data you can put in another stats, i.e. after the data set up above, do

``` Reduce the data first before the eventstats ```
| stats max(value) as max_val by title1 title4
| eventstats max(max_val) as max by title1
| stats values(eval(if(max_val=max, title4, null()))) as title4 max(max) as max by title1

 This way the eventstats works on a far smaller dataset, depending on your cardinality

gcusello
SplunkTrust
SplunkTrust

Hi @all,

Thank you for all your hints, but my issue is that I must find the title4, for each title1 where value is max, with this solution I find the max value for each title1, not the title4 where value is max and relative value for each title1.

Have you any other hint?

Ciao.

Giuseppe

 

0 Karma

bowesmana
SplunkTrust
SplunkTrust

I too don't quite get your statement "where value is max"  - you said you have 

 

title1 title2 title3 title4 value

 

so I assumed titles are text elements and the value is numeric. Does the table below model your data or is it different?

title1 title4 value  

TitleCTitle4-X16 
TitleATitle4-X69 
TitleATitle4-X83 
TitleCTitle4-X92 
TitleBTitle4-X45 
TitleATitle4-Y90 
TitleATitle4-Y87 
TitleBTitle4-Y97 
TitleBTitle4-Y7 
TitleBTitle4-Y54 
TitleBTitle4-Y85 
TitleCTitle4-Y58 
TitleCTitle4-Y18 
TitleATitle4-Z10 
TitleCTitle4-Z31 
TitleATitle4-Z38 
TitleATitle4-Z46 
TitleBTitle4-Z57 
TitleATitle4-Z27 
TitleBTitle4-Z71 


What does max in your description represent?  I understood you want all the values of title4 "where value is max". Can you define what max is.

For title4-X, Y and Z the max of values by title 4 are 92, 97 and 71.

For title1-A, B and C the max of values by title1 are 90, 97 and 92.

Do either of these describe your 'max'.

An example would be useful?

PickleRick
SplunkTrust
SplunkTrust

Ok. Honestly, I'm a bit confused. I don't understand what you mean by "where value is max".

As I understand it if you have

title1title4
13
25
37
12
23
35
11

You want

title1title4
13
25
37

as a result because for each value of title1 you want the max value of title4, no?

Maybe we just misunderstand each other...

isoutamo
SplunkTrust
SplunkTrust

Hi
Maybe eventstats to add additional field where is title4’s values based on max value? I know that this is not an efficient way, but it’s first which comes into my mind. Probably there is better ways @ITWhisperer@PickleRick@richgalloway ?
r. Ismo

PickleRick
SplunkTrust
SplunkTrust

Yup. +1 on eventstats. Stats will aggregate all data leaving you with just max value. Appendpipe will append stats at the end but you'll still have them as a separate entity. You could use subsearch but it would be ugly and ineffective (you'd have to run the main search twice effectively). Eventstats it is.

But since eventstats has limitations, you can cheat a little.

| sort - title1 title4
| dedup title1

It doesn't replace eventstats in a general case but for max or min value it might be a bit quicker than eventstats and will almost surely have lower memory footprint.

richgalloway
SplunkTrust
SplunkTrust

I agree.  I would try eventstats as well.

---
If this reply helps you, Karma would be appreciated.
Get Updates on the Splunk Community!

Infographic provides the TL;DR for the 2024 Splunk Career Impact Report

We’ve been buzzing with excitement about the recent validation of Splunk Education! The 2024 Splunk Career ...

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...