Solved: Splunk stats count group by multiple fields

shashankk · ‎01-05-2024

Hi Splunk Team

I am having issues while fetching data from 2 stats count fields together.

Below is the query:

index=test_index
| rex "\.(?<TestMQ>.*)\@"
| eval Priority_Level=case(Priority="Low", "Low", Priority="Medium", "Medium", Priority="High", "High")
| stats count as TotalCount, count(eval(Priority_Level="Low")) as Low, count(eval(Priority_Level="Medium")) as Medium, count(eval(Priority_Level="High")) as High by TestMQ

This gives me result like example below:

TestMQ | TotalCount | Low | Medium | High
MQNam1 | 120 | 0 | 0 | 0
MQNam2 | 152 | 0 | 0 | 0
..

The problem is that I am getting "0" value for Low, Medium & High columns - which is not correct. I want to combine both the stats and show the group by results of both the fields.

If I run the same query with separate stats - it gives individual data correctly.

Case 1: stats count as TotalCount by TestMQ

index=test_index
| rex "\.(?<TestMQ>.*)\@"
| eval Priority_Level=case(Priority="Low", "Low", Priority="Medium", "Medium", Priority="High", "High")
| stats count as TotalCount by TestMQ

Example Output:
TestMQ | TotalCount
MQName | 201

Case 2: stats count as PriorityCount by Priority_Level

index=test_index
| rex "\.(?<TestMQ>.*)\@"
| eval Priority_Level=case(Priority="Low", "Low", Priority="Medium", "Medium", Priority="High", "High")
| stats count as PriorityCount by Priority_Level

Example Output:
Priority_Level | PriorityCount
High | 20
Medium | 53
Low | 78

Please help and suggest.

@ITWhisperer - kindly assist.

dtburrows3 · ‎01-08-2024

Since you have provided more sample data and stated what the common field across the events are, I think a search like this may work.

<base_search>
    | rex field=_raw "Priority\=(?<Priority>[^\,]+)"
    | rex "(?:\={3}\>|\<\-{3})\s+TRN[^\:]*\:\s+(?<trn>[^\s]+)"
    | rex "RCV\.FROM\.(?<TestMQ>.*)\@"
    | stats
        count(eval(Priority=="Low")) as Low,
        count(eval(Priority=="Medium")) as Medium,
        count(eval(Priority=="High")) as High,
        values(TestMQ) as TestMQ
            by trn
    | stats
        sum(Low) as Low,
        sum(Medium) as Medium,
        sum(High) as High
            by TestMQ
    | addtotals fieldname="TotalCount"

This is what the final result looks like running against the sample data you provided.

View solution in original post

shashankk · ‎01-05-2024

Hello @PickleRick @gcusello @isoutamo - thanks for your kind response. I am reframing my problem statement here:

Refer below Sample events from the logs:

240108 07:12:07  17709   testget1: ===> TRN@instance2.RQ1:  0000002400840162931785-AHGM0000bA [Priority=Low,ScanPriority=0, Rule: Default Rule].
240108 07:12:07  17709   testget1: <--- TRN:  0000002400840162929525-AHGM00015A - S from [RCV.FROM.TEST.SEP2.Q2@QM.ABCD101].

I am having issues while fetching data from 2 stats (TestMQ and Priority_Level) count fields together.

Below is the query:

index=test_index=*instance*/*testget*
| rex "\:  (?<testgettrn>.*) \- S from"  
| rex "RCV\.FROM\.(?<TestMQ>.*)\@"
| eval Priority_Level=case(Priority="Low", "Low", Priority="Medium", "Medium", Priority="High", "High")
| stats count as TotalCount, count(eval(Priority_Level="Low")) as Low, count(eval(Priority_Level="Medium")) as Medium, count(eval(Priority_Level="High")) as High by TestMQ
| fillnull value=0

This gives me result like example below:

TestMQ    | TotalCount | Low | Medium | High
MQNam1    | 120        | 0   | 0      | 0
MQNam2    | 152        | 0   | 0      | 0
..

The problem is that I am getting "0" value for Low, Medium & High columns - which is not correct. I want to combine both the stats and show the group by results of both the fields.

If I run the same query with separate stats - it gives individual data correctly.

Case 1: stats count as TotalCount by TestMQ

index=test_index=*instance*/*testget*
| rex "\:  (?<testgettrn>.*) \- S from"  
| rex "RCV\.FROM\.(?<TestMQ>.*)\@"
| eval Priority_Level=case(Priority="Low", "Low", Priority="Medium", "Medium", Priority="High", "High")
| stats count as TotalCount by TestMQ

Example Output:
TestMQ  | TotalCount
MQName  | 201

Case 2: stats count as PriorityCount by Priority_Level

index=test_index=*instance*/*testget*
| rex "\:  (?<testgettrn>.*) \- S from"  
| rex "RCV\.FROM\.(?<TestMQ>.*)\@"
| eval Priority_Level=case(Priority="Low", "Low", Priority="Medium", "Medium", Priority="High", "High")
| stats count as PriorityCount by Priority_Level

Example Output: 
Priority_Level | PriorityCount 
High           |  20
Medium         |  53
Low            |  78

Please help and suggest.

gcusello · ‎01-06-2024

Hi @shashankk ,

as @ITWhisperer said, you have the Priority and TestMQ fields in different events, so you canot correlate them.

You have to find a field common to all the events.

So If e.g. Q1 (that's the final part of TestMQ and it's also present in the other events) can be used as key you could run something like this:

| makeresults | eval _raw="240105 18:06:03  19287   testget1: ===> TRN@instance.RQ1:  0000002400509150632034-AERG00001A [Priority=Low,ScanPriority=0, Rule: Default Rule]."
| append [ | makeresults | eval _raw="240105 18:06:03  19287   testget1: ===> TRN@instance.RQ1:  0000002400540101635213-AERG00000A [Priority=Low,ScanPriority=0, Rule: Default Rule]." ]
| append [ | makeresults | eval _raw="240105 18:06:03  19287   testget1: <--- TRN:  0000002481540150632034-AERG00001A - S from [RCV.FROM.TEST.SEP.Q1@QM.ABC123]." ]
| append [ | makeresults | eval _raw="240105 18:06:03  19287   testget1: <--- TRN:  0000002400547150635213-AERG00000A - S from [RCV.FROM.TEST.SEP.Q1@QM.ABC123]. "]
| append [ | makeresults | eval _raw="240105 18:02:29  72965   testget1: ===> TRN@instance.RQ1:  0000002400540902427245-AERC000f8A [Priority=Medium,ScanPriority=2, Rule: Default Rule]." ]
| append [ | makeresults | eval _raw="240105 18:02:29  72965   testget1: ===> TRN@instance.RQ1:  0000001800540152427236-AERC000f7A [Priority=Medium,ScanPriority=2, Rule: Default Rule]."]
| append [ | makeresults | eval _raw="240105 18:02:29  72965   testget1: ===> TRN@instance.RQ1:  0000002400540109427216-AERC000f6A [Priority=High,ScanPriority=1, Rule: Default Rule]." ]
| rex "\:  (?<testgettrn>.*) \- S from"  
| rex "RCV\.FROM\.(?<TestMQ>.*)\@"
| rex field=TestMQ "\w+\.\w+\.(?<key>\w+)"
| rex "TRN\@instance\.R(?<key>[^:]++):"
| rex "Priority\=(?<Priority>\w+)"
| stats values(TestMQ) AS TestMQ count(eval(Priority="Low")) as Low, count(eval(Priority="Medium")) as Medium, count(eval(Priority="High")) as High BY key
| fillnull value=0
| addtotals

Ciao.

Giuseppe

shashankk · ‎01-07-2024

Hi @gcusello @ITWhisperer

In this case I can see the TransactionID is the common field between both the events (TestMQ and Priority) - but I am unable to find how to use the same in the query.

Can you please help and suggest on it? Or can we do a JOIN based transaction id's (for both the event types - TestMQ & Priority)

| rex field=_raw "(?<TransactionID>\d+-\w+)"

gcusello · ‎01-08-2024

Hi @shashankk ,

don't use join because searches are very sow!

using my search you extract the common key that permits to correlate events containing the TestMQ and Priority fields, and thesearch displays the result as you like.

then you could also don't diplay the key used for the correlation having exactly the result you want:

| makeresults | eval _raw="240105 18:06:03  19287   testget1: ===> TRN@instance.RQ1:  0000002400509150632034-AERG00001A [Priority=Low,ScanPriority=0, Rule: Default Rule]."
| append [ | makeresults | eval _raw="240105 18:06:03  19287   testget1: ===> TRN@instance.RQ1:  0000002400540101635213-AERG00000A [Priority=Low,ScanPriority=0, Rule: Default Rule]." ]
| append [ | makeresults | eval _raw="240105 18:06:03  19287   testget1: <--- TRN:  0000002481540150632034-AERG00001A - S from [RCV.FROM.TEST.SEP.Q1@QM.ABC123]." ]
| append [ | makeresults | eval _raw="240105 18:06:03  19287   testget1: <--- TRN:  0000002400547150635213-AERG00000A - S from [RCV.FROM.TEST.SEP.Q1@QM.ABC123]. "]
| append [ | makeresults | eval _raw="240105 18:02:29  72965   testget1: ===> TRN@instance.RQ1:  0000002400540902427245-AERC000f8A [Priority=Medium,ScanPriority=2, Rule: Default Rule]." ]
| append [ | makeresults | eval _raw="240105 18:02:29  72965   testget1: ===> TRN@instance.RQ1:  0000001800540152427236-AERC000f7A [Priority=Medium,ScanPriority=2, Rule: Default Rule]."]
| append [ | makeresults | eval _raw="240105 18:02:29  72965   testget1: ===> TRN@instance.RQ1:  0000002400540109427216-AERC000f6A [Priority=High,ScanPriority=1, Rule: Default Rule]." ]
| rex "\:  (?<testgettrn>.*) \- S from"  
| rex "RCV\.FROM\.(?<TestMQ>.*)\@"
| rex field=TestMQ "\w+\.\w+\.(?<key>\w+)"
| rex "TRN\@instance\.R(?<key>[^:]++):"
| rex "Priority\=(?<Priority>\w+)"
| stats values(TestMQ) AS TestMQ count(eval(Priority="Low")) as Low, count(eval(Priority="Medium")) as Medium, count(eval(Priority="High")) as High BY key
| fields - key
| fillnull value=0
| addtotals

Ciao.

Giuseppe

shashankk · ‎01-08-2024

Hi @gcusello - Thank you for you continuos support. I am able to proceed next with your suggestion but now stuck at one point. Need your help on it.

Kindly suggest.

Query Used:

index=test_index source=*instance*/*testget*
| rex "\:  (?<testgettrn>.*) \- S from"  
| rex "RCV\.FROM\.(?<TestMQ>.*)\@"
| rex field=TestMQ "\w+\.\w+\.(?<key>\w+)"
| rex "TRN\@\\w+\.R(?<key>[^:]++):"
| rex "Priority\=(?<Priority>\w+)"
| stats values(TestMQ) AS TestMQ count(eval(Priority="Low")) as Low, count(eval(Priority="Medium")) as Medium, count(eval(Priority="High")) as High BY key
| fillnull value=0
| addtotals

Getting results as below:

Total count (Q1+Q2) is getting added to Q1 only. And Q2 is remaining null (as shown in below example)

key | TestMQ          | Low | Medium | High | Total
Q1  | TEST.SEP.Q1     | 20  | 20     | 30   | 70
    | TEST.SEP2.Q1    
    | TEST.SEP3.Q1    

Q2  | TEST.SEP.Q2    | 0 | 0 | 0 | 0
    | TEST.SEP2.Q2   
    | TEST.SEP3.Q2

Please guide and suggest.

gcusello · ‎01-08-2024

Hi @shashankk ,

as I said, the problem is to identify a key contained in both the types of your logs: the ones with the TestMQ field and the ones containing Priority filed.

I identified, from your sample few logs the regex to extract Q1 or Q2 or Q3, but evidently it isn't sufficient.

can you identify a common key to use for correlation?

If you haven't this common key it's very hard to correlate events without any relation.

Maybe, if you could share more samples, with more TestMQ, I could help you in key identification and extraction, but anyway, the only approach I see is the one I described: find a common key for correlation.

Ciao.

Giuseppe

shashankk · ‎01-08-2024

Hi @gcusello

I have added below more lines of the sample event file - please help me find the right key.

Or if not possible with the correlation Key - how to proceed with the JOIN in this case?

Kindly guide and suggest.

240108 07:12:07  17709   testget1: ===> TRN@instance2.RQ1:  0000002400840162931785-AHGM0000bA [Priority=Low,ScanPriority=0, Rule: Default Rule].
240108 07:12:07  17709   testget1: <--- TRN:  0000002400840162929525-AHGM00015A - S from [RCV.FROM.TEST.SEP2.Q2@QM.ABCD101].

gcusello · ‎01-08-2024

Hi @shashankk ,

please try this:

<your_search>
| rex "instance(?<key1>\d*)\.R(?<key2>[^:]+)"
| rex "\[Priority\=(?<Priority>\w+)"
| eval TestMQ="TEST.SEP".key1.".".key2
| stats count(eval(Priority="Low")) as Low, count(eval(Priority="Medium")) as Medium, count(eval(Priority="High")) as High BY TestMQ
| fillnull value=0
| addtotals

Ciao.

Giuseppe

dtburrows3 · ‎01-08-2024

Since you have provided more sample data and stated what the common field across the events are, I think a search like this may work.

<base_search>
    | rex field=_raw "Priority\=(?<Priority>[^\,]+)"
    | rex "(?:\={3}\>|\<\-{3})\s+TRN[^\:]*\:\s+(?<trn>[^\s]+)"
    | rex "RCV\.FROM\.(?<TestMQ>.*)\@"
    | stats
        count(eval(Priority=="Low")) as Low,
        count(eval(Priority=="Medium")) as Medium,
        count(eval(Priority=="High")) as High,
        values(TestMQ) as TestMQ
            by trn
    | stats
        sum(Low) as Low,
        sum(Medium) as Medium,
        sum(High) as High
            by TestMQ
    | addtotals fieldname="TotalCount"

This is what the final result looks like running against the sample data you provided.

inventsekar · ‎01-08-2024

Hi @shashankk

source="MQlogs.txt" host="test" sourcetype="MQ" 
| rex "\:  (?<testgettrn>.*) \- S from"  
| rex "RCV\.FROM\.(?<TestMQ>.*)\@"
| rex field=TestMQ "\w+\.\w+\.(?<key>\w+)"
| rex "TRN\@\\w+\.R(?<key>[^:]++):"
| rex "Priority\=(?<Priority>\w+)"
|table _raw TestMQ key priority
```| stats values(TestMQ) AS TestMQ count(eval(Priority="Low")) as Low, count(eval(Priority="Medium")) as Medium, count(eval(Priority="High")) as High BY key
| fillnull value=0
| addtotals```

Could you pls run this and update us the results screenshot.. when i run this one, the priority is not extracted. looks like something wrong. pls suggest, thanks.

ITWhisperer · ‎01-05-2024

TestMQ doesn't appear in the same events as priority which is why the stats are coming out as zero

PickleRick · ‎01-05-2024

It is interesting because the mechanics itself works.

You can verify it by replacing count(eval()) by creating a temporary field.

| stats count(eval(condition))

is equivalent to

| eval tempfield=if(condition,something,null())
| stats count(tempfield)

So you can rework your search to contain explicit helper fields instead of eval-ed counts and see if they are counted properly and if they are evaluated properly.

isoutamo · ‎01-05-2024

Hi

how about something like

index=test_index
| rex "\.(?<TestMQ>.*)\@"
| chart count by TestMQ Priority

r. Ismo

PS. Please use </> option when you are writing some SPL etc. That way it's formatted correctly and we can be sure that what we are seeing is what you have written.

shashankk · ‎01-05-2024

@isoutamo Thanks for your kind response. I tried with the suggested approach. But it doesn't give the expected result.

index=Test
| rex "(?<TestMQ>.*)\@" 
| eval Priority_Level=case(Priority="Low", "Low", Priority="Medium", "Medium", Priority="High", "High")
| chart count BY TestMQ, Priority_Level
| fillnull value=0

shashankk · ‎01-05-2024

Hi @gcusello
Thank you for your kind response. I tried the suggested approach but the result is same.

This gives me result like example below:

TestMQ | TotalCount | Low | Medium | High
MQNam1 | 120 | 0 | 0 | 0
MQNam2 | 152 | 0 | 0 | 0

The problem is that I am getting "0" value for Low, Medium & High columns - which is not correct. I want to combine both the stats and show the group by results of both the fields.

Please suggest.

gcusello · ‎01-05-2024

Hi @shashankk ,

could you share a sample of your full events for Low, Medium and High Priority?

Ciao.

Giuseppe

shashankk · ‎01-05-2024

Hi @gcusello

Refer below requested sample query and event details:

Kindly suggest.

index=test_index=*instance*/*testget*
| rex "\:  (?<testgettrn>.*) \- S from"  
| rex "RCV\.FROM\.(?<TestMQ>.*)\@"
| eval Priority_Level=case(Priority="Low", "Low", Priority="Medium", "Medium", Priority="High", "High")
| stats count as TotalCount, count(eval(Priority_Level="Low")) as Low, count(eval(Priority_Level="Medium")) as Medium, count(eval(Priority_Level="High")) as High by TestMQ
| fillnull value=0

Sample Events:

240105 18:06:03  19287   testget1: ===> TRN@instance.RQ1:  0000002400509150632034-AERG00001A [Priority=Low,ScanPriority=0, Rule: Default Rule].
host = testserver2.com source = /test/test.logsourcetype = testscan

240105 18:06:03  19287   testget1: ===> TRN@instance.RQ1:  0000002400540101635213-AERG00000A [Priority=Low,ScanPriority=0, Rule: Default Rule].
host = testserver2.com source = /test/test.log sourcetype = testscan

240105 18:06:03  19287   testget1: <--- TRN:  0000002481540150632034-AERG00001A - S from [RCV.FROM.TEST.SEP.Q1@QM.ABC123].
host = testserver2.com source = /test/test.log sourcetype = testscan

240105 18:06:03  19287   testget1: <--- TRN:  0000002400547150635213-AERG00000A - S from [RCV.FROM.TEST.SEP.Q1@QM.ABC123].
host = testserver2.com source = /test/test.log sourcetype = testscan

240105 18:02:29  72965   testget1: ===> TRN@instance.RQ1:  0000002400540902427245-AERC000f8A [Priority=Medium,ScanPriority=2, Rule: Default Rule].
host = testserver1.com source = /test/test.log sourcetype = testscan

240105 18:02:29  72965   testget1: ===> TRN@instance.RQ1:  0000001800540152427236-AERC000f7A [Priority=Medium,ScanPriority=2, Rule: Default Rule].
host = testserver1.com source = /test/test.log sourcetype = testscan

240105 18:02:29  72965   testget1: ===> TRN@instance.RQ1:  0000002400540109427216-AERC000f6A [Priority=High,ScanPriority=1, Rule: Default Rule].
host = testserver1.com source = /test/test.log sourcetype = testscan

dtburrows3 · ‎01-05-2024

Is there a common field across the two types of events to correlate them together?
Since the field TestMQ and Priority are contained in separate events then just doing a simple stats using TestMQ as a by-field will not work.

But if there is some way to stitch the two event type together first, then you could make it work.

I am not familiar enough with the data and the sample size is too small to figure out what that correlation field may be (if it exists at all) but I did put this together as a proof of concept.

Example:

<base_search>
    | rex "(?:\={3}\>|\<\-{3})\s+TRN[^\:]*\:\s+(?<trn>[^\s]+)"
    | rex "RCV\.FROM\.(?<TestMQ>.*)\@"
    ``` using this rex as a demonstration of an example of correlation field to link events together ```
    | rex "\d+\s+\d{2}(?:\:\d{2}){2}\s+\d+\s+(?<corr_field>[^\:]+)"
    | bucket span=30m _time
    ``` attribute extracted TestMQ field values to events with the same correlation field and close proximity in time ```
    | eventstats
        values(TestMQ) as TestMQ
            by _time, corr_field
    ``` we can now filter down to events with Priority field available now that they have a TestMQ value contribution ```
    | where isnotnull(Priority)
    | chart 
        count as count
            over TestMQ
            by Priority
    | addtotals fieldname="TotalCount"
    | fields + TestMQ, Low, Medium, High, TotalCount

Results would look something like this (but probably with more rows with live data)

I would have selected "host" as the correlation field for the example but with the 5 sample events, "testserver1.com" didn't appear to have any TestMQ attribution. So I just extracted "testget1" since that was a common value in the logs. I'm not stating this is the correct correlation field by any-means, just for demonstration purposes only.

Edit:
And I think you could probably do something similar without using an eventstats command provided a corr_field exists with a 1-to-1 mapping with TestMQ value.

<base_search>
    | rex "(?:\={3}\>|\<\-{3})\s+TRN[^\:]*\:\s+(?<trn>[^\s]+)"
    | rex "RCV\.FROM\.(?<TestMQ>.*)\@"
    | rex "(?:\={3}\>|\<\-{3})\s+TRN[^\:]*\:\s+(?<trn>[^\s]+)"
    | rex "RCV\.FROM\.(?<TestMQ>.*)\@"
    ``` using this rex as a demonstration of an example of correlation field to link events together ```
    | rex "\d+\s+\d{2}(?:\:\d{2}){2}\s+\d+\s+(?<corr_field>[^\:]+)"
    | stats
        count(eval(Priority=="Low")) as Low,
        count(eval(Priority=="Medium")) as Medium,
        count(eval(Priority=="High")) as High,
        values(TestMQ) as TestMQ
            by corr_field
    | fields + TestMQ, Low, Medium, High
    | addtotals fieldname="TotalCount"

PickleRick · ‎01-05-2024

OK. In your example data only small subset of events has the RCV.FROM string which you use to anchor for the TestMQ field.

That means that most of the events doesn't have the field.

So if you do stats by that field, you won't get results where there is no value in this field.

gcusello · ‎01-05-2024

Hi @shashankk,

try to simplify your search because the eval isn't mandatory:

index=test_index
| rex "\.(?<TestMQ>.*)\@"
| stats 
   count AS TotalCount
   count(eval(Priority="Low")) AS Low
   count(eval(Priority="Medium")) AS Medium
   count(eval(Priority="High")) AS High
   BY TestMQ

Ciao.

Giuseppe

Splunk stats count group by multiple fields

fields

join

stats

table

Join Us for Splunk University and Get Your Bootcamp Game On!

.conf24 | Learning Tracks for Security, Observability, Platform, and Developers!

Announcing Scheduled Export GA for Dashboard Studio