Solved: How to Group data by a few factors?

tatery · ‎03-07-2018

Hello,

I need to prepare statistics of some events occurrences and this is my data in splunk:

07-03-18;11:55:14;id:2222222;<a> f:Pool-A,l:2066;<b> f:Pool-B,l:2066;<c> f:Pool-C,l:959;<d> f:Pool-C,l:454;<e> f:Pool-D,l:959;Stat a:0
07-03-18;11:55:14;id:3333333;<a> f:Pool-A,l:2066;<b> f:Pool-B,l:2066;Stat a:10
07-03-18;11:56:14;id:2222222;Stat a:0
07-03-18;11:56:14;id:2222222;<a> f:Pool-A,l:2066;<b> f:Pool-B,l:2066;Stat a:10
07-03-18;11:56:14;id:3333333;<a> f:Pool-A,l:2066;<b> f:Pool-B,l:2066;Stat a:10

Now I'd like to display how many times f:Pool-X (where Pool-X could be any string) with the same value of "l:" (for example l:2066) occurs for each node represented by "id".
As you can see for given node there may be few f:XXXX values (in different order) or non in one entry. There is a lot of entries for the same nodes.

This is what I'd like to achieve:

So far I have no idea how to do this. Maybe some of you could help with this not trivial issue.

somesoni2 · ‎03-07-2018

Give this a try (assuming no field extraction is done, if it has been, you can ignore the rex commands)

your base search 
| rex "id:(?<id>[^\;]+)"
| rex max_match=0 "f\:(?<Pool>[^,]+),(?<l>[^\;]+)"
| stats count by id Pool l
| stats list(count) as count list(Pool) as Pool list(l) as l by id

View solution in original post

somesoni2 · ‎03-07-2018

Give this a try (assuming no field extraction is done, if it has been, you can ignore the rex commands)

your base search 
| rex "id:(?<id>[^\;]+)"
| rex max_match=0 "f\:(?<Pool>[^,]+),(?<l>[^\;]+)"
| stats count by id Pool l
| stats list(count) as count list(Pool) as Pool list(l) as l by id

tatery · ‎03-07-2018

Hi somesoni2, your query is perfect 🙂 However now I see that this is not exactly what I need. Could you please help again and reorganize this query to group data by the same Pool and l then calculate how many ids belongs to this set. Then display ids with occurrences counter. Example:

Pool-AA       2066     25       3333333(20), 2222222(5)
               245     10       3333333(4), 4444444(5), 22222222(1)

where:
- Pool-AA is Pool
- 2066 and 245 are l
- 25 and 10 are total counter of pair Pool and l
- 3333333(20), 2222222(5) - list of ids with number of occurrences

This query is pretty close to the result I'd like to achieve:

stats count by Pool l id
| stats list(count) as count list(id) as ID list(l) as l by Pool

I'm totally newbie in splunk so any help or advice are very welcome.
Many thanks in advance. I really appreciate your help,

somesoni2 · ‎03-07-2018

Try this (first 4 lines are same as previous version, rest are the code to format the output per your need)

your base search 
 | rex "id:(?<id>[^\;]+)"
 | rex max_match=0 "f\:(?<Pool>[^,]+),(?<l>[^\;]+)"
 | stats count by id Pool l
 | eval id=id."(".count.")"
 | stats sum(count) as count list(id) as id by Pool l delim="," | nomv id
 | stats list(l) as l list(count) as count list(id) as id by Pool

You can learn more about stats command and available options by using this Splunk documentation page: http://docs.splunk.com/Documentation/Splunk/7.0.2/SearchReference/Stats. You can see other command on left side navigation tree. To understand the commands/steps better, add one part at a time to see how it's changing the output.

tatery · ‎03-07-2018

Do you know if splunk has some limitation for number of displayed data? I see that counter for pair Pool:l shows for example 259 ids but there is 100 ids in id column. Is it expected result? For sure, I took into consideration values from brackets.

somesoni2 · ‎03-07-2018

The count columns gives total count (e.g. 25 in your example dat) and id column will have all unique ids (e.g 2 in your example data). You're saying you have more unique ids and they are not shown?

tatery · ‎03-07-2018

Yes, I have more unique ids than they are shown. However, I think that this make sense because how to display thousands of data? I didn't mention that I use splunk web search tool not API.

somesoni2 · ‎03-07-2018

The multivalued command do have limitation of memory usage (see limits.conf specification to more details) which may cause truncation. However, showing 100's of entries in single result row may not be readable anyways. See if you're ok with format of the result with this query.

your base search 
  | rex "id:(?<id>[^\;]+)"
  | rex max_match=0 "f\:(?<Pool>[^,]+),(?<l>[^\;]+)"
  | stats count by id Pool l
  | eval id=id."(".count.")"
  | stats sum(count) as count list(id) as id by Pool l delim="," | nomv id

tatery · ‎03-07-2018

Yes, this query is also nice. When you add extra space to delim (e.g delim=", ") then ids are displayed as multi-lines what is really nice. Thanks again for your help.

tatery · ‎03-08-2018

Unfortunately count is not working as expected. For the following input data it shows that pair Pool-A,l:2066 for id:2222222 occurs only once:

07-03-18;11:55:14;id:2222222;<a> f:Pool-A,l:2066
07-03-18;11:55:14;id:2222222;<a> f:Pool-A,l:2066;<b> f:Pool-B,l:2066;<c> f:Pool-C,l:959;<d> f:Pool-C,l:454;<e> f:Pool-D,l:959;Stat a:0

It seems that only entries with one pair of f:Pool-A,l:2066 is counted. When more pairs than not counted at all.

tatery · ‎03-08-2018

My previous thoughts about an issue with count is not correct. It is something wrong with this part:

rex max_match=0 "f\:(?<Pool>[^,]+),(?<l>[^\;]+)"

running this query

your base search 
   | rex "id:(?<id>[^\;]+)"
   | rex max_match=0 "f\:(?<Pool>[^,]+),(?<l>[^\;]+)"
   | table Pool

results in table with few Pools values and empty rows, but number of Pool-XX is less than expected.

tatery · ‎03-08-2018

I was able to manage my last issue. The solution is to keep Pool and l as one value:

rex max_match=0 "f\:(?<Pool_l>[^,]+,l:\d+)"

tatery · ‎03-07-2018

Wow, you are awesome. Your query works like a charm.
Many thanks.

How to Group data by a few factors?

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

[Puzzles] Solve, Learn, Repeat: Matching cron expressions

Design, Compete, Win: Submit Your Best Splunk Dashboards for a .conf26 Pass

May 2026 Splunk Expert Sessions: Security & Observability

Join the Conversation