Splunk Search

How to Group data by a few factors?

tatery
Engager

Hello,

I need to prepare statistics of some events occurrences and this is my data in splunk:

07-03-18;11:55:14;id:2222222;<a> f:Pool-A,l:2066;<b> f:Pool-B,l:2066;<c> f:Pool-C,l:959;<d> f:Pool-C,l:454;<e> f:Pool-D,l:959;Stat a:0
07-03-18;11:55:14;id:3333333;<a> f:Pool-A,l:2066;<b> f:Pool-B,l:2066;Stat a:10
07-03-18;11:56:14;id:2222222;Stat a:0
07-03-18;11:56:14;id:2222222;<a> f:Pool-A,l:2066;<b> f:Pool-B,l:2066;Stat a:10
07-03-18;11:56:14;id:3333333;<a> f:Pool-A,l:2066;<b> f:Pool-B,l:2066;Stat a:10

Now I'd like to display how many times f:Pool-X (where Pool-X could be any string) with the same value of "l:" (for example l:2066) occurs for each node represented by "id".
As you can see for given node there may be few f:XXXX values (in different order) or non in one entry. There is a lot of entries for the same nodes.

This is what I'd like to achieve:

alt text

So far I have no idea how to do this. Maybe some of you could help with this not trivial issue.

0 Karma
1 Solution

somesoni2
Revered Legend

Give this a try (assuming no field extraction is done, if it has been, you can ignore the rex commands)

your base search 
| rex "id:(?<id>[^\;]+)"
| rex max_match=0 "f\:(?<Pool>[^,]+),(?<l>[^\;]+)"
| stats count by id Pool l
| stats list(count) as count list(Pool) as Pool list(l) as l by id

View solution in original post

somesoni2
Revered Legend

Give this a try (assuming no field extraction is done, if it has been, you can ignore the rex commands)

your base search 
| rex "id:(?<id>[^\;]+)"
| rex max_match=0 "f\:(?<Pool>[^,]+),(?<l>[^\;]+)"
| stats count by id Pool l
| stats list(count) as count list(Pool) as Pool list(l) as l by id

tatery
Engager

Hi somesoni2, your query is perfect 🙂 However now I see that this is not exactly what I need. Could you please help again and reorganize this query to group data by the same Pool and l then calculate how many ids belongs to this set. Then display ids with occurrences counter. Example:

Pool-AA       2066     25       3333333(20), 2222222(5)
               245     10       3333333(4), 4444444(5), 22222222(1)

where:
- Pool-AA is Pool
- 2066 and 245 are l
- 25 and 10 are total counter of pair Pool and l
- 3333333(20), 2222222(5) - list of ids with number of occurrences

This query is pretty close to the result I'd like to achieve:

stats count by Pool l id
| stats list(count) as count list(id) as ID list(l) as l by Pool

I'm totally newbie in splunk so any help or advice are very welcome.
Many thanks in advance. I really appreciate your help,

0 Karma

somesoni2
Revered Legend

Try this (first 4 lines are same as previous version, rest are the code to format the output per your need)

your base search 
 | rex "id:(?<id>[^\;]+)"
 | rex max_match=0 "f\:(?<Pool>[^,]+),(?<l>[^\;]+)"
 | stats count by id Pool l
 | eval id=id."(".count.")"
 | stats sum(count) as count list(id) as id by Pool l delim="," | nomv id
 | stats list(l) as l list(count) as count list(id) as id by Pool

You can learn more about stats command and available options by using this Splunk documentation page: http://docs.splunk.com/Documentation/Splunk/7.0.2/SearchReference/Stats. You can see other command on left side navigation tree. To understand the commands/steps better, add one part at a time to see how it's changing the output.

0 Karma

tatery
Engager

Do you know if splunk has some limitation for number of displayed data? I see that counter for pair Pool:l shows for example 259 ids but there is 100 ids in id column. Is it expected result? For sure, I took into consideration values from brackets.

0 Karma

somesoni2
Revered Legend

The count columns gives total count (e.g. 25 in your example dat) and id column will have all unique ids (e.g 2 in your example data). You're saying you have more unique ids and they are not shown?

0 Karma

tatery
Engager

Yes, I have more unique ids than they are shown. However, I think that this make sense because how to display thousands of data? I didn't mention that I use splunk web search tool not API.

0 Karma

somesoni2
Revered Legend

The multivalued command do have limitation of memory usage (see limits.conf specification to more details) which may cause truncation. However, showing 100's of entries in single result row may not be readable anyways. See if you're ok with format of the result with this query.

your base search 
  | rex "id:(?<id>[^\;]+)"
  | rex max_match=0 "f\:(?<Pool>[^,]+),(?<l>[^\;]+)"
  | stats count by id Pool l
  | eval id=id."(".count.")"
  | stats sum(count) as count list(id) as id by Pool l delim="," | nomv id
0 Karma

tatery
Engager

Yes, this query is also nice. When you add extra space to delim (e.g delim=", ") then ids are displayed as multi-lines what is really nice. Thanks again for your help.

0 Karma

tatery
Engager

Unfortunately count is not working as expected. For the following input data it shows that pair Pool-A,l:2066 for id:2222222 occurs only once:

07-03-18;11:55:14;id:2222222;<a> f:Pool-A,l:2066
07-03-18;11:55:14;id:2222222;<a> f:Pool-A,l:2066;<b> f:Pool-B,l:2066;<c> f:Pool-C,l:959;<d> f:Pool-C,l:454;<e> f:Pool-D,l:959;Stat a:0

It seems that only entries with one pair of f:Pool-A,l:2066 is counted. When more pairs than not counted at all.

0 Karma

tatery
Engager

My previous thoughts about an issue with count is not correct. It is something wrong with this part:

rex max_match=0 "f\:(?<Pool>[^,]+),(?<l>[^\;]+)"

running this query

your base search 
   | rex "id:(?<id>[^\;]+)"
   | rex max_match=0 "f\:(?<Pool>[^,]+),(?<l>[^\;]+)"
   | table Pool

results in table with few Pools values and empty rows, but number of Pool-XX is less than expected.

0 Karma

tatery
Engager

I was able to manage my last issue. The solution is to keep Pool and l as one value:

rex max_match=0 "f\:(?<Pool_l>[^,]+,l:\d+)"
0 Karma

tatery
Engager

Wow, you are awesome. Your query works like a charm.
Many thanks.

0 Karma
Get Updates on the Splunk Community!

Monitoring Postgres with OpenTelemetry

Behind every business-critical application, you’ll find databases. These behind-the-scenes stores power ...

Mastering Synthetic Browser Testing: Pro Tips to Keep Your Web App Running Smoothly

To start, if you're new to synthetic monitoring, I recommend exploring this synthetic monitoring overview. In ...

Splunk Edge Processor | Popular Use Cases to Get Started with Edge Processor

Splunk Edge Processor offers more efficient, flexible data transformation – helping you reduce noise, control ...