Splunk Search
Highlighted

Can I get a count of distinct values in multivalue field?

Splunk Employee
Splunk Employee

What I'm looking for is a hybrid of the stats list() and values() functions. First, I'd like the list of unique values for a multivalue field, then alongside each unique value, I'd like the count of occurrences of that value. Is this possible?

Maybe this is better illustrated through an example.

Given a set of events like this:

03/12/2013 15:55:00 id=foo abc=123,123,123
03/12/2013 15:56:00 id=bar abc=123,456,789,789

I can get tables like these with the list() and values() functions, respectively:

  id     abc               id     abc
----------------         ----------------
 foo     123              foo     123
         123              bar     123
         123                      456
 bar     123                      789
         456
         789
         789

But what I really want is this:

  id     abc    
----------------
 foo     123 (3)
 bar     123 (1)
         456 (1)
         789 (2)

I believe this to be possible... for a search query superhero (which I am not). 🙂

Tags (1)
Highlighted

Re: Can I get a count of distinct values in multivalue field?

Engager

I tried doing this yesterday, had some trouble. You definitely want to use values(). I know we covered this in architect bootcamp training, but I forget how the instructor did it. You can always do a calculated field and convert the number to a string, but if you want a true column of (NUMERIC) counts, that's where i got lost...

i'll wait patiently for that answer 🙂

0 Karma
Highlighted

Re: Can I get a count of distinct values in multivalue field?

Splunk Employee
Splunk Employee

I was able to get the information desired, but not really in the clean format provided by the values() or list() functions using this approach:

... | stats list(abc) as tokens by id | mvexpand tokens | stats count by id,tokens | mvcombine tokens

id tokens count


bar 123 1
456
bar 789 2
foo 123 3

The output is a table of id, tokens and count, grouped by count. This technically answers the question, but not in a user-friendly format. 🙂

0 Karma
Highlighted

Re: Can I get a count of distinct values in multivalue field?

Influencer

Try this :

 ... 
| table id abc def ghi
| untable id field value 
| makemv delim="," value
| mvexpand value
| stats count by id field value
| eval pair=value." (".count.")"
| stats list(pair) as values by id field
| xyseries id field values

Given :

03/12/2013 15:55:00 id=foo abc=123,123,123 def=345,567,678,890,890
03/12/2013 15:56:00 id=bar abc=123,456,789,789 def=345,345,345,789,789

alt text

View solution in original post

Highlighted

Re: Can I get a count of distinct values in multivalue field?

Splunk Employee
Splunk Employee

You are brilliant! Thank you, this is exactly what we are looking for. 🙂

0 Karma
Highlighted

Re: Can I get a count of distinct values in multivalue field?

Influencer

shucks.
For some reason I couldnt get this to work -
... | extract auto=f pairdelim=" " kvdelim="="

Might just be me though.

0 Karma
Highlighted

Re: Can I get a count of distinct values in multivalue field?

Explorer

For anyone who wants to verify (or learn from) this query. You can do source="/path/to/the/log"
| rex field=_raw "abc=(?<abc>\S+) def=(?<def>\S+)"
to extract the abc and def fields properly (otherwise Splunk will only extract the first value of the comma-delimited list).

0 Karma
Highlighted

Re: Can I get a count of distinct values in multivalue field?

Explorer

Try this method.

This replicates your data

| makeresults
| eval data="foo;123,123,123:bar;123,456,789,789"
| makemv data delim=":" 
| mvexpand data 
| makemv data delim=";" 
| eval id=mvindex(data,0), abc=mvindex(data,1)
| makemv abc delim="," 
| table id, abc

This gets the results you're looking for. if you already have the data in separate events, only the last two lines are needed.

| makeresults
| eval data="foo;123,123,123:bar;123,456,789,789"
| makemv data delim=":" 
| mvexpand data 
| makemv data delim=";" 
| eval id=mvindex(data,0), abc=mvindex(data,1)
| makemv abc delim="," 
| stats count(eval(mvcount(abc))) as count by id, abc
0 Karma