What I'm looking for is a hybrid of the stats list() and values() functions. First, I'd like the list of unique values for a multivalue field, then alongside each unique value, I'd like the count of occurrences of that value. Is this possible?
Maybe this is better illustrated through an example.
Given a set of events like this:
03/12/2013 15:55:00 id=foo abc=123,123,123
03/12/2013 15:56:00 id=bar abc=123,456,789,789
I can get tables like these with the list() and values() functions, respectively:
id abc id abc
---------------- ----------------
foo 123 foo 123
123 bar 123
123 456
bar 123 789
456
789
789
But what I really want is this:
id abc
----------------
foo 123 (3)
bar 123 (1)
456 (1)
789 (2)
I believe this to be possible... for a search query superhero (which I am not). 🙂
Try this :
...
| table id abc def ghi
| untable id field value
| makemv delim="," value
| mvexpand value
| stats count by id field value
| eval pair=value." (".count.")"
| stats list(pair) as values by id field
| xyseries id field values
Given :
03/12/2013 15:55:00 id=foo abc=123,123,123 def=345,567,678,890,890
03/12/2013 15:56:00 id=bar abc=123,456,789,789 def=345,345,345,789,789
Try this method.
This replicates your data
| makeresults
| eval data="foo;123,123,123:bar;123,456,789,789"
| makemv data delim=":"
| mvexpand data
| makemv data delim=";"
| eval id=mvindex(data,0), abc=mvindex(data,1)
| makemv abc delim=","
| table id, abc
This gets the results you're looking for. if you already have the data in separate events, only the last two lines are needed.
| makeresults
| eval data="foo;123,123,123:bar;123,456,789,789"
| makemv data delim=":"
| mvexpand data
| makemv data delim=";"
| eval id=mvindex(data,0), abc=mvindex(data,1)
| makemv abc delim=","
| stats count(eval(mvcount(abc))) as count by id, abc
Try this :
...
| table id abc def ghi
| untable id field value
| makemv delim="," value
| mvexpand value
| stats count by id field value
| eval pair=value." (".count.")"
| stats list(pair) as values by id field
| xyseries id field values
Given :
03/12/2013 15:55:00 id=foo abc=123,123,123 def=345,567,678,890,890
03/12/2013 15:56:00 id=bar abc=123,456,789,789 def=345,345,345,789,789
For anyone who wants to verify (or learn from) this query. You can do source="/path/to/the/log"
to extract the
| rex field=_raw "abc=(?<abc>\S+) def=(?<def>\S+)"abc
and def
fields properly (otherwise Splunk will only extract the first value of the comma-delimited list).
shucks.
For some reason I couldnt get this to work -
... | extract auto=f pairdelim=" " kvdelim="="
Might just be me though.
You are brilliant! Thank you, this is exactly what we are looking for. 🙂
I was able to get the information desired, but not really in the clean format provided by the values() or list() functions using this approach:
... | stats list(abc) as tokens by id | mvexpand tokens | stats count by id,tokens | mvcombine tokens
id tokens count
bar 123 1
456
bar 789 2
foo 123 3
The output is a table of id, tokens and count, grouped by count. This technically answers the question, but not in a user-friendly format. 🙂
I tried doing this yesterday, had some trouble. You definitely want to use values(). I know we covered this in architect bootcamp training, but I forget how the instructor did it. You can always do a calculated field and convert the number to a string, but if you want a true column of (NUMERIC) counts, that's where i got lost...
i'll wait patiently for that answer 🙂