What I'm looking for is a hybrid of the stats list() and values() functions. First, I'd like the list of unique values for a multivalue field, then alongside each unique value, I'd like the count of occurrences of that value. Is this possible?
Maybe this is better illustrated through an example.
Given a set of events like this:
03/12/2013 15:55:00 id=foo abc=123,123,123 03/12/2013 15:56:00 id=bar abc=123,456,789,789
I can get tables like these with the list() and values() functions, respectively:
id abc id abc ---------------- ---------------- foo 123 foo 123 123 bar 123 123 456 bar 123 789 456 789 789
But what I really want is this:
id abc ---------------- foo 123 (3) bar 123 (1) 456 (1) 789 (2)
I believe this to be possible... for a search query superhero (which I am not). 🙂
I tried doing this yesterday, had some trouble. You definitely want to use values(). I know we covered this in architect bootcamp training, but I forget how the instructor did it. You can always do a calculated field and convert the number to a string, but if you want a true column of (NUMERIC) counts, that's where i got lost...
i'll wait patiently for that answer 🙂
I was able to get the information desired, but not really in the clean format provided by the values() or list() functions using this approach:
... | stats list(abc) as tokens by id | mvexpand tokens | stats count by id,tokens | mvcombine tokens
id tokens count
bar 123 1
bar 789 2
foo 123 3
The output is a table of id, tokens and count, grouped by count. This technically answers the question, but not in a user-friendly format. 🙂
Try this :
... | table id abc def ghi | untable id field value | makemv delim="," value | mvexpand value | stats count by id field value | eval pair=value." (".count.")" | stats list(pair) as values by id field | xyseries id field values
03/12/2013 15:55:00 id=foo abc=123,123,123 def=345,567,678,890,890 03/12/2013 15:56:00 id=bar abc=123,456,789,789 def=345,345,345,789,789
For some reason I couldnt get this to work -
... | extract auto=f pairdelim=" " kvdelim="="
Might just be me though.
For anyone who wants to verify (or learn from) this query. You can do
source="/path/to/the/log" to extract the
| rex field=_raw "abc=(?<abc>\S+) def=(?<def>\S+)"
def fields properly (otherwise Splunk will only extract the first value of the comma-delimited list).
Try this method.
This replicates your data
| makeresults | eval data="foo;123,123,123:bar;123,456,789,789" | makemv data delim=":" | mvexpand data | makemv data delim=";" | eval id=mvindex(data,0), abc=mvindex(data,1) | makemv abc delim="," | table id, abc
This gets the results you're looking for. if you already have the data in separate events, only the last two lines are needed.
| makeresults | eval data="foo;123,123,123:bar;123,456,789,789" | makemv data delim=":" | mvexpand data | makemv data delim=";" | eval id=mvindex(data,0), abc=mvindex(data,1) | makemv abc delim="," | stats count(eval(mvcount(abc))) as count by id, abc