I need the ability to dedup a multivalue field on a per-event basis. Something like values(), but limited to one event at a time. The ordering within the multivalue field doesn't matter to me, just that there aren't duplicates. Any help is greatly appreciated.
My search:
host=test* | transaction Customer maxspan=3m | eval logSplit = split(_raw,",") | eval eventSplit = mvfilter(match(logSplit, "^[Ee]vent-")) | table eventSplit
Normal output:
event-001 = date:02/14/2013 12:48:09 -0500|result:available_retrieve_success
event-002 = date:02/14/2013 12:48:10 -0500|result:scan_success|token:uf
event-003 = date:02/14/2013 12:48:11 -0500|result:retrieve_success|txType:P|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad
event-001 = date:02/14/2013 12:48:09 -0500|result:available_retrieve_success
event-002 = date:02/14/2013 12:48:10 -0500|result:scan_success|token:uf
event-001 = date:02/13/2013 12:49:20 -0500|result:log_success
event-003 = date:02/14/2013 12:48:11 -0500|result:retrieve_success|txType:P|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad
event-001 = date:02/14/2013 12:48:16 -0500|result:p_success|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad|total:6.1
event-001 = date:02/14/2013 12:48:16 -0500|result:p_success|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad|total:6.1
Preferred output:
event-001 = date:02/14/2013 12:48:09 -0500|result:available_retrieve_success
event-002 = date:02/14/2013 12:48:10 -0500|result:scan_success|token:uf
event-001 = date:02/13/2013 12:49:20 -0500|result:log_success
event-003 = date:02/14/2013 12:48:11 -0500|result:retrieve_success|txType:P|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad
event-001 = date:02/14/2013 12:48:16 -0500|result:p_success|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad|total:6.1
You could make use of the regular dedup like this:
... | streamstats count | mvexpand eventSplit | dedup count eventSplit | mvcombine eventSplit | fields - count
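For anyone who wants to try this pipeline without real data, here is a minimal sketch. It assumes a Splunk version that has the makeresults command, and the sample values are made up for illustration. streamstats count numbers each event, mvexpand turns every value into its own row, dedup drops repeated (count, value) pairs, and mvcombine folds the rows back into one multivalue field per event. The table step is there because mvcombine only merges rows whose other fields are all identical.

```spl
| makeresults count=2
| streamstats count
| eval eventSplit=split("event-001,event-002,event-001", ",")
| table count eventSplit
| mvexpand eventSplit
| dedup count eventSplit
| mvcombine eventSplit
| fields - count
```

Each of the two resulting rows should end up with event-001 and event-002 only once. Because count is part of the dedup key, identical values in different events are left alone.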
I know this is an old question, but I stumbled upon this while trying to do the same thing, and there is now a much cleaner solution:
eval mvfield=mvdedup(mvfield)
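Applied to the search from the question, that could look roughly like the sketch below. It assumes a Splunk version that includes mvdedup and reuses the field names from the original search. The character class is written as [Ee] because a | inside square brackets is matched literally rather than acting as "or".

```spl
host=test*
| transaction Customer maxspan=3m
| eval logSplit = split(_raw, ",")
| eval eventSplit = mvdedup(mvfilter(match(logSplit, "^[Ee]vent-")))
| table eventSplit
```

Since mvdedup is an eval function, it works on one event at a time and no mvexpand/mvcombine round trip is needed.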
Exactly what I was looking for.
Love this community.
I ran into this need today and stumbled across this post...
For anyone else who finds this post while trying to figure out how to do this, it's worth noting that mvdedup was only introduced in 6.2.0.
Another idea is to use stats values(), but do a weird trick to make it calculate unique values only within each row.
| streamstats count as row_number | stats values(mvField) as mvField by row_number | fields - row_number
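A small self-contained sketch of the same idea, assuming makeresults is available and using made-up sample values:

```spl
| makeresults
| eval mvField=split("b,a,b,c,a", ",")
| streamstats count as row_number
| stats values(mvField) as mvField by row_number
| fields - row_number
```

Two side effects to keep in mind: values() returns the distinct values in lexicographical order, so the original order within the field is lost (here the result should be a, b, c), and stats keeps only the aggregated field and the by field, so every other field on the event, including _time, is dropped.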
Thanks to both of you, as these both worked to a certain degree. The stats weird trick did some strange things to the output, so I ended up using the mvexpand/mvcombine approach along with eventstats.
Much appreciated!