Splunk Search
Highlighted

how to do distinct count over a multivalue data conditionally?

Path Finder

Hi,
I have the following search which returns the avg number of "EnterPlace" actions in a session (a transaction = a session) by month.
Each "EnterPlace" actiontype also has a placeid.
The problem is that we need to calculate the avg number of EnterPlace actions with unique placeid in a session (ie. if a transaction had two logs with actiontype="EnterPlace" and both logs had the same value for the world_id, then the count for that transaction should be 1).
Is there a way to do that?
Really appreciate your help.

This is what I have so far, which needs to be corrected:

endlessSource | transaction userid, pid keeporphans=f maxspan=24h maxpause=30m mvraw=t delim="," mvlist=t |
bucket _time span=30d |
convert timeformat="%y-%m-%d" ctime(
time) as YMD |
eval thisactiontype= mvfilter(match(actiontype, "EnterPlace")) |
eval this
actioncount=mvcount(thisactiontype) |
fillnull this
actioncount |
stats avg(this
actioncount) as avgActionbysession
max(thisactioncount) as maxActionby_session by YMD

the problem is that dc should be on a different field than the mvexpand and only conditionally:
the events of a transaction have different values for actiontype.
so, for each transaction, only for the events that action
type=EnterPlace, I need to do dc(world_id). and then average out the dc value over all the transactions.

0 Karma
Highlighted

Re: how to do distinct count over a multivalue data conditionally?

Splunk Employee
Splunk Employee

Hi Fere,
My understanding is that after the transaction you want to expand each multivalue of the actiontype and worldid (or other fields), in order to do a distinct count of them.

The difficult being that, the fields are multivalue after the transaction, and that you want to distinguish per transaction.

Here is a proposition using :

  • an eval field for the counter actiontype and worldid (action_id)
  • an eval fields to distinguish each unique transaction (uniqueid based on userid, pid and the _time of the transaction begin, before the bucketing of time)
  • a mvindex command to turn a multiline in to a single line (they are identical)
  • a mvexpand to explode each action_id into seperate events in order to count them.

see


endlessSource | eval uniqueid=userid."-".pid
| eval actionid=actiontype."-".worldid
| transaction unique
id keeporphans=f maxspan=24h maxpause=30m mvraw=t delim="," mvlist=t
| eval uniqueid=time."-".mvindex(uniqueid,0,0)
| bucket _time span=30d
| convert timeformat="%y-%m-%d" ctime(
time) as YMD
| mvexpand actionid
| search action
id="EnterRoom*"
| stats count dc(actionid) AS distinct by uniqueid YMD

View solution in original post

0 Karma