Splunk Search

how to do distinct count over a multivalue data conditionally?

Path Finder

Hi,
I have the following search which returns the avg number of "EnterPlace" actions in a session (a transaction = a session) by month.
Each "EnterPlace" action_type also has a place_id.
The problem is that we need to calculate the avg number of EnterPlace actions with unique place_id in a session (ie. if a transaction had two logs with action_type="EnterPlace" and both logs had the same value for the world_id, then the count for that transaction should be 1).
Is there a way to do that?
Really appreciate your help.

This is what I have so far, which needs to be corrected:

endlessSource | transaction user_id, pid keeporphans=f maxspan=24h maxpause=30m mvraw=t delim="," mvlist=t |
bucket _time span=30d |
convert timeformat="%y-%m-%d" ctime(_time) as YMD |
eval this_action_type= mvfilter(match(action_type, "EnterPlace")) |
eval this_action_count=mvcount(this_action_type) |
fillnull this_action_count |
stats avg(this_action_count) as avg_Action_by_session
max(this_action_count) as max_Action_by_session by YMD

the problem is that dc should be on a different field than the mvexpand and only conditionally:
the events of a transaction have different values for action_type.
so, for each transaction, only for the events that action_type=EnterPlace, I need to do dc(world_id). and then average out the dc value over all the transactions.

0 Karma
1 Solution

Splunk Employee
Splunk Employee

Hi Fere,
My understanding is that after the transaction you want to expand each multivalue of the action_type and world_id (or other fields), in order to do a distinct count of them.

The difficult being that, the fields are multivalue after the transaction, and that you want to distinguish per transaction.

Here is a proposition using :

  • an eval field for the counter action_type and world_id (action_id)
  • an eval fields to distinguish each unique transaction (unique_id based on user_id, pid and the _time of the transaction begin, before the bucketing of time)
  • a mvindex command to turn a multiline in to a single line (they are identical)
  • a mvexpand to explode each action_id into seperate events in order to count them.

see


endlessSource | eval unique_id=user_id."-".pid
| eval action_id=action_type."-".world_id
| transaction unique_id keeporphans=f maxspan=24h maxpause=30m mvraw=t delim="," mvlist=t
| eval unique_id=_time."-".mvindex(unique_id,0,0)
| bucket _time span=30d
| convert timeformat="%y-%m-%d" ctime(_time) as YMD
| mvexpand action_id
| search action_id="EnterRoom*"
| stats count dc(action_id) AS distinct by unique_id YMD

View solution in original post

0 Karma

Splunk Employee
Splunk Employee

Hi Fere,
My understanding is that after the transaction you want to expand each multivalue of the action_type and world_id (or other fields), in order to do a distinct count of them.

The difficult being that, the fields are multivalue after the transaction, and that you want to distinguish per transaction.

Here is a proposition using :

  • an eval field for the counter action_type and world_id (action_id)
  • an eval fields to distinguish each unique transaction (unique_id based on user_id, pid and the _time of the transaction begin, before the bucketing of time)
  • a mvindex command to turn a multiline in to a single line (they are identical)
  • a mvexpand to explode each action_id into seperate events in order to count them.

see


endlessSource | eval unique_id=user_id."-".pid
| eval action_id=action_type."-".world_id
| transaction unique_id keeporphans=f maxspan=24h maxpause=30m mvraw=t delim="," mvlist=t
| eval unique_id=_time."-".mvindex(unique_id,0,0)
| bucket _time span=30d
| convert timeformat="%y-%m-%d" ctime(_time) as YMD
| mvexpand action_id
| search action_id="EnterRoom*"
| stats count dc(action_id) AS distinct by unique_id YMD

View solution in original post

0 Karma