Splunk Search

how to do distinct count over a multivalue data conditionally?

fere
Path Finder

Hi,
I have the following search which returns the avg number of "EnterPlace" actions in a session (a transaction = a session) by month.
Each "EnterPlace" action_type also has a place_id.
The problem is that we need to calculate the avg number of EnterPlace actions with unique place_id in a session (ie. if a transaction had two logs with action_type="EnterPlace" and both logs had the same value for the world_id, then the count for that transaction should be 1).
Is there a way to do that?
Really appreciate your help.

This is what I have so far, which needs to be corrected:

endlessSource | transaction user_id, pid keeporphans=f maxspan=24h maxpause=30m mvraw=t delim="," mvlist=t |
bucket _time span=30d |
convert timeformat="%y-%m-%d" ctime(_time) as YMD |
eval this_action_type= mvfilter(match(action_type, "EnterPlace")) |
eval this_action_count=mvcount(this_action_type) |
fillnull this_action_count |
stats avg(this_action_count) as avg_Action_by_session
max(this_action_count) as max_Action_by_session by YMD

the problem is that dc should be on a different field than the mvexpand and only conditionally:
the events of a transaction have different values for action_type.
so, for each transaction, only for the events that action_type=EnterPlace, I need to do dc(world_id). and then average out the dc value over all the transactions.

0 Karma
1 Solution

yannK
Splunk Employee
Splunk Employee

Hi Fere,
My understanding is that after the transaction you want to expand each multivalue of the action_type and world_id (or other fields), in order to do a distinct count of them.

The difficult being that, the fields are multivalue after the transaction, and that you want to distinguish per transaction.

Here is a proposition using :

  • an eval field for the counter action_type and world_id (action_id)
  • an eval fields to distinguish each unique transaction (unique_id based on user_id, pid and the _time of the transaction begin, before the bucketing of time)
  • a mvindex command to turn a multiline in to a single line (they are identical)
  • a mvexpand to explode each action_id into seperate events in order to count them.

see


endlessSource | eval unique_id=user_id."-".pid
| eval action_id=action_type."-".world_id
| transaction unique_id keeporphans=f maxspan=24h maxpause=30m mvraw=t delim="," mvlist=t
| eval unique_id=_time."-".mvindex(unique_id,0,0)
| bucket _time span=30d
| convert timeformat="%y-%m-%d" ctime(_time) as YMD
| mvexpand action_id
| search action_id="EnterRoom*"
| stats count dc(action_id) AS distinct by unique_id YMD

View solution in original post

0 Karma

yannK
Splunk Employee
Splunk Employee

Hi Fere,
My understanding is that after the transaction you want to expand each multivalue of the action_type and world_id (or other fields), in order to do a distinct count of them.

The difficult being that, the fields are multivalue after the transaction, and that you want to distinguish per transaction.

Here is a proposition using :

  • an eval field for the counter action_type and world_id (action_id)
  • an eval fields to distinguish each unique transaction (unique_id based on user_id, pid and the _time of the transaction begin, before the bucketing of time)
  • a mvindex command to turn a multiline in to a single line (they are identical)
  • a mvexpand to explode each action_id into seperate events in order to count them.

see


endlessSource | eval unique_id=user_id."-".pid
| eval action_id=action_type."-".world_id
| transaction unique_id keeporphans=f maxspan=24h maxpause=30m mvraw=t delim="," mvlist=t
| eval unique_id=_time."-".mvindex(unique_id,0,0)
| bucket _time span=30d
| convert timeformat="%y-%m-%d" ctime(_time) as YMD
| mvexpand action_id
| search action_id="EnterRoom*"
| stats count dc(action_id) AS distinct by unique_id YMD

0 Karma
Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...

New in Observability Cloud - Explicit Bucket Histograms

Splunk introduces native support for histograms as a metric data type within Observability Cloud with Explicit ...