Splunk Search

How obtain the sum of a multivalue field

New Member

In each of my events, I have a field named watched. The watched multifield contains the array of integers. Is it possible to get the sum of the multivalue field? Below is the json data which gets indexed into my splunk index.

I have tried the following:

1] index=netbase1index sourcetype=svsessions | rename "watched{}" as vwatch | eventstats sum(vwatch) as Total | table viewer.email Total

this returns the sum, but it is the sum of all the multivalue field, watched for all the events instead of the sum of each event.

2] index=netbase1index sourcetype=svsessions | rename "watched{}" as vwatch | makemv vwatch | eventstats sum(vwatch) as total | table viewer.email vwatch total

the total for each event is just blank.

3] index=netbase1index sourcetype=svsessions |spath output=total sum(path=watched{*}) | table viewer.email total

the total for each event is just blank.

{"playbacktype": "html5", "url": "https://nds.io/videos/a49bdeb31316e4c62c/06-analyze-a-topic", "geo": {"country": "US", "region": "NY", "city": "New York"}, "videoid": "a49bdeb31316e4c62c", "viewer": {"email": "email@email.com", "watched": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 3, 2, 2, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], "ip_address": "14.9.22.41"}

Tags (3)
0 Karma
1 Solution

Champion

There are a couple of things about your results that don't make sense to me, but I will try to answer regardless.

For #1, I don't understand how this works if vwatch is a multi-valued field, but I will answer anyway. If it will add them up as you suggest, then you only need to add a term to your eventstats segment that identifies the unique IDs. If there is nothing already in each event that serves as a unique ID, then the easiest thing to do is add streamstats count as count | before the eventstats pipe. The result would be

index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch | streamstats count as count | eventstats sum(vwatch) as Total by count | table viewer.email Total

For #2, I don't understand why using mvexpand wouldn't work, but you still need to have a unique value like #1. Without seeing the raw data, my attempt would be as follows:

index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch | streamstats count as count | mvexpand vwatch | stats sum(vwatch) as Total by viewer.email count | table viewer.email Total

For #3, I am no help.

Option #4, if you plan to do a lot of searches that need, I suggest you implement a custom SPL command like I did for this exact use case. The starting point is called mvsum (see http://jordan.broughs.net/archives/2012/06/mvsum-for-splunk-summing-multi-valued-fields-within-a-sin...). You can rip-off my implementation straight from my app Layer8Insight App For Splunk (https://splunkbase.splunk.com/app/3171/). It is battle tested. My app includes the configuration files and python code you would need.

View solution in original post

0 Karma

Splunk Employee
Splunk Employee

Hi @cgong - Looks like you got a solution to your question. Please don't forget to click "Accept" below the best answer and up-vote any answers that were helpful to you. Thanks!

0 Karma

SplunkTrust
SplunkTrust

Assuming that yor watched (multivalued) field is available in all events and you want to add a field Total to each event, try like this

index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch | eval sno=1 | accum sno | eventstats sum(vwatch) as Total by sno| table viewer.email Total

If you want to summarize data by viewer.email field (means if there are 2+ events are with same viewer.email value then sum all those events), then try this

index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch |stats sum(vwatch) as Total by viewer.email
0 Karma

New Member

This query worked:

index=netbase1index sourcetype=svsessions | rename "watched{}" as vwatch | eval sno=1 | accum sno |
eventstats sum(vwatch) as Total by sno| table viewer.email Total

Thank you.

0 Karma

Motivator

Yes its possible to sum your mutivalue field "watched".

If the emailId and watched fields are already extracted then try below query:

yourBaseSearch to return events containing emailId and watched
    | rex field=watched max_match=0 "(?<watchcount>[\d]+)" 
    | mvexpand watchcount
    | stats sum(watchcount) by emailId

If the fields are not extracted use below query to extract the fields emailId and watched first and then do then counting:

yourBaseSearch to return events
|rex "email\":\s\"(?<emailId>[^\"]+)"
| rex "watched\"\:\s\[(?<watched>[^\]]+)\]"
| rex field=watched max_match=0 "(?<watchcount>[\d]+)" 
| mvexpand watchcount
| stats sum(watchcount) by emailId

Contributor

While I'm not fully understanding your request either, it appears to me that you are either:

1 - Trying to get the sum of the array of numbers in the field "watched{}", which I've based off of you renaming "watched{}" as "vwatch" and applying the stats function "sum(vwatch)" as the "total".

2 - My other interpretation of your request, based off your second search where you are using "makemv", is that you are trying to gather a count of how many numbers there are in the field "watched{}"

If either one of my interpretations are correct for the array below:

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 3, 2, 2, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

Then I think Splunk is actually interpreting these numbers as a string rather than numbers, in which case you need to convert the string to numbers.

If my first interpretation above was correct, try this search:

index=netbase1_index sourcetype=sv_sessions | rex field=_raw "watched\"\:\s\[(?P<watched_numbers>.[^\]]*)" | eval trim_spaces=trim(split(watched_numbers,",")) | stats count by viewer_email, trim_spaces | eval convert_string = tonumber(trim_spaces) | eval product = convert_string * count | eventstats sum(product) AS watch_total | dedup watch_total | table viewer_email, watch_total

If my second interpretation above was correct, try this search:

index=netbase1_index sourcetype=sv_sessions | rex field=_raw "watched\"\:\s\[(?P<watched_numbers>.[^\]]*)" | eval trim_spaces=trim(split(watched_numbers,",")) | stats count by viewer_email, trim_spaces | eval convert_string = tonumber(trim_spaces) | stats sum(count) AS array_total by viewer_email

*edit - you will probably need to change the field "viewer_email" to match your own.

0 Karma

Champion

There are a couple of things about your results that don't make sense to me, but I will try to answer regardless.

For #1, I don't understand how this works if vwatch is a multi-valued field, but I will answer anyway. If it will add them up as you suggest, then you only need to add a term to your eventstats segment that identifies the unique IDs. If there is nothing already in each event that serves as a unique ID, then the easiest thing to do is add streamstats count as count | before the eventstats pipe. The result would be

index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch | streamstats count as count | eventstats sum(vwatch) as Total by count | table viewer.email Total

For #2, I don't understand why using mvexpand wouldn't work, but you still need to have a unique value like #1. Without seeing the raw data, my attempt would be as follows:

index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch | streamstats count as count | mvexpand vwatch | stats sum(vwatch) as Total by viewer.email count | table viewer.email Total

For #3, I am no help.

Option #4, if you plan to do a lot of searches that need, I suggest you implement a custom SPL command like I did for this exact use case. The starting point is called mvsum (see http://jordan.broughs.net/archives/2012/06/mvsum-for-splunk-summing-multi-valued-fields-within-a-sin...). You can rip-off my implementation straight from my app Layer8Insight App For Splunk (https://splunkbase.splunk.com/app/3171/). It is battle tested. My app includes the configuration files and python code you would need.

View solution in original post

0 Karma

New Member

This worked:

index=netbase1index sourcetype=svsessions | rename "watched{}" as vwatch | streamstats count as count | eventstats sum(vwatch) as Total by count | table viewer.email Total

Thank you.

0 Karma

Champion

@cgong can you please accept my answer if that solved the issue for you?

0 Karma