In each of my events, I have a field named watched. The watched multifield contains the array of integers. Is it possible to get the sum of the multivalue field? Below is the json data which gets indexed into my splunk index.
I have tried the following:
1] index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch | eventstats sum(vwatch) as Total | table viewer.email Total
2] index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch | makemv vwatch | eventstats sum(vwatch) as total | table viewer.email vwatch total
3] index=netbase1_index sourcetype=sv_sessions |spath output=total sum(path=watched{*}) | table viewer.email total
{"playback_type": "html5", "url": "https://nds.io/videos/a49bdeb31316e4c62c/06-analyze-a-topic", "geo": {"country": "US", "region": "NY", "city": "New York"}, "video_id": "a49bdeb31316e4c62c", "viewer": {"email": "email@email.com", "watched": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 3, 2, 2, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], "ip_address": "14.9.22.41"}
There are a couple of things about your results that don't make sense to me, but I will try to answer regardless.
For #1, I don't understand how this works if vwatch
is a multi-valued field, but I will answer anyway. If it will add them up as you suggest, then you only need to add a term to your eventstats
segment that identifies the unique IDs. If there is nothing already in each event that serves as a unique ID, then the easiest thing to do is add streamstats count as count |
before the eventstats
pipe. The result would be
index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch | streamstats count as count | eventstats sum(vwatch) as Total by count | table viewer.email Total
For #2, I don't understand why using mvexpand
wouldn't work, but you still need to have a unique value like #1. Without seeing the raw data, my attempt would be as follows:
index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch | streamstats count as count | mvexpand vwatch | stats sum(vwatch) as Total by viewer.email count | table viewer.email Total
For #3, I am no help.
Option #4, if you plan to do a lot of searches that need, I suggest you implement a custom SPL command like I did for this exact use case. The starting point is called mvsum
(see http://jordan.broughs.net/archives/2012/06/mvsum-for-splunk-summing-multi-valued-fields-within-a-sin...). You can rip-off my implementation straight from my app Layer8Insight App For Splunk (https://splunkbase.splunk.com/app/3171/). It is battle tested. My app includes the configuration files and python code you would need.
Hi @cgong - Looks like you got a solution to your question. Please don't forget to click "Accept" below the best answer and up-vote any answers that were helpful to you. Thanks!
Assuming that yor watched (multivalued) field is available in all events and you want to add a field Total to each event, try like this
index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch | eval sno=1 | accum sno | eventstats sum(vwatch) as Total by sno| table viewer.email Total
If you want to summarize data by viewer.email field (means if there are 2+ events are with same viewer.email value then sum all those events), then try this
index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch |stats sum(vwatch) as Total by viewer.email
This query worked:
index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch | eval sno=1 | accum sno |
eventstats sum(vwatch) as Total by sno| table viewer.email Total
Thank you.
Yes its possible to sum your mutivalue field "watched".
If the emailId and watched fields are already extracted then try below query:
yourBaseSearch to return events containing emailId and watched
| rex field=watched max_match=0 "(?<watchcount>[\d]+)"
| mvexpand watchcount
| stats sum(watchcount) by emailId
If the fields are not extracted use below query to extract the fields emailId and watched first and then do then counting:
yourBaseSearch to return events
|rex "email\":\s\"(?<emailId>[^\"]+)"
| rex "watched\"\:\s\[(?<watched>[^\]]+)\]"
| rex field=watched max_match=0 "(?<watchcount>[\d]+)"
| mvexpand watchcount
| stats sum(watchcount) by emailId
While I'm not fully understanding your request either, it appears to me that you are either:
1 - Trying to get the sum of the array of numbers in the field "watched{}", which I've based off of you renaming "watched{}" as "vwatch" and applying the stats function "sum(vwatch)" as the "total".
2 - My other interpretation of your request, based off your second search where you are using "makemv", is that you are trying to gather a count of how many numbers there are in the field "watched{}"
If either one of my interpretations are correct for the array below:
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 3, 2, 2, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
Then I think Splunk is actually interpreting these numbers as a string rather than numbers, in which case you need to convert the string to numbers.
If my first interpretation above was correct, try this search:
index=netbase1_index sourcetype=sv_sessions | rex field=_raw "watched\"\:\s\[(?P<watched_numbers>.[^\]]*)" | eval trim_spaces=trim(split(watched_numbers,",")) | stats count by viewer_email, trim_spaces | eval convert_string = tonumber(trim_spaces) | eval product = convert_string * count | eventstats sum(product) AS watch_total | dedup watch_total | table viewer_email, watch_total
If my second interpretation above was correct, try this search:
index=netbase1_index sourcetype=sv_sessions | rex field=_raw "watched\"\:\s\[(?P<watched_numbers>.[^\]]*)" | eval trim_spaces=trim(split(watched_numbers,",")) | stats count by viewer_email, trim_spaces | eval convert_string = tonumber(trim_spaces) | stats sum(count) AS array_total by viewer_email
*edit - you will probably need to change the field "viewer_email" to match your own.
There are a couple of things about your results that don't make sense to me, but I will try to answer regardless.
For #1, I don't understand how this works if vwatch
is a multi-valued field, but I will answer anyway. If it will add them up as you suggest, then you only need to add a term to your eventstats
segment that identifies the unique IDs. If there is nothing already in each event that serves as a unique ID, then the easiest thing to do is add streamstats count as count |
before the eventstats
pipe. The result would be
index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch | streamstats count as count | eventstats sum(vwatch) as Total by count | table viewer.email Total
For #2, I don't understand why using mvexpand
wouldn't work, but you still need to have a unique value like #1. Without seeing the raw data, my attempt would be as follows:
index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch | streamstats count as count | mvexpand vwatch | stats sum(vwatch) as Total by viewer.email count | table viewer.email Total
For #3, I am no help.
Option #4, if you plan to do a lot of searches that need, I suggest you implement a custom SPL command like I did for this exact use case. The starting point is called mvsum
(see http://jordan.broughs.net/archives/2012/06/mvsum-for-splunk-summing-multi-valued-fields-within-a-sin...). You can rip-off my implementation straight from my app Layer8Insight App For Splunk (https://splunkbase.splunk.com/app/3171/). It is battle tested. My app includes the configuration files and python code you would need.
This worked:
index=netbase1_index sourcetype=sv_sessions | rename "watched{}" as vwatch | streamstats count as count | eventstats sum(vwatch) as Total by count | table viewer.email Total
Thank you.
@cgong can you please accept my answer if that solved the issue for you?