Solved: Re: Accumulate values for a multi value field by k...

unbelievable_ma · ‎08-26-2020

Hi,

Let's say I can get this table using some Splunk query.

id	stages
1	key1,100 key2,200 key3,300
2	key1,50 key2,150 key3,250
3	key1,150 key2,250 key3,350

Given this data I want the result, that is I want to reduce (average) over the keys.

key	avg
key1	100
key2	200
key3	300

I tried to use mvexpand for this but Splunk runs out of memory and the results get truncated. So I want something more like a reduce function that can accumulate this mv field by key. Is this possible to do through a splunk query?

Here is what I have tried:

`data_source`
    | fields id, stages{}.name as stage_name, somejson{}.duration as stage_duration
    | eval stages=mvzip(stage_name, stage_duration)
    | eval stages=mvfilter(match(stages, "key*"))
    | mvexpand stages
    | eval stages=split(stages, ",")
    | eval stage_name=mvindex(stages,0)
    | eval stage_duration=mvindex(stages,1)
    | stats avg(stage_duration) by stage_name

I want to do something more efficient than `mvexpand stages` that helps me do the reduction without blowing up memory.

unbelievable_ma · ‎08-27-2020

Unfortunately I can't do that. I don't have a lot of different keys. Around 10-12.
I ended up solving this by splitting the key manually into separate columns and then aggregating without using mvexpand. Something like:

| fields name, value
| eval stages=mvzip(name, value)
-- the sort helps here to make sure the keys appear always in the same order
| eval stages=mvsort(mvfilter(match(stages, "key-*"))
| eval key0=mvindex(stages,0)
| rex field=key0 "(?<name>.+),(?<key0>.+)"
| eval key1=mvindex(stages,0)
| rex field=key1 "(?<name>.+),(?<key1>.+)"
-- similarly for all different keys
| timechart span=1h
median(key1) as key1
median(key2) as key2
-- other keys here

This expands to exactly the same amount of data as before but no memory issues.

View solution in original post

thambisetty · ‎08-26-2020

| eventstats count

| mvexpand stages

| rex field=stages “(?<key>[^,]+),(?<value>\d+)”

| stats values(count) as count sum(value) as value by key

| eval avg=value/count

| table key avg

Note: replace double quotes in rex command with double quotes from your keyboard.

not tested also.

————————————
If this helps, give a like below.

unbelievable_ma · ‎08-26-2020

`mvexpand stages` blows up memory unfortunately.

thambisetty · ‎08-27-2020

Yes. If you are applying on raw events. Try to apply after transforming commands like top,chart, stats, timechart. This will reduce number of events.

————————————
If this helps, give a like below.

unbelievable_ma · ‎08-27-2020

Unfortunately I can't do that. I don't have a lot of different keys. Around 10-12.
I ended up solving this by splitting the key manually into separate columns and then aggregating without using mvexpand. Something like:

| fields name, value
| eval stages=mvzip(name, value)
-- the sort helps here to make sure the keys appear always in the same order
| eval stages=mvsort(mvfilter(match(stages, "key-*"))
| eval key0=mvindex(stages,0)
| rex field=key0 "(?<name>.+),(?<key0>.+)"
| eval key1=mvindex(stages,0)
| rex field=key1 "(?<name>.+),(?<key1>.+)"
-- similarly for all different keys
| timechart span=1h
median(key1) as key1
median(key2) as key2
-- other keys here

This expands to exactly the same amount of data as before but no memory issues.

Accumulate values for a multi value field by key

field extraction

fields

stats

Building Reliable Asset and Identity Frameworks in Splunk ES

Cloud Monitoring Console - Unlocking Greater Visibility in SVC Usage Reporting

Automatic Discovery Part 3: Practical Use Cases

Are you a member of the Splunk Community?