Hi,
Let's say I can get this table using some Splunk query.
id | stages |
1 | key1,100 key2,200 key3,300 |
2 | key1,50 key2,150 key3,250 |
3 | key1,150 key2,250 key3,350 |
Given this data, I want the following result; that is, I want to reduce (average) over the keys.
key | avg |
key1 | 100 |
key2 | 200 |
key3 | 300 |
I tried to use mvexpand for this, but Splunk runs out of memory and the results get truncated. So I want something more like a reduce function that can accumulate this multivalue field by key. Is this possible to do with a Splunk query?
Here is what I have tried:
`data_source`
| rename stages{}.name as stage_name, somejson{}.duration as stage_duration
| fields id, stage_name, stage_duration
| eval stages=mvzip(stage_name, stage_duration)
| eval stages=mvfilter(match(stages, "^key"))
| mvexpand stages
| eval stages=split(stages, ",")
| eval stage_name=mvindex(stages, 0)
| eval stage_duration=tonumber(mvindex(stages, 1))
| stats avg(stage_duration) by stage_name
I want to do something more efficient than `mvexpand stages` that helps me do the reduction without blowing up memory.
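One more thing that may be worth checking before restructuring the search: mvexpand silently truncates its output when it hits its memory ceiling in limits.conf. The stanza name and value below are assumptions about your environment; verify them against the limits.conf documentation for your Splunk version. Raising the limit only postpones the problem, but it can explain the truncation you are seeing.

```ini
# limits.conf -- mvexpand's memory ceiling (illustrative value, not a recommendation)
[mvexpand]
max_mem_usage_mb = 2000
```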
| eventstats count
| mvexpand stages
| rex field=stages "(?<key>[^,]+),(?<value>\d+)"
| stats values(count) as count sum(value) as value by key
| eval avg=value/count
| table key avg
Not tested, though.
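One way to sanity-check this pipeline is to run it against the sample table from the question, using makeresults to fabricate the three events (the makemv step assumes stages arrives as a space-separated string, as in the question):

```
| makeresults count=3
| streamstats count as id
| eval stages=case(id=1, "key1,100 key2,200 key3,300",
                   id=2, "key1,50 key2,150 key3,250",
                   id=3, "key1,150 key2,250 key3,350")
| makemv delim=" " stages
| eventstats count
| mvexpand stages
| rex field=stages "(?<key>[^,]+),(?<value>\d+)"
| stats values(count) as count sum(value) as value by key
| eval avg=value/count
| table key avg
```

On this three-event sample the averages should come out as key1=100, key2=200, key3=300, matching the expected result in the question.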
Unfortunately, `mvexpand stages` blows up memory.
Yes, it will if you apply it to raw events. Try applying it after transforming commands like top, chart, stats, or timechart; that will reduce the number of events first.
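As a hedged sketch of that idea (field names borrowed from the question, not tested): `stats ... by` a multivalue field implicitly splits out each value into its own group, so it can stand in for mvexpand entirely, and a second, weighted stats recovers the per-key average:

```
`data_source`
| stats count by stages
-- each multivalue entry of stages becomes its own group here, inside stats
| rex field=stages "(?<key>[^,]+),(?<value>\d+)"
| stats sum(eval(value * count)) as total sum(count) as n by key
| eval avg=total/n
| table key avg
```

The `count` from the first stats weights each distinct "key,value" pair by how many events carried it, so avg is the true per-key mean over all events.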
Unfortunately I can't do that. I don't have a lot of different keys, though; around 10-12.
I ended up solving this by splitting the keys manually into separate columns and then aggregating without using mvexpand. Something like:
| fields name, value
| eval stages=mvzip(name, value)
-- the sort ensures the keys always appear in the same order
| eval stages=mvsort(mvfilter(match(stages, "^key")))
| eval key0=mvindex(stages, 0)
| rex field=key0 "(?<name>.+),(?<key0>.+)"
| eval key1=mvindex(stages, 1)
| rex field=key1 "(?<name>.+),(?<key1>.+)"
-- similarly for all the other keys
| timechart span=1h
    median(key0) as key0
    median(key1) as key1
-- other keys here
This produces exactly the same amount of data as before, but spread across columns instead of extra rows, so there are no memory issues.
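For reference, here is that column-splitting approach end-to-end against the sample table from the question, with makeresults fabricating the three events and stats standing in for timechart (the fabricated events all share one timestamp, so a time-bucketed chart would not show anything useful):

```
| makeresults count=3
| streamstats count as id
| eval stages=case(id=1, "key1,100 key2,200 key3,300",
                   id=2, "key1,50 key2,150 key3,250",
                   id=3, "key1,150 key2,250 key3,350")
| makemv delim=" " stages
| eval stages=mvsort(mvfilter(match(stages, "^key")))
| eval key0=mvindex(stages, 0)
| rex field=key0 "(?<name>.+),(?<key0>.+)"
| eval key1=mvindex(stages, 1)
| rex field=key1 "(?<name>.+),(?<key1>.+)"
| eval key2=mvindex(stages, 2)
| rex field=key2 "(?<name>.+),(?<key2>.+)"
| stats avg(key0) as key0 avg(key1) as key1 avg(key2) as key2
```

Each rex overwrites the keyN column with just the numeric part of its "name,value" pair, so the final stats should report key0=100, key1=200, key2=300, matching the expected result.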