topic Re: mvfilter before using mvexpand to reduce memory usage in Splunk Search

mvfilter before using mvexpand to reduce memory usage

unbelievable_ma — Tue, 25 Aug 2020 21:31:19 GMT

Hi,
I have some documents that looks like this:

{ "document_id": "some-id", "status": "some-status", "fields": "some values", "stages": [ { "duration": 0.031, "name": "my_name", "more_fields": "more_values", "array_field": [...], }, ... ] }

The length of the stages field can be quite large. I would like to calculate the avg or median duration for each type of stage but not for all stage types. Here is what I have initially:

data_source | fields status, stages{}.name as sname, stages{}.duration | eval stage_fields=mvzip('stages{}.name', 'stages{}.duration') | where job_result in ("some-status") | mvexpand stage_fields | fields stage_fields | rex field=stage_fields "(?<stage_name>.+),(?<stage_duration>.+)" | where stage_name in ("my_name", "other_name") | timechart span=1h median(stage_duration) as "Median Stage Duration" by stage_name | rename stage_name as "Stage Name"

This obviously starts truncating results because mvexpand starts expanding into a huge number of fields and complains about memory limits. I tried to put an mvfilter before it so that it only expands those stages that I am interested in but obviously I didn't know how to use it so that ended up as a no op.

So the question is how can I make this query more efficient?
Thanks!

Re: mvfilter before using mvexpand to reduce memory usage

to4kawa — Wed, 26 Aug 2020 12:07:32 GMT

Re: mvfilter before using mvexpand to reduce memory usage

unbelievable_ma — Wed, 26 Aug 2020 06:52:58 GMT

Sorry I don't get it. Could you expand on this a bit?

Thanks!

Re: mvfilter before using mvexpand to reduce memory usage

to4kawa — Wed, 26 Aug 2020 12:07:47 GMT

Re: mvfilter before using mvexpand to reduce memory usage

unbelievable_ma — Wed, 26 Aug 2020 11:44:10 GMT

Well it doesn't really do what I stated in the problem.

Perhaps you could say explains this part?
> stats values(_time) as _time by stages

Re: mvfilter before using mvexpand to reduce memory usage

to4kawa — Wed, 26 Aug 2020 12:08:01 GMT

Re: mvfilter before using mvexpand to reduce memory usage

unbelievable_ma — Thu, 27 Aug 2020 20:45:48 GMT

No _time is not unique because multiple values exist within the same event (hence mvexpand) hence the results are not correct.
I can try to give another example. Lets say I can get this table

id	mv_field
1	key1,100 key2,200 key3,300
2	key1,100 key2,200 key3,300
3	key1,100 key2,200 key3,300

Given this I want the result:

key	sum
key1	300
key2	600
key3	900

The important part here is that the second column is an mv field. mvexpand breaks the memory usage there so I need some other way to accumulate the results. Maybe I will post this as a separate question cause this is perhaps simpler to explain.

Update: mvfilter didn't help with the memory. I found a solution to this that I added here:
https://community.splunk.com/t5/Splunk-Search/Accumulate-values-for-a-multi-value-field-by-key/m-p/516577#M145195