Re: mvfilter before using mvexpand to reduce memor...

unbelievable_ma · ‎08-25-2020

Hi,
I have some documents that looks like this:

{
  "document_id": "some-id",
  "status": "some-status",
  "fields": "some values",
  "stages": [
    {
      "duration": 0.031,
      "name": "my_name",
      "more_fields": "more_values",
      "array_field": [...],
    },
    ...
  ]
}

The length of the stages field can be quite large. I would like to calculate the avg or median duration for each type of stage but not for all stage types. Here is what I have initially:

data_source
    | fields status, stages{}.name as sname, stages{}.duration
    | eval stage_fields=mvzip('stages{}.name', 'stages{}.duration')
    | where job_result in ("some-status")
    | mvexpand stage_fields
    | fields stage_fields
    | rex field=stage_fields "(?<stage_name>.+),(?<stage_duration>.+)"
    | where stage_name in ("my_name", "other_name")
    | timechart span=1h median(stage_duration) as "Median Stage Duration" by stage_name
    | rename stage_name as "Stage Name"

This obviously starts truncating results because mvexpand starts expanding into a huge number of fields and complains about memory limits. I tried to put an mvfilter before it so that it only expands those stages that I am interested in but obviously I didn't know how to use it so that ended up as a no op.

So the question is how can I make this query more efficient?
Thanks!

to4kawa · ‎08-25-2020

unbelievable_ma · ‎08-25-2020

Sorry I don't get it. Could you expand on this a bit?

Thanks!

to4kawa · ‎08-26-2020

unbelievable_ma · ‎08-26-2020

Well it doesn't really do what I stated in the problem.

Perhaps you could say explains this part?
> stats values(_time) as _time by stages

to4kawa · ‎08-26-2020

unbelievable_ma · ‎08-26-2020

No _time is not unique because multiple values exist within the same event (hence mvexpand) hence the results are not correct.
I can try to give another example. Lets say I can get this table

id	mv_field
1	key1,100 key2,200 key3,300
2	key1,100 key2,200 key3,300
3	key1,100 key2,200 key3,300

Given this I want the result:

key	sum
key1	300
key2	600
key3	900

The important part here is that the second column is an mv field. mvexpand breaks the memory usage there so I need some other way to accumulate the results. Maybe I will post this as a separate question cause this is perhaps simpler to explain.

Update: mvfilter didn't help with the memory. I found a solution to this that I added here:
https://community.splunk.com/t5/Splunk-Search/Accumulate-values-for-a-multi-value-field-by-key/m-p/5...

mvfilter before using mvexpand to reduce memory usage

Announcing Scheduled Export GA for Dashboard Studio

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!