mvfilter before using mvexpand to reduce memory us...

unbelievable_ma · ‎08-25-2020

Hi,
I have some documents that looks like this:

{
  "document_id": "some-id",
  "status": "some-status",
  "fields": "some values",
  "stages": [
    {
      "duration": 0.031,
      "name": "my_name",
      "more_fields": "more_values",
      "array_field": [...],
    },
    ...
  ]
}

The length of the stages field can be quite large. I would like to calculate the avg or median duration for each type of stage but not for all stage types. Here is what I have initially:

data_source
    | fields status, stages{}.name as sname, stages{}.duration
    | eval stage_fields=mvzip('stages{}.name', 'stages{}.duration')
    | where job_result in ("some-status")
    | mvexpand stage_fields
    | fields stage_fields
    | rex field=stage_fields "(?<stage_name>.+),(?<stage_duration>.+)"
    | where stage_name in ("my_name", "other_name")
    | timechart span=1h median(stage_duration) as "Median Stage Duration" by stage_name
    | rename stage_name as "Stage Name"

This obviously starts truncating results because mvexpand starts expanding into a huge number of fields and complains about memory limits. I tried to put an mvfilter before it so that it only expands those stages that I am interested in but obviously I didn't know how to use it so that ended up as a no op.

So the question is how can I make this query more efficient?
Thanks!

to4kawa · ‎08-25-2020

unbelievable_ma · ‎08-25-2020

Sorry I don't get it. Could you expand on this a bit?

Thanks!

to4kawa · ‎08-26-2020

unbelievable_ma · ‎08-26-2020

Well it doesn't really do what I stated in the problem.

Perhaps you could say explains this part?
> stats values(_time) as _time by stages

to4kawa · ‎08-26-2020

unbelievable_ma · ‎08-26-2020

No _time is not unique because multiple values exist within the same event (hence mvexpand) hence the results are not correct.
I can try to give another example. Lets say I can get this table

id	mv_field
1	key1,100 key2,200 key3,300
2	key1,100 key2,200 key3,300
3	key1,100 key2,200 key3,300

Given this I want the result:

key	sum
key1	300
key2	600
key3	900

The important part here is that the second column is an mv field. mvexpand breaks the memory usage there so I need some other way to accumulate the results. Maybe I will post this as a separate question cause this is perhaps simpler to explain.

Update: mvfilter didn't help with the memory. I found a solution to this that I added here:
https://community.splunk.com/t5/Splunk-Search/Accumulate-values-for-a-multi-value-field-by-key/m-p/5...

mvfilter before using mvexpand to reduce memory usage

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

ATTENTION: We’re Moving! (AGAIN!)

Deep Dive: Optimizing Telemetry Pipelines in Splunk Observability Cloud

Announcing Modern Navigation: A New Era of Splunk User Experience

Join the Conversation