Splunk Search

mvfilter before using mvexpand to reduce memory usage

unbelievable_ma
Explorer

Hi,
I have some documents that looks like this:

 

 

{
  "document_id": "some-id",
  "status": "some-status",
  "fields": "some values",
  "stages": [
    {
      "duration": 0.031,
      "name": "my_name",
      "more_fields": "more_values",
      "array_field": [...],
    },
    ...
  ]
}

 

 

The length of the stages field can be quite large. I would like to calculate the avg or median duration for each type of stage but not for all stage types. Here is what I have initially:

 

 

data_source
    | fields status, stages{}.name as sname, stages{}.duration
    | eval stage_fields=mvzip('stages{}.name', 'stages{}.duration')
    | where job_result in ("some-status")
    | mvexpand stage_fields
    | fields stage_fields
    | rex field=stage_fields "(?<stage_name>.+),(?<stage_duration>.+)"
    | where stage_name in ("my_name", "other_name")
    | timechart span=1h median(stage_duration) as "Median Stage Duration" by stage_name
    | rename stage_name as "Stage Name"

 

 

 This obviously starts truncating results because mvexpand starts expanding into a huge number of fields and complains about memory limits. I tried to put an mvfilter before it so that it only expands those stages that I am interested in but obviously I didn't know how to use it so that ended up as a no op.

So the question is how can I make this query more efficient?
Thanks!

0 Karma

to4kawa
Ultra Champion
 
0 Karma

unbelievable_ma
Explorer

Sorry I don't get it. Could you expand on this a bit?

Thanks!

0 Karma

to4kawa
Ultra Champion
 
0 Karma

unbelievable_ma
Explorer

Well it doesn't really do what I stated in the problem.

Perhaps you could say explains this part?
> stats values(_time) as _time by stages

0 Karma

to4kawa
Ultra Champion
 
0 Karma

unbelievable_ma
Explorer

No _time is not unique because multiple values exist within the same event (hence mvexpand) hence the results are not correct.
I can try to give another example. Lets say I can get this table

idmv_field
1

key1,100

key2,200

key3,300 

2

key1,100

key2,200

key3,300 

3 

key1,100

key2,200

key3,300 

 

Given this I want the result:

keysum
key1300
key2600
key3900

 

The important part here is that the second column is an mv field. mvexpand breaks the memory usage there so I need some other way to accumulate the results. Maybe I will post this as a separate question cause this is perhaps simpler to explain.

Update: mvfilter didn't help with the memory. I found a solution to this that I added here:
https://community.splunk.com/t5/Splunk-Search/Accumulate-values-for-a-multi-value-field-by-key/m-p/5...

0 Karma
Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...