Splunk Search

mvfilter before using mvexpand to reduce memory usage

unbelievable_ma
Explorer

Hi,
I have some documents that looks like this:

 

 

{
  "document_id": "some-id",
  "status": "some-status",
  "fields": "some values",
  "stages": [
    {
      "duration": 0.031,
      "name": "my_name",
      "more_fields": "more_values",
      "array_field": [...],
    },
    ...
  ]
}

 

 

The length of the stages field can be quite large. I would like to calculate the avg or median duration for each type of stage but not for all stage types. Here is what I have initially:

 

 

data_source
    | fields status, stages{}.name as sname, stages{}.duration
    | eval stage_fields=mvzip('stages{}.name', 'stages{}.duration')
    | where job_result in ("some-status")
    | mvexpand stage_fields
    | fields stage_fields
    | rex field=stage_fields "(?<stage_name>.+),(?<stage_duration>.+)"
    | where stage_name in ("my_name", "other_name")
    | timechart span=1h median(stage_duration) as "Median Stage Duration" by stage_name
    | rename stage_name as "Stage Name"

 

 

 This obviously starts truncating results because mvexpand starts expanding into a huge number of fields and complains about memory limits. I tried to put an mvfilter before it so that it only expands those stages that I am interested in but obviously I didn't know how to use it so that ended up as a no op.

So the question is how can I make this query more efficient?
Thanks!

0 Karma

to4kawa
Ultra Champion
 
0 Karma

unbelievable_ma
Explorer

Sorry I don't get it. Could you expand on this a bit?

Thanks!

0 Karma

to4kawa
Ultra Champion
 
0 Karma

unbelievable_ma
Explorer

Well it doesn't really do what I stated in the problem.

Perhaps you could say explains this part?
> stats values(_time) as _time by stages

0 Karma

to4kawa
Ultra Champion
 
0 Karma

unbelievable_ma
Explorer

No _time is not unique because multiple values exist within the same event (hence mvexpand) hence the results are not correct.
I can try to give another example. Lets say I can get this table

idmv_field
1

key1,100

key2,200

key3,300 

2

key1,100

key2,200

key3,300 

3 

key1,100

key2,200

key3,300 

 

Given this I want the result:

keysum
key1300
key2600
key3900

 

The important part here is that the second column is an mv field. mvexpand breaks the memory usage there so I need some other way to accumulate the results. Maybe I will post this as a separate question cause this is perhaps simpler to explain.

Update: mvfilter didn't help with the memory. I found a solution to this that I added here:
https://community.splunk.com/t5/Splunk-Search/Accumulate-values-for-a-multi-value-field-by-key/m-p/5...

0 Karma
Get Updates on the Splunk Community!

Unlock New Opportunities with Splunk Education: Explore Our Latest Courses!

At Splunk Education, we’re dedicated to providing top-tier learning experiences that cater to every skill ...

Technical Workshop Series: Splunk Data Management and SPL2 | Register here!

Hey, Splunk Community! Ready to take your data management skills to the next level? Join us for a 3-part ...

Spotting Financial Fraud in the Haystack: A Guide to Behavioral Analytics with Splunk

In today's digital financial ecosystem, security teams face an unprecedented challenge. The sheer volume of ...