Splunk Search

Alternatives to mvexpand for decreasing memory usage

arosenwinkel
Observer

Hello! I have some JSON events that each look something like this:

{
  "id": 12345,
  "steps": [
    {
      "stepName": "A",
      "stepDuration": 0.5
    },
    {
      "stepName": "B",
      "stepDuration": 0.17
    }
  ]
}

My existing searches run mvexpand on the steps field so that each step becomes its own event, which I can then manipulate. This works great for small numbers of events, but when I am processing thousands of events with 100+ steps each, I quickly run into the default memory limits imposed on the mvexpand command.
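(For context, the limit being hit is mvexpand's output buffer, which is governed by max_mem_usage_mb in limits.conf. One workaround - at the cost of search-head memory - is raising that cap; a sketch, with a purely illustrative value:)

```
# limits.conf on the search head (restart required)
# the value is in MB and chosen here only as an example
[default]
max_mem_usage_mb = 1000
```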

Is there an alternative command that I am missing that I can use to compute summary statistics such as "the average duration of step A is X.XX" and "X% of events hit step B"?

If not, is there a better way to structure the events themselves to support this? My constraint is that I need to allow for arbitrary numbers of steps occurring in an arbitrary order that needs to be preserved.

Thanks in advance!


arosenwinkel
Observer

So I feel like an idiot - my solution ended up being as simple as adding a

| fields x | fields - _raw

just before the mvexpand x.
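Against the sample event in my question, the shape of the fix looks something like this (the spath extractions are an assumption about how the steps array gets into a field; adjust to your own search):

```
| spath steps{} output=steps
| fields steps
| fields - _raw
| mvexpand steps
| spath input=steps
| stats avg(stepDuration) as average_duration count by stepName
```

Dropping _raw and every unneeded field before mvexpand is what keeps each expanded copy small enough to stay under the memory limit.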

Appreciate the help, though, @ITWhisperer ! I'll keep tinkering with your solution because it is very weird that it was only grabbing some of the steps.


ITWhisperer
Ultra Champion

A bit convoluted and I don't know if it solves the memory issue, but try something like this:

| makeresults | eval _raw="{
  \"id\": 12345,
  \"steps\": [
    {
      \"stepName\": \"A\",
      \"stepDuration\": 0.5
    },
    {
      \"stepName\": \"B\",
      \"stepDuration\": 0.17
    }
  ]
}|{
  \"id\": 23456,
  \"steps\": [
    {
      \"stepName\": \"A\",
      \"stepDuration\": 0.7
    },
    {
      \"stepName\": \"C\",
      \"stepDuration\": 0.17
    }
  ]
}|{
  \"id\": 34567,
  \"steps\": [
    {
      \"stepName\": \"A\",
      \"stepDuration\": 0.9
    },
    {
      \"stepName\": \"B\",
      \"stepDuration\": 0.15
    },
    {
      \"stepName\": \"C\",
      \"stepDuration\": 0.19
    }
  ]
}"
| eval events=split(_raw,"|")
| mvexpand events
| eval _raw=events
| fields - _time events
| spath steps{} output=names
| fields - _raw
| eval durations=names
| rex field=names mode=sed "s/[\S\s]+\"stepName\":\s\"([^\"]+)[\S\s]+/\1/g"
| rex field=durations mode=sed "s/[\S\s]+\"stepDuration\":\s([\d\.]+)[\S\s]+/\1/g"
| streamstats count as row
| eval steps=mvcount(names)
| streamstats sum(steps) as toprow
| eval maxrow=toprow
| makecontinuous toprow
| reverse
| filldown
| eval toprow=if(row=1,1,toprow)
| makecontinuous toprow
| filldown
| eval names=mvindex(names,maxrow-toprow)
| eval durations=mvindex(durations,maxrow-toprow)
| fields - maxrow toprow row steps
| stats avg(durations) as average_duration count by names


arosenwinkel
Observer

Very cool - I will give this a try. My understanding is that this is basically doing the dirty work of mvexpand, but in a way that Splunk can hopefully do without blowing up every event at the same time?

Thanks!


ITWhisperer
Ultra Champion

Essentially, it works out how many rows are required by each multi-valued set, then adds that many additional empty rows. The order is then reversed so that filldown copies the missing values into each row. This doesn't work properly for the first row if it has more than one value in its multi-value set, so the search sets that row's toprow to 1, adds the missing rows, and fills them in. Then, because each row now holds all the multi-values from its original row, mvindex selects just one of those values for each row. The final part is an example stats across the expanded set of events.

I don't know if mvexpand works internally like this (I doubt it), but this seems to simulate the effect of mvexpand, hopefully in a less memory hungry manner.
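To see the mechanics in isolation, here is the same trick on a toy pair of events, one with two steps and one with three (step names are arbitrary stand-ins):

```
| makeresults count=2
| streamstats count as row
| eval names=split(if(row=1, "A,B", "C,D,E"), ",")
| eval steps=mvcount(names)
| streamstats sum(steps) as toprow
| eval maxrow=toprow
| makecontinuous toprow
| reverse
| filldown
| eval toprow=if(row=1,1,toprow)
| makecontinuous toprow
| filldown
| eval name=mvindex(names, maxrow-toprow)
| table name
```

Each of A through E should land on its own row, which is the mvexpand effect being simulated.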

I would be interested in your experience as I have not tried this at scale.


arosenwinkel
Observer

So I think this answer is verrrrry close - it is reporting the correct average durations (without triggering any memory usage warnings), but only for three of the steps!

Looking at the search, it is very unclear why this would be the case. I'll keep playing around with it because there has to be something dumb that I'm doing.
