Hi,
I have some documents that looks like this:
{
"document_id": "some-id",
"status": "some-status",
"fields": "some values",
"stages": [
{
"duration": 0.031,
"name": "my_name",
"more_fields": "more_values",
"array_field": [...],
},
...
]
}
The length of the stages field can be quite large. I would like to calculate the avg or median duration for each type of stage but not for all stage types. Here is what I have initially:
data_source
| fields status, stages{}.name as sname, stages{}.duration
| eval stage_fields=mvzip('stages{}.name', 'stages{}.duration')
| where job_result in ("some-status")
| mvexpand stage_fields
| fields stage_fields
| rex field=stage_fields "(?<stage_name>.+),(?<stage_duration>.+)"
| where stage_name in ("my_name", "other_name")
| timechart span=1h median(stage_duration) as "Median Stage Duration" by stage_name
| rename stage_name as "Stage Name"
This obviously starts truncating results because mvexpand starts expanding into a huge number of fields and complains about memory limits. I tried to put an mvfilter before it so that it only expands those stages that I am interested in but obviously I didn't know how to use it so that ended up as a no op.
So the question is how can I make this query more efficient?
Thanks!
Sorry I don't get it. Could you expand on this a bit?
Thanks!
Well it doesn't really do what I stated in the problem.
Perhaps you could say explains this part?
> stats values(_time) as _time by stages
No _time is not unique because multiple values exist within the same event (hence mvexpand) hence the results are not correct.
I can try to give another example. Lets say I can get this table
id | mv_field |
1 | key1,100 key2,200 key3,300 |
2 | key1,100 key2,200 key3,300 |
3 | key1,100 key2,200 key3,300 |
Given this I want the result:
key | sum |
key1 | 300 |
key2 | 600 |
key3 | 900 |
The important part here is that the second column is an mv field. mvexpand breaks the memory usage there so I need some other way to accumulate the results. Maybe I will post this as a separate question cause this is perhaps simpler to explain.
Update: mvfilter didn't help with the memory. I found a solution to this that I added here:
https://community.splunk.com/t5/Splunk-Search/Accumulate-values-for-a-multi-value-field-by-key/m-p/5...