Solved: Re: Doing stats on multivalued JSON fields (mxexpa...

claudio_manig · ‎10-03-2017

Hi Ninjas

I'm dealing with some deeply nested JSON events like:

"sendTime":"2017-09-21T17:02:06.583+02:00","runningProcess":[{"Name":"_Total","PercentProcessorTime":"100","WorkingSetPrivate":"1557368"},{"Name":"Bananaservice","PercentProcessorTime":"0","WorkingSetPrivate":"593"},{"Name":"Cherryservice","PercentProcessorTime":"0","WorkingSetPrivate":"7671"},{"Name":"Pineappleservice","PercentProcessorTime":"0","WorkingSetPrivate":"466"},{"Name":"Kiwiservice","PercentProcessorTime":"0","WorkingSetPrivate":"442"},{"Name":"Appleservice","PercentProcessorTime":"0","WorkingSetPrivate":"630"},{"Name":"Peachservice","PercentProcessorTime":"0","WorkingSetPrivate":"1470"}

So all i want to do is getting out the avg values over time by each process, something like

| stats avg(runningProcess{}.PercentProcessorTime) as CPU by runningProcess{}.Name, _time
| stats list(*) as * by _time

But without mvexpand and so on, I'm not getting the right data as just takes the value of the first entry of the mv field by each event.
As said, I'm aware of doing it with mvexpand etc. but it slows down the search dramatically and i was wondering whether there is a more elegant solution to get the right data here.

Thanks

kamlesh_vaghela · ‎10-04-2017

Hi claudio.manig,

I have tried to remove mvexpand from your search. Can you please try below search??

| index=skynet sourcetype=t800
| rename runningProcess{}.Name as ProcessName, runningProcess{}.PercentProcessorTime as ProcessCPU, runningProcess{}.WorkingSetPrivate as ProcessMemoryKB
| eval tempField=mvzip(mvzip(ProcessName,ProcessCPU),ProcessMemoryKB) 
| stats count by _time,messageId,tempField 
| eval ProcessName=mvindex(split(tempField,","),0), ProcessCPU=mvindex(split(tempField,","),1), ProcessMemoryKB=mvindex(split(tempField,","),2) 
| where ProcessCPU > 20 and ProcessName!="_Total" and ProcessName!="Idle" 
| stats count avg(ProcessCPU) AS AvgProcessCPU avg(ProcessMemoryKB) AS AvgMem by ProcessName 
| sort -AvgProcessCPU -count

I'm not sure about "runningProcess{}.Name" field name I'm assuming from your search. So you can replace it with the original field name.

I hope It will help you.

Thanks

View solution in original post

kamlesh_vaghela · ‎10-04-2017

Hi claudio.manig,

I have tried to remove mvexpand from your search. Can you please try below search??

| index=skynet sourcetype=t800
| rename runningProcess{}.Name as ProcessName, runningProcess{}.PercentProcessorTime as ProcessCPU, runningProcess{}.WorkingSetPrivate as ProcessMemoryKB
| eval tempField=mvzip(mvzip(ProcessName,ProcessCPU),ProcessMemoryKB) 
| stats count by _time,messageId,tempField 
| eval ProcessName=mvindex(split(tempField,","),0), ProcessCPU=mvindex(split(tempField,","),1), ProcessMemoryKB=mvindex(split(tempField,","),2) 
| where ProcessCPU > 20 and ProcessName!="_Total" and ProcessName!="Idle" 
| stats count avg(ProcessCPU) AS AvgProcessCPU avg(ProcessMemoryKB) AS AvgMem by ProcessName 
| sort -AvgProcessCPU -count

I'm not sure about "runningProcess{}.Name" field name I'm assuming from your search. So you can replace it with the original field name.

I hope It will help you.

Thanks

claudio_manig · ‎10-04-2017

Hey kamlesh - looking really good so far! But i was not sure why you used the "messageId" field to group your first stats as this field does not exist.

kamlesh_vaghela · ‎10-04-2017

ooh.. that's just my temporary variable... I forgot to remove it.

Can you please remove it and execute search?

| index=skynet sourcetype=t800
| rename runningProcess{}.Name as ProcessName, runningProcess{}.PercentProcessorTime as ProcessCPU, runningProcess{}.WorkingSetPrivate as ProcessMemoryKB
| eval tempField=mvzip(mvzip(ProcessName,ProcessCPU),ProcessMemoryKB)
| stats count by _time,tempField
| eval ProcessName=mvindex(split(tempField,","),0), ProcessCPU=mvindex(split(tempField,","),1), ProcessMemoryKB=mvindex(split(tempField,","),2)
| where ProcessCPU > 20 and ProcessName!="_Total" and ProcessName!="Idle"
| stats count avg(ProcessCPU) AS AvgProcessCPU avg(ProcessMemoryKB) AS AvgMem by ProcessName
| sort -AvgProcessCPU -count

claudio_manig · ‎10-04-2017

No worries I already thought that and tired it and it works! I have to test it to verify the effective performance impact. Thanks a lot!

kamlesh_vaghela · ‎10-04-2017

great.. If it is working then be sure to click Accept to close this question.

DalJeanis · ‎10-03-2017

@claudio.manig - Please post the rest of the SPL for the searches that you've tried. It's possible that we can optimize the extraction itself, or reduce the impact of the mvexpand with some subtle magic.

claudio_manig · ‎10-03-2017

Well mvexpand itself already slows it down by its logic - creating an event out of each field value. So in reality i have json events with around 200+ processes running, resulting in tons of events out of one.

But sure, have a go, the query looks something like that:

| index=skynet sourcetype=t800
| spath output=prcNames path=runningProcess{}
| mvexpand prcNames
| rex field=prcNames "\{\"Name\":\"(?<ProcessName>.*)\",\"PercentProcessorTime\":\"(?<ProcessCPU>.*)\",\"WorkingSetPrivate\":\"(?<ProcessMemoryKB>.*)\"\}" 
| where ProcessCPU > 20 and ProcessName!="_Total" and ProcessName!="Idle" 
| stats count avg(ProcessCPU) AS AvgProcessCPU avg(ProcessMemoryKB) AS AvgMem by ProcessName 
| sort -AvgProcessCPU -count

The spath is needed so we can group the subcategories of each processname like cpu and mem together.

Doing stats on multivalued JSON fields (mxexpand is too slow)

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

(re)Introducing the Splunk Community Champions + 2026 – 2027 Splunk MVPs ...

Introducing the 2026 - 2027 SplunkTrust cohort!

Pro Tips for .conf26: How to Prep Like a Splunk Veteran

Join the Conversation

Doing stats on multivalued JSON fields (mxexpand is too slow)

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

(re)Introducing the Splunk Community Champions + 2026 – 2027 Splunk MVPs ...

Introducing the 2026 - 2027 SplunkTrust cohort!

Pro Tips for .conf26: How to Prep Like a Splunk Veteran