Getting Data In

Doing stats on multivalued JSON fields (mxexpand is too slow)

Communicator

Hi Ninjas

I'm dealing with some deeply nested JSON events like:

"sendTime":"2017-09-21T17:02:06.583+02:00","runningProcess":[{"Name":"_Total","PercentProcessorTime":"100","WorkingSetPrivate":"1557368"},{"Name":"Bananaservice","PercentProcessorTime":"0","WorkingSetPrivate":"593"},{"Name":"Cherryservice","PercentProcessorTime":"0","WorkingSetPrivate":"7671"},{"Name":"Pineappleservice","PercentProcessorTime":"0","WorkingSetPrivate":"466"},{"Name":"Kiwiservice","PercentProcessorTime":"0","WorkingSetPrivate":"442"},{"Name":"Appleservice","PercentProcessorTime":"0","WorkingSetPrivate":"630"},{"Name":"Peachservice","PercentProcessorTime":"0","WorkingSetPrivate":"1470"}

So all i want to do is getting out the avg values over time by each process, something like

| stats avg(runningProcess{}.PercentProcessorTime) as CPU by runningProcess{}.Name, _time
| stats list(*) as * by _time

But without mvexpand and so on, I'm not getting the right data as just takes the value of the first entry of the mv field by each event.
As said, I'm aware of doing it with mvexpand etc. but it slows down the search dramatically and i was wondering whether there is a more elegant solution to get the right data here.

Thanks

0 Karma
1 Solution

SplunkTrust
SplunkTrust

Hi claudio.manig,

I have tried to remove mvexpand from your search. Can you please try below search??

| index=skynet sourcetype=t800
| rename runningProcess{}.Name as ProcessName, runningProcess{}.PercentProcessorTime as ProcessCPU, runningProcess{}.WorkingSetPrivate as ProcessMemoryKB
| eval tempField=mvzip(mvzip(ProcessName,ProcessCPU),ProcessMemoryKB) 
| stats count by _time,messageId,tempField 
| eval ProcessName=mvindex(split(tempField,","),0), ProcessCPU=mvindex(split(tempField,","),1), ProcessMemoryKB=mvindex(split(tempField,","),2) 
| where ProcessCPU > 20 and ProcessName!="_Total" and ProcessName!="Idle" 
| stats count avg(ProcessCPU) AS AvgProcessCPU avg(ProcessMemoryKB) AS AvgMem by ProcessName 
| sort -AvgProcessCPU -count

I'm not sure about "runningProcess{}.Name" field name I'm assuming from your search. So you can replace it with the original field name.

I hope It will help you.

Thanks

View solution in original post

SplunkTrust
SplunkTrust

Hi claudio.manig,

I have tried to remove mvexpand from your search. Can you please try below search??

| index=skynet sourcetype=t800
| rename runningProcess{}.Name as ProcessName, runningProcess{}.PercentProcessorTime as ProcessCPU, runningProcess{}.WorkingSetPrivate as ProcessMemoryKB
| eval tempField=mvzip(mvzip(ProcessName,ProcessCPU),ProcessMemoryKB) 
| stats count by _time,messageId,tempField 
| eval ProcessName=mvindex(split(tempField,","),0), ProcessCPU=mvindex(split(tempField,","),1), ProcessMemoryKB=mvindex(split(tempField,","),2) 
| where ProcessCPU > 20 and ProcessName!="_Total" and ProcessName!="Idle" 
| stats count avg(ProcessCPU) AS AvgProcessCPU avg(ProcessMemoryKB) AS AvgMem by ProcessName 
| sort -AvgProcessCPU -count

I'm not sure about "runningProcess{}.Name" field name I'm assuming from your search. So you can replace it with the original field name.

I hope It will help you.

Thanks

View solution in original post

Communicator

Hey kamlesh - looking really good so far! But i was not sure why you used the "messageId" field to group your first stats as this field does not exist.

0 Karma

SplunkTrust
SplunkTrust

ooh.. that's just my temporary variable... I forgot to remove it.

Can you please remove it and execute search?

| index=skynet sourcetype=t800
| rename runningProcess{}.Name as ProcessName, runningProcess{}.PercentProcessorTime as ProcessCPU, runningProcess{}.WorkingSetPrivate as ProcessMemoryKB
| eval tempField=mvzip(mvzip(ProcessName,ProcessCPU),ProcessMemoryKB)
| stats count by time,tempField
| eval ProcessName=mvindex(split(tempField,","),0), ProcessCPU=mvindex(split(tempField,","),1), ProcessMemoryKB=mvindex(split(tempField,","),2)
| where ProcessCPU > 20 and ProcessName!="
Total" and ProcessName!="Idle"
| stats count avg(ProcessCPU) AS AvgProcessCPU avg(ProcessMemoryKB) AS AvgMem by ProcessName
| sort -AvgProcessCPU -count

0 Karma

Communicator

No worries I already thought that and tired it and it works! I have to test it to verify the effective performance impact. Thanks a lot!

0 Karma

SplunkTrust
SplunkTrust

great.. If it is working then be sure to click Accept to close this question.

0 Karma

SplunkTrust
SplunkTrust

@claudio.manig - Please post the rest of the SPL for the searches that you've tried. It's possible that we can optimize the extraction itself, or reduce the impact of the mvexpand with some subtle magic.

0 Karma

Communicator

Well mvexpand itself already slows it down by its logic - creating an event out of each field value. So in reality i have json events with around 200+ processes running, resulting in tons of events out of one.

But sure, have a go, the query looks something like that:

| index=skynet sourcetype=t800
| spath output=prcNames path=runningProcess{}
| mvexpand prcNames
| rex field=prcNames "\{\"Name\":\"(?<ProcessName>.*)\",\"PercentProcessorTime\":\"(?<ProcessCPU>.*)\",\"WorkingSetPrivate\":\"(?<ProcessMemoryKB>.*)\"\}" 
| where ProcessCPU > 20 and ProcessName!="_Total" and ProcessName!="Idle" 
| stats count avg(ProcessCPU) AS AvgProcessCPU avg(ProcessMemoryKB) AS AvgMem by ProcessName 
| sort -AvgProcessCPU -count

The spath is needed so we can group the subcategories of each processname like cpu and mem together.

0 Karma