In addition to the problems @PickleRick points out, the SPL ignores a fundamental design in the dataset. The use of stats without grouping by mr_batchId raises the question: what logic decides which values should form ONE line as opposed to another? In fact, whenever you find yourself "needing" to use stats by _time in Splunk, you ought to tell yourself that some logic is probably wrong.

The first troublesome command is | spath resourceSpans{}.scopeSpans{}.spans{}.attributes{} output=attributes. Here, you bypass several arrays of arrays to focus only on resourceSpans{}.scopeSpans{}.spans{}.attributes{}. Unless you are absolutely certain about the uniqueness of this path, the prudent strategy is to fully handle each array. In your case, the next command, | dedup attributes, indicates that there is no such certainty.

But uniqueness is not the big problem here. The real problem is that the path resourceSpans{}.scopeSpans{}.spans{} is the key to your developer/vendor's data design. Each value of resourceSpans{}.scopeSpans{}.spans{} contains a unique mr_batchId, which is the key that distinguishes one set of values from another. If you want to perform stats, perform stats against resourceSpans{}.scopeSpans{}.spans{}. So, step one is to fully mvexpand into this path:

host="MARKET_RISK_PDT_V2" index="murex_logs" sourcetype="Market_Risk_DT" "**mr_strategy**" "typo_Collar"
"resourceSpans{}.resource.attributes{}.value.stringValue"="*"
| fields - resourceSpans{}.*
| spath path=resourceSpans{}
| mvexpand resourceSpans{}
| spath input=resourceSpans{} path=scopeSpans{}
| fields - resourceSpans{}
| mvexpand scopeSpans{}
| spath input=scopeSpans{} path=spans{}
| fields - scopeSpans{}
| mvexpand spans{}

The above does not address the efficiency problem with **mr_strategy**, but it moves the filter "resourceSpans{}.resource.attributes{}.value.stringValue"="*" into the index search, which also improves efficiency.

Using your sample data, the above gives 96 spans{} values for a single event. Among the 96, only two are relevant to your final results. So, I would recommend adding

| where match('spans{}', "mr_batchId")

This gives two rows like

spans{}
{"traceId":"e0d25217dd28e57d2db07e06d690428f","spanId":"d6c133764c7891c3","parentSpanId":"dbd5a3ed4854e73f","name":"fullreval_task","kind":1,"startTimeUnixNano":"1744296121513194653","endTimeUnixNano":"1744296126583212823","attributes":[{"key":"market_risk_span","value":{"stringValue":"true"}},{"key":"mr_batchId","value":{"stringValue":"37"}},{"key":"mr_batchType","value":{"stringValue":"Full Revaluation"}},{"key":"mr_bucketName","value":{"stringValue":""}},{"key":"mr_jobDomain","value":{"stringValue":"Market Risk"}},{"key":"mr_jobId","value":{"stringValue":"CONSO_ABAQ | 31/03/2016 | 12"}},{"key":"mr_strategy","value":{"stringValue":"typo_Collar Cap"}},{"key":"mr_uuid","value":{"stringValue":"4405ed87-fbc0-4751-b5b2-41836f1181cc"}},{"key":"mrb_batch_affinity","value":{"stringValue":"CONSO_ABAQ_run_Batch|CONSO_ABAQ|2016/03/31|12_FullReval0_00037"}},{"key":"mr_batch_compute_cpu_time","value":{"doubleValue":2.042433}},{"key":"mr_batch_compute_time","value":{"doubleValue":2.138}},{"key":"mr_batch_load_cpu_time","value":{"doubleValue":2.154398}},{"key":"mr_batch_load_time","value":{"doubleValue":2.852}},{"key":"mr_batch_status","value":{"stringValue":"WARNING"}},{"key":"mr_batch_total_cpu_time","value":{"doubleValue":4.265003}},{"key":"mr_batch_total_time","value":{"doubleValue":5.069}}],"status":{}}
{"traceId":"e0d25217dd28e57d2db07e06d690428f","spanId":"4c8da45757b1ea2a","parentSpanId":"dbd5a3ed4854e73f","name":"fullreval_task","kind":1,"startTimeUnixNano":"1744296126596384480","endTimeUnixNano":"1744296130515095708","attributes":[{"key":"market_risk_span","value":{"stringValue":"true"}},{"key":"mr_batchId","value":{"stringValue":"58"}},{"key":"mr_batchType","value":{"stringValue":"Full Revaluation"}},{"key":"mr_bucketName","value":{"stringValue":""}},{"key":"mr_jobDomain","value":{"stringValue":"Market Risk"}},{"key":"mr_jobId","value":{"stringValue":"CONSO_ABAQ | 31/03/2016 | 12"}},{"key":"mr_strategy","value":{"stringValue":"typo_Non Deliv. Xccy Swap"}},{"key":"mr_uuid","value":{"stringValue":"f6035cef-e661-49bd-8b4c-d8d09da06822"}},{"key":"mrb_batch_affinity","value":{"stringValue":"CONSO_ABAQ_run_Batch|CONSO_ABAQ|2016/03/31|12_FullReval0_00058"}},{"key":"mr_batch_compute_cpu_time","value":{"doubleValue":0.8687239999999999}},{"key":"mr_batch_compute_time","value":{"doubleValue":0.907}},{"key":"mr_batch_load_cpu_time","value":{"doubleValue":2.257638}},{"key":"mr_batch_load_time","value":{"doubleValue":2.955}},{"key":"mr_batch_status","value":{"stringValue":"OK"}},{"key":"mr_batch_total_cpu_time","value":{"doubleValue":3.1801}},{"key":"mr_batch_total_time","value":{"doubleValue":3.917}}],"status":{}}

But for flexibility, I consider this optional.

From here, there are many ways to get to your desired output. Given that you only need mr_batch_compute_cpu_time, mr_batch_compute_time, mr_batch_load_cpu_time, mr_batch_load_time, and mr_strategy, I recommend extracting them directly; however, I strongly recommend adding mr_batchId to the list, because it is a critical piece of information for corroborating data and validating your calculations.

``` the following line is optional - improves efficiency if these are the only attributes of interest
| where match('spans{}', "mr_batchId")
```
| spath input=spans{} path=attributes{} output=attributes
| foreach mr_batchId mr_batch_compute_cpu_time mr_batch_compute_time mr_batch_load_cpu_time mr_batch_load_time mr_strategy
[eval <<FIELD>> = mvappend(<<FIELD>>, mvmap(attributes, if(spath(attributes, "key") != "<<FIELD>>", null(), spath(attributes, "value")))),
<<FIELD>> = coalesce(spath(<<FIELD>>, "doubleValue"), spath(<<FIELD>>, "stringValue"))]
| dedup _time mr_batchId
``` the above is key logic. If there is any doubt, you can also use
| dedup _time mr_batchId mr_batch_compute_cpu_time mr_batch_compute_time
```
| table _time mr_batchId mr_batch_compute_cpu_time mr_batch_compute_time mr_batch_load_cpu_time mr_batch_load_time mr_strategy

With this, the output will be

| _time | mr_batchId | mr_batch_compute_cpu_time | mr_batch_compute_time | mr_batch_load_cpu_time | mr_batch_load_time | mr_strategy |
| --- | --- | --- | --- | --- | --- | --- |
| 2025-04-12 23:55:21 | 37 | 2.042433 | 2.138 | 2.154398 | 2.852 | typo_Collar Cap |
| 2025-04-12 23:55:21 | 58 | 0.8687239999999999 | 0.907 | 2.257638 | 2.955 | typo_Non Deliv. Xccy Swap |

There is no need to perform stats against _time.
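If you want to corroborate the numbers outside Splunk, the same per-span logic can be sketched in a few lines of Python. This is only an illustration, not part of the search: EVENT is a hypothetical, heavily trimmed stand-in for one raw event (two spans copied in part from the sample rows, plus one invented span without mr_batchId), and the names WANTED, attribute_value, and rows_per_span are made up for this sketch.

```python
# Hypothetical, trimmed stand-in for one raw event; the real event has 96 spans.
EVENT = {
    "resourceSpans": [{
        "scopeSpans": [{
            "spans": [
                # Invented span with no mr_batchId: should be skipped.
                {"name": "other_task", "attributes": [
                    {"key": "market_risk_span", "value": {"stringValue": "false"}},
                ]},
                {"name": "fullreval_task", "attributes": [
                    {"key": "mr_batchId", "value": {"stringValue": "37"}},
                    {"key": "mr_strategy", "value": {"stringValue": "typo_Collar Cap"}},
                    {"key": "mr_batch_compute_time", "value": {"doubleValue": 2.138}},
                    {"key": "mr_batch_load_time", "value": {"doubleValue": 2.852}},
                ]},
                {"name": "fullreval_task", "attributes": [
                    {"key": "mr_batchId", "value": {"stringValue": "58"}},
                    {"key": "mr_strategy", "value": {"stringValue": "typo_Non Deliv. Xccy Swap"}},
                    {"key": "mr_batch_compute_time", "value": {"doubleValue": 0.907}},
                    {"key": "mr_batch_load_time", "value": {"doubleValue": 2.955}},
                ]},
            ]
        }]
    }]
}

# Attributes of interest (a subset of the list used in the search).
WANTED = ["mr_batchId", "mr_batch_compute_time", "mr_batch_load_time", "mr_strategy"]


def attribute_value(attr):
    """Prefer doubleValue, fall back to stringValue (like coalesce in the eval)."""
    value = attr.get("value", {})
    if "doubleValue" in value:
        return value["doubleValue"]
    return value.get("stringValue")


def rows_per_span(event, wanted=WANTED):
    """Walk every array level and emit one row per span that carries mr_batchId."""
    rows = []
    for resource_span in event.get("resourceSpans", []):
        for scope_span in resource_span.get("scopeSpans", []):
            for span in scope_span.get("spans", []):
                attrs = {a["key"]: attribute_value(a)
                         for a in span.get("attributes", [])}
                if "mr_batchId" not in attrs:
                    continue  # same role as the optional where match(...) filter
                rows.append({name: attrs.get(name) for name in wanted})
    return rows


if __name__ == "__main__":
    for row in rows_per_span(EVENT):
        print(row)
```

The point of the sketch is the same as in the SPL: values are grouped by the span they came from, so each mr_batchId keeps its own compute and load times, and nothing has to be stitched back together by _time.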