
Metricator TOP panel - different SPL leads to major differences in performance

patouellet
Path Finder

Hi!

I’m noticing very different SPL, and thus very different performance, between the NMON Summary Light Analysis dashboard (specifically the Top 20 processes CPU Statistics panel) and the NMON Analyser LINUX dashboard (Process, Kernel, I/O Statistics > Top > CPU Usage per logical core).

When I compare the 2 dashboards, the results look identical to me. But the performance isn’t the same – the one from the Light Analysis is about 4 times slower. I am wondering why, and whether that’s normal. Check it out:

From NMON Summary Light Analysis:

[screenshot: patouellet_0-1605716386627.png]

SPL:

| mstats max(_value) as value where `nmon_metrics_index` metric_name="os.unix.nmon.processes.top.pct_CPU" host=myhost by host, metric_name, dimension_Command, dimension_PID span=1m

| stats sum(value) as pct_CPU by _time, host, metric_name, dimension_Command

| appendcols [ | mstats latest(_value) as logical_cpus where `nmon_metrics_index` metric_name="os.unix.nmon.cpu.cpu_all.logical_cpus" host=myhost by host ]

| appendcols [ | mstats latest(_value) as virtual_cpus where `nmon_metrics_index` metric_name="os.unix.nmon.cpu.cpu_all.virtual_cpus" host=myhost by host ]

| filldown logical_cpus, virtual_cpus

| stats values(pct_CPU) as pct_CPU, values(logical_cpus) as logical_cpus, values(virtual_cpus) as virtual_cpus by _time, host, dimension_Command

| eval usage_per_core=(pct_CPU/100), smt_threads=(logical_cpus/virtual_cpus)

| eval usage_per_core=case(isnum(smt_threads) AND smt_threads>=4, usage_per_core*1.4, isnum(smt_threads) AND smt_threads>=2, usage_per_core*1.2, isnum(usage_per_core), usage_per_core)

| timechart `nmon_span` useother=f limit="20" max(usage_per_core) as "CPU Usage per core" by dimension_Command

Runtime:

This search has completed and has returned 364 results by scanning 533,512 events in 3.937 seconds

And from the NMON Analyser LINUX dashboard:

[screenshot: patouellet_1-1605716386637.png]

 

SPL:

| mstats max(_value) as value where `nmon_metrics_index` metric_name="os.unix.nmon.processes.top.pct_CPU" host="myhost" by dimension_Command dimension_PID span=1m

| stats sum(value) as pct_CPU by _time, dimension_Command

| eval usage_per_core=(pct_CPU/100)

| timechart `nmon_span` useother=f limit="50" max(usage_per_core) as "CPU Usage per core" by dimension_Command

Runtime:

This search has completed and has returned 364 results by scanning 533,512 events in 1.36 seconds

Again, identical results, but very different performance and different SPL – which is most likely the cause of the different performance.

Thoughts?

Thanks!

@guilmxm


guilmxm
SplunkTrust

Hola @patouellet !

My apologies for the late reply (I was very busy with TrackMe!), and thanks @dsou for your great help 😉

So, to answer the question: this is mostly a historical reason. In the NMON Summary Light dashboard the search is designed to be OS-type agnostic; in short, SMT threads are an AIX-only concept, which requires some obscure additional calculation to get closer to an accurate measurement of the process (PID) costs.

Technically, for Linux that isn't required; the Nmon Analyser view for Linux does not have to deal with this, so its query is simpler and more efficient.
There's also an additional field in the by clause at the mstats level, which can matter at some point as it creates more in-memory records and certainly has some compute cost too (this one could actually be removed).
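
As a rough, untested sketch of that simplification: since the where clause already pins the search to a single metric, metric_name can be dropped from the by clause, which means fewer in-memory records to group on while returning the same values:

| mstats max(_value) as value where `nmon_metrics_index` metric_name="os.unix.nmon.processes.top.pct_CPU" host=myhost by host, dimension_Command, dimension_PID span=1m
| stats sum(value) as pct_CPU by _time, host, dimension_Command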

Last but not least, the SPL queries were designed for the initial implementation of mstats; there have been updates since, and these queries could potentially be even faster. For instance, the search could be:

| mstats sum(os.unix.nmon.processes.top.pct_CPU) as pct_CPU where `nmon_metrics_index` host=myhost by dimension_Command span=1m
| eval usage_per_core=(pct_CPU/100)
| timechart `nmon_span` useother=f limit="50" max(usage_per_core) as "CPU Usage per core" by dimension_Command


For compatibility with old versions of Splunk, the app was not updated; however, with the deprecation of 7.0, it would certainly be worth reviewing and updating these queries.

Building this app and the Nmon legacy has been a work of many years, with complex dashboards and many UIs, so this won't happen in a day, but it is definitely on my to-do list.

Let me know if you have any further questions.

Guilhem







patouellet
Path Finder

Thank you - yes, I understand that. I'm using the Metricator App for NMON; the SPL above is generated automatically by the built-in dashboards. My question is about the app: why do those panels, which aim to show the same results, have different SPL leading to different performance?


isoutamo
SplunkTrust
We will have to wait for the creator of this app to answer why he/she used different SPL to get the same answers. Usually there are many ways to get the same result, and it's your job to find the most efficient way to achieve the wanted result.

isoutamo
SplunkTrust

Hi

In the first SPL you are using appendcols, which kills your performance. stats is almost always a better way to do joins than join, append, appendcols, etc. Here is one presentation which explains this quite well:

https://conf.splunk.com/files/2020/slides/TRU1761C.pdf
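
For illustration, here is one possible way (a rough, untested sketch, assuming the mstats where clause accepts OR-ed metric_name filters) to avoid both appendcols subsearches: pull all three metrics in a single mstats, spread the host-level CPU counts across the rows with eventstats, then keep only the per-process rows:

| mstats max(_value) as value where `nmon_metrics_index` (metric_name="os.unix.nmon.processes.top.pct_CPU" OR metric_name="os.unix.nmon.cpu.cpu_all.logical_cpus" OR metric_name="os.unix.nmon.cpu.cpu_all.virtual_cpus") host=myhost by host, metric_name, dimension_Command, dimension_PID span=1m
| eval logical_cpus=if(metric_name="os.unix.nmon.cpu.cpu_all.logical_cpus", value, null())
| eval virtual_cpus=if(metric_name="os.unix.nmon.cpu.cpu_all.virtual_cpus", value, null())
| eventstats max(logical_cpus) as logical_cpus, max(virtual_cpus) as virtual_cpus by host
| where metric_name="os.unix.nmon.processes.top.pct_CPU"
| stats sum(value) as pct_CPU, values(logical_cpus) as logical_cpus, values(virtual_cpus) as virtual_cpus by _time, host, dimension_Command

This trades the two subsearch passes over the metrics index for a single pass plus an in-memory join via eventstats.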

r. Ismo
