Solved: How to cherry pick values from different sources?

yuanliu · ‎10-20-2015

Given sourcetype=ps and sourcetype=top, both of which contain a pctCPU field, how do I take pctCPU from top only while still using fields unique to ps? (Despite the identical field name, the values in these two sources represent very different things.)

In the Splunk Add-on for *Nix, for example, ps and top both contain the fields PID, COMMAND and pctCPU. (They share some other field names of interest which I will not use in this example.) As @Paolo Prigione pointed out many years ago, pctCPU in ps is not useful for monitoring. (https://answers.splunk.com/answers/27398/is-nix-sourcetype-ps-pctcpu-really-suitable-for-charting-oo...) In the simplest use case, pctCPU in top gives the instantaneous CPU usage of each process. However, COMMAND in top only gives a simple program name, which is insufficient for my purposes. (In the old *nix for Splunk, ps' COMMAND includes full arguments; in the Splunk Add-on for *Nix, ps has a separate ARGS field.)

Conceivably I can associate top's pctCPU values with ps' app (the combination of COMMAND and ARGS in the new Splunk Add-on for *Nix) by joining a top search with a ps search. This looks very wasteful, however. So I thought I would tackle it with a simple search, then eliminate the values from ps.

index=os (sourcetype=ps OR sourcetype=top)
| bucket _time span=1m
| stats values(eval(if(sourcetype="ps",app,COMMAND))) as app values(eval(if(sourcetype="top",pctCPU,null()))) as pctCPU by _time PID

(bucket _time is necessary because, though launched with the same frequency, the two sources often have sub-minute stagger.) This works for all processes output from ps. However, as ps and top do not always survey the same processes even when they are launched within a subsecond, some processes captured by ps will not show in top of the same time interval, and vice versa. As a result, the above strategy gives null values when the process is in ps only. I want to fill these gaps with values from ps, because for these extremely momentary processes, pctCPU from ps has the same significance as that from top.

In other words, I want to eliminate the value of pctCPU from ps when top is available, but use the value from ps when it is not. (The first term in the example, the one that produces app, is a much more sophisticated macro output in reality. That output can cause gaps when a process is only in ps but missing from top.)
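
For reference, the rule described above (prefer top's pctCPU, fall back to the ps value) can be sketched in a single search by keeping the two sources in separate columns inside stats and merging them afterwards with coalesce. This is an untested sketch, not the add-on's own method. The names top_pctCPU, ps_pctCPU, ps_app and top_command are invented here for illustration, app is assumed to be already available on ps events as in the search above, and host is added to the by clause on the assumption that PIDs are only unique per host.

```spl
index=os (sourcetype=ps OR sourcetype=top)
| bucket _time span=1m
| stats latest(eval(if(sourcetype="top", pctCPU, null()))) as top_pctCPU
        latest(eval(if(sourcetype="ps", pctCPU, null()))) as ps_pctCPU
        latest(eval(if(sourcetype="ps", app, null()))) as ps_app
        latest(eval(if(sourcetype="top", COMMAND, null()))) as top_command
        by _time PID host
| eval pctCPU=coalesce(top_pctCPU, ps_pctCPU), app=coalesce(ps_app, top_command)
| fields _time PID host app pctCPU
```

Because coalesce returns its first non-null argument, a PID seen only by ps should keep the ps value, and a PID seen only by top should keep top's COMMAND as its app.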

yuanliu · ‎10-21-2015

@woodcock's introduction of coalesce made me search for an alternative statement of the problem. Here is one clunky solution:

index=os (sourcetype="top" OR sourcetype=ps)
| eval pctCPU=sourcetype.pctCPU
| bucket _time span=1m
| stats values(pctCPU) as pctCPU latest(eval(if(sourcetype="ps",app,COMMAND))) as app
 by _time PID host
| eval pctCPU=replace(if(match(pctCPU,"top"),mvfilter(match(pctCPU,"top")),pctCPU),"[stop]+","")

In effect: label each pctCPU value with its source, filter the desired values by label as the pseudo code describes, and strip the label at the end. (A (ps|top) pattern would be more efficient, but [stop]+ or [tops]+ makes the better sound bite.)

It is noisy in terms of code efficiency, and that span=1m is a very bad approximation. (There should be better methods to tidy up small stagger.) I hope for better, but I'll take this for the time being.
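
On the label-stripping step: [stop]+ is a character class, so it deletes any run of the letters s, t, o and p wherever they occur, which is safe here only because the numeric values contain none of them. Below is a small self-contained sketch of the anchored alternative mentioned above, using makeresults with made-up values. The field name labelled and the counter n are chosen for this illustration only.

```spl
| makeresults count=2
| streamstats count as n
| eval labelled=if(n=1, "top12.5", "ps0.3")
| eval pctCPU=replace(labelled, "^(ps|top)", "")
| table labelled pctCPU
```

Each row should come back with only the numeric part of the labelled value, since the pattern removes just the leading source label.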


stmyers7941 · ‎10-21-2015

Have you considered the Nmon app? You may be able to accomplish what you're looking for and more vs the nix app.

yuanliu · ‎10-21-2015

Thanks for the suggestion, @stmyers7941. Though keenly aware of the pains induced by the *nix app, the option is not mine to pick. That said, the general method could have other uses whenever field names are overloaded.

yuanliu · ‎10-21-2015

The solution above works well for the generalised question as stated. But there's a big caveat as to its suitability for fixing the *nix app. In GNU top, the default (which is how top.sh calls it) is Irix mode, in which the percentage is calculated against a single core. For this data to be useful, therefore, one must divide the number by the number of cores. But then, I haven't determined how GNU ps handles pcpu. Is it calibrated against a single core or against all cores? I'll post the outcome in the other thread. In any case, I would really like to see the *nix app fixed at the source, as I suggested in https://answers.splunk.com/answers/117872/for-splunk-add-on-for-linux-why-do-we-need-both-ps-and-top....
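
The Irix-mode arithmetic is simple once the core count is known: a process that top reports at 200% on a 4-core host is using 50% of the whole machine. Here is a minimal sketch with made-up numbers. The names cpu_count and pctCPU_all_cores are hypothetical, and in a real search the core count would have to be supplied from elsewhere (a lookup, for instance), which is not shown. Whether ps values need the same treatment is the open question above.

```spl
| makeresults
| eval pctCPU=200, cpu_count=4
| eval pctCPU_all_cores=pctCPU / cpu_count
| table pctCPU cpu_count pctCPU_all_cores
```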

woodcock · ‎10-22-2015

If you are going with this answer (note that I modified my solution yet again), then you should click "Accept".

yuanliu · ‎10-26-2015

@woodcock I'm going with this. After some investigation, I realise that field name overload is a cardinal sin that we shouldn't commit in the first place. So I'm really trying to solve an artificial problem. Still, your methods really expanded my Splunk vocabulary. (xyseries is something I have wanted for some other problems.)

yuanliu · ‎10-21-2015

An alternative statement of the problem could be: How to ask Splunk to perform the following pseudo code:

discard pctCPU from sourcetype=ps IF output from sourcetype=top exists for that PID in that sample period (every 5 minutes, but the timing wavers from period to period and from sourcetype to sourcetype)
discard COMMAND from sourcetype=top IF output from sourcetype=ps exists for that PID in that sample period

woodcock · ‎10-21-2015

Like this:

index=os (sourcetype=ps OR sourcetype=top)
| bucket _time span=1m
| chart latest(pctCPU) over _time by sourcetype
| eval pctCPU=coalesce(top, ps)

At this point, each value for _time (each minute) has a value for pctCPU that uses sourcetype top in preference to sourcetype ps. Tack on the rest of what you need after that.
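
For readers new to coalesce: it returns the first of its arguments that is not null, which is what gives top priority over ps here. Below is a tiny self-contained sketch with made-up values. The row counter n exists only to build the two test rows.

```spl
| makeresults count=2
| streamstats count as n
| eval top=if(n=1, 12.5, null()), ps=if(n=1, 0.3, 1.1)
| eval pctCPU=coalesce(top, ps)
| table n top ps pctCPU
```

The first row has both values, so pctCPU should take the top value; the second row has no top value, so it should fall back to ps.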

yuanliu · ‎10-21-2015

@woodcock Thanks for the reply. I need the result by PID so I can show consumption of each process over time.

woodcock · ‎10-21-2015

OK, then do this:

index=os (sourcetype=ps OR sourcetype=top)
| bucket _time span=1m
| chart latest(pctCPU) over _time by sourcetype PID
| eval pctCPU=coalesce(top, ps)

yuanliu · ‎10-21-2015

I mean, Splunk won't allow two groupings in chart when over is used. I have already permuted through these.

woodcock · ‎10-21-2015

OK, then try this:

index=os (sourcetype=ps OR sourcetype=top)
| bucket _time span=1m
| stats latest(pctCPU) AS pctCPU by sourcetype PID _time
| eval combo=sourcetype . ":" . PID
| xyseries _time combo pctCPU
| foreach top* [ eval "pctCPU<<MATCHSTR>>"=coalesce('<<FIELD>>', 'ps<<MATCHSTR>>') ]
