
For the Splunk add-on for Linux, why do we need both ps and top?

samr
Engager

What is gained by having the forwarder collect both ps and top separately?
Could they be merged, or do people typically pick one or the other?

I'm noticing that the collected data seems pretty redundant, and it can consume quite a bit of index time if you are collecting from a number of hosts.

Sam.

lukejadamec
Super Champion

Your question is a great example of data visibility versus data usefulness.
You know that ps and top are different, that they both consume a lot of resources, and that they both have their uses. However, from what I've seen, users pick one or the other purely on preference, unless a specific issue calls for the other.

If you're using the TA for Unix to feed the Unix app, then a better question might be: what do I lose from the app when I eliminate one or the other?

I think your focus should be: I can get all of this data, but what does it do for me? What are my goals for Splunk indexing and analysis? What am I looking for? What do I want to report on? What do I want to alert on?

Splunk can index and make available gobs and gobs of data, but that is not the point of implementing Splunk. The point of implementing Splunk is to make that data do something for you. What do you want to do with your data?

araitz
Splunk Employee

The output of ps and top is slightly different. ps gives you far more detail on the processes running on your system, including the arguments passed to each command and the TTY it is running in. top gives you NICE and elapsed time. I do agree, though, that aside from these differences the overlap is almost total, so there may not be much harm in running only one or the other, depending on your use case.

Combining them would be a bit hairy, since we would have to merge and deduplicate the output of both commands, and do so for every platform the TA supports (Unix, Solaris, AIX, macOS, etc.).
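A minimal sketch of what "merge and deduplicate" could look like for a single platform, in Python, assuming both outputs have already been parsed into one dict per process keyed by field name. The record shape, the PID key, and the renamed field pctCPU_interval are all hypothetical. It also glosses over a real complication: the two commands run at different moments, so a PID may have exited or been reused between snapshots.

```python
from typing import Dict, List

# Hypothetical record shape: one dict per process, field name -> raw string value.
# Assumption: both ps and top rows carry "PID", and both use "pctCPU" for %CPU
# even though the two values mean different things.

def merge_by_pid(ps_rows: List[Dict[str, str]],
                 top_rows: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Merge ps and top snapshots into one record per PID.

    ps fields win on overlap; only fields ps lacks are copied from top,
    and top's interval %CPU is stored under a separate, hypothetical
    name so it cannot overwrite or be confused with ps's lifetime value.
    """
    merged: Dict[str, Dict[str, str]] = {}
    for row in ps_rows:
        merged[row["PID"]] = dict(row)
    for row in top_rows:
        # Processes seen only by top get a fresh record.
        record = merged.setdefault(row["PID"], {"PID": row["PID"]})
        for field, value in row.items():
            if field == "pctCPU":
                record["pctCPU_interval"] = value
            elif field not in record:
                record[field] = value
    return merged
```

Even this toy version has to make policy choices (which source wins, what to rename); repeating that for each platform's column names and units is where it gets hairy.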


yuanliu
Builder

Not only is there redundancy, but an identical field name can mean totally different things in the two sourcetypes, as is the case with pctCPU. This is a really poor choice of field name on the app's part (as of 6.2).

In fact, the only data unique to top are interval values, as opposed to snapshot values. (Instantaneous and cumulative values, exemplified by NICE and elapsed time respectively, are both snapshot values, and both examples can be retrieved with ps, at least in modern GNU ps, via -o nice,etime.)

%CPU in top (represented as the field pctCPU in sourcetype=top) is an interval value: the share of CPU time used since the previous refresh. This is totally different from the cumulative value given by -o pcpu in ps (also represented as the field pctCPU, in sourcetype=ps), which is the CPU time used divided by the time the process has been running.

The three load averages that top displays are also unavailable from ps because, though cumulative, they are aggregates across the system rather than per-process values. These are captured in sourcetype=vmstat via the uptime utility.
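To make the interval-versus-cumulative distinction concrete, here is a small Python sketch of the two calculations. The function names are hypothetical; lifetime_pcpu follows the GNU ps definition of pcpu (CPU time divided by time since the process started), and interval_pcpu follows top's (CPU time used between two refreshes divided by the refresh interval). All times are in seconds.

```python
def lifetime_pcpu(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Cumulative %CPU, as GNU ps reports for -o pcpu:
    total CPU time divided by wall-clock time since the process started."""
    if elapsed_seconds <= 0:
        return 0.0
    return 100.0 * cpu_seconds / elapsed_seconds


def interval_pcpu(cpu_before: float, cpu_after: float,
                  interval_seconds: float) -> float:
    """Interval %CPU, in the style of top: CPU time consumed between two
    samples divided by the wall-clock time between those samples."""
    if interval_seconds <= 0:
        return 0.0
    return 100.0 * (cpu_after - cpu_before) / interval_seconds
```

For example, a process that has been up for 1000 s and has used 10 s of CPU shows 1.0% from ps. If it then burns 4.5 s of CPU in the next 5 s, top reports 90.0% for that interval, while the ps lifetime figure only creeps up to about 1.44%. Both would land in a field called pctCPU.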

In short, top is a decidedly interactive utility. Its GNU default display (as used in top.sh) uses human-friendly units and the like, which complicates field extraction for no benefit, because all of that information can be retrieved via GNU ps. The one really useful field, %CPU, is easily confused with ps's pcpu output.
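To illustrate why human-friendly units complicate extraction, here is a minimal Python sketch that normalizes a top memory column value back to KiB. It assumes procps-ng top's behavior of printing VIRT/RES/SHR in KiB by default and switching to a scaled value with a one-letter suffix (k, m, g, t, p, treated here as powers of 1024) when the number does not fit the column; the function name is hypothetical. By contrast, GNU ps -o rss always reports plain KiB, so no such step is needed on the ps side.

```python
import re

# Assumed top suffixes -> multiplier to KiB (powers of 1024).
# An unsuffixed value is taken to already be in KiB.
_SCALE_KIB = {"": 1, "k": 1, "m": 1024, "g": 1024 ** 2,
              "t": 1024 ** 3, "p": 1024 ** 4}
_VALUE_RE = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*([kmgtp]?)\s*$",
                       re.IGNORECASE)


def top_mem_to_kib(value: str) -> float:
    """Convert a top memory column value such as '2048', '512m' or '1.5g' to KiB."""
    match = _VALUE_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized top memory value: {value!r}")
    number, suffix = match.groups()
    return float(number) * _SCALE_KIB[suffix.lower()]
```

In Splunk terms, this is the kind of normalization a field extraction or eval would have to reproduce for every suffixed column, which is exactly the work the ps output avoids.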

To make these sources more useful, I would (GNU only):

  1. Rename %CPU in top output to something else.
  2. Give ps a few more fields such as PPID, thcount and wchan.
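The ps side of that suggestion might look like the following Python sketch. It only builds the GNU ps argument vector and parses its output; the exact column list is an assumption (ppid, thcount and wchan added alongside fields mentioned above), and args is placed last because it can contain spaces. etime_to_seconds converts GNU ps's [[dd-]hh:]mm:ss elapsed-time format. All names here are hypothetical.

```python
from typing import Dict, List

# Hypothetical column list for GNU ps; args must be last since it may contain spaces.
PS_FIELDS = ["pid", "ppid", "thcount", "wchan", "nice", "etime", "pcpu", "args"]


def ps_command() -> List[str]:
    """Argument vector for GNU ps: every process, custom output columns."""
    return ["ps", "-eo", ",".join(PS_FIELDS)]


def etime_to_seconds(etime: str) -> int:
    """Convert GNU ps etime ([[dd-]hh:]mm:ss) to a number of seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    seconds = 0
    for part in etime.split(":"):  # hh:mm:ss or mm:ss
        seconds = seconds * 60 + int(part)
    return days * 86400 + seconds


def parse_ps_line(line: str) -> Dict[str, str]:
    """Split one data line into named columns; the final args column keeps its spaces."""
    values = line.split(None, len(PS_FIELDS) - 1)
    return dict(zip(PS_FIELDS, values))
```

On a GNU system the command could be run with subprocess.run(ps_command(), capture_output=True, text=True), skipping the header line before calling parse_ps_line on each row.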

top's diagnostic value is particularly high when run with -H, which lists individual threads, so you can find which thread is eating up CPU. However, that would add a lot of overhead to both the host and the indexer.
