I have a bunch of Universal Forwarders running on 64-bit Linux systems that forward data from TA-SoS to an indexer running on Windows.
The "ps" output for my forwarders shows almost constant values, and the same CPU percentage across different machines (with varying numbers of cores).
When running
index=sos sourcetype="ps" | multikv | where COMMAND=="splunkd" | timechart range(pctCPU) by host
I get flat lines for all forwarders. My indexer is running the Windows PS script, and I get measurements for that. If I go back to when a forwarder started, the CPU% shows a peak but then drops to the constant value. Currently the forwarders report a constant 0.4% CPU over several days.
I also have scripts running that use top to monitor all applications on the machines, including splunkd, and they show CPU% varying between roughly 0.1 and 1.5 depending on traffic.
Why am I not getting correct CPU% measurements from TA-SoS?
edit:
I read up on ps and what it does, and there seems to be a difference in how it and top work:
http://unix.stackexchange.com/questions/58539/top-and-ps-not-showing-the-same-cpu-result
Essentially, ps reports a lifetime average (total CPU time used divided by the time the process has been running), while top samples usage over its refresh interval. Perhaps forwarders simply vary too little in CPU usage for the lifetime value to change? This makes me wonder how useful the ps value is for detecting spikes in CPU usage on forwarders.
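To make the difference concrete, here is a minimal Python sketch of the two calculations. It is not how TA-SoS or ps is implemented; the function names and the numbers (a process that has run for three days at 0.4%, then a hypothetical 60-second burst at half a core) are assumptions chosen only for illustration.

```python
def lifetime_pct(cpu_seconds, elapsed_seconds):
    """ps-style value: total CPU time divided by time since process start."""
    return 100.0 * cpu_seconds / elapsed_seconds


def interval_pct(cpu_prev, cpu_now, interval_seconds):
    """top-style value: CPU time consumed during the last sampling interval."""
    return 100.0 * (cpu_now - cpu_prev) / interval_seconds


# Hypothetical forwarder: up for 3 days, averaging 0.4% CPU so far.
uptime = 3 * 24 * 3600          # 259200 seconds
cpu_before = 0.004 * uptime     # 1036.8 seconds of CPU time

# Hypothetical burst: 60 seconds at 50% of one core adds 30 CPU seconds.
burst_len = 60
cpu_after = cpu_before + 30.0

print("lifetime before burst:", round(lifetime_pct(cpu_before, uptime), 2))
print("lifetime after burst: ", round(lifetime_pct(cpu_after, uptime + burst_len), 2))
print("interval during burst:", round(interval_pct(cpu_before, cpu_after, burst_len), 2))
```

With these assumed numbers the lifetime value moves only from 0.40 to about 0.41, while the interval value for the same minute is 50. The longer the process has been running, the smaller the effect of any burst on the lifetime value, so spike detection would need two samples of cumulative CPU time and their difference rather than the ps percentage.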