
How to add OPTIONS to the Splunk_TA_nix scripts?

thebeno
Explorer
I want to focus your attention on the method of collecting CPU utilization data in Splunk_TA_nix (cpu_metric.sh).

I have been dealing with many false positive alerts regarding CPU usage in our organization.
We have ITSI implemented and use Splunk_TA_nix to collect data.

An alert is generated when two consecutive CPU usage values exceed 90%.
We collect values every 5 minutes.
The script that collects this data (Splunk_TA_nix/bin/cpu_metric.sh) uses the command sar -P ALL 1 1.
This command reports the CPU load over a single 1-second interval.
With our collection schedule (every 5 minutes),
we therefore only have information about 1 second out of every five minutes.
We evaluate CPU usage based on this data.

CPU usage normally fluctuates depending on when commands start, how long they run, and how demanding they are.
With this method of measurement, it happens quite often that 2 consecutive values cross the threshold, and an alert is then generated.
For monitoring, however, what matters is the average CPU utilization, not random peaks.
If average values were collected, such false positive alerts would not occur (unless the CPU really is overloaded).
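
To make the difference concrete, here is a minimal sketch of the alert rule applied to two made-up series of readings. The function name consecutive_breaches and all of the numbers are hypothetical and only illustrate the effect; they are not measurements from a real host.

```shell
#!/bin/sh
# Count how often two consecutive readings exceed a threshold,
# mirroring the "2 values > 90%" alert rule described above.
# Usage: consecutive_breaches THRESHOLD VALUE...
consecutive_breaches() {
    threshold=$1
    shift
    printf '%s\n' "$@" | LC_ALL=C awk -v t="$threshold" '
        { cur = ($1 + 0 > t + 0) }   # 1 if this reading is over the threshold
        cur && prev { n++ }          # two in a row: the rule fires
        { prev = cur }
        END { print n + 0 }'
}

# Hypothetical readings, one per 5-minute collection interval.
# One-second snapshots that happened to land on short bursts:
snapshots="35 96 94 20 15 97 92 30"
# Interval averages for the same (made-up) host:
averages="41 48 52 38 33 55 49 40"

echo "alerts from 1-second snapshots: $(consecutive_breaches 90 $snapshots)"
echo "alerts from interval averages:  $(consecutive_breaches 90 $averages)"
```

With these invented numbers the snapshot series triggers the rule twice, while the averaged series never does, even though both describe the same moderately loaded machine.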

The standard way good administrators check CPU usage is, for example, sar 120 1, which gives the average CPU usage over 2 minutes. Data collection in sar via cron was once recommended to be set up like this:

*/10 * * * * root /usr/lib64/sa/sa1 -S XALL 600 1

This setup collected the average CPU usage over a 10-minute period, wrote the value to a sar file, and repeated this every 10 minutes.
Such a setting gives a realistic picture of how heavily the CPU is loaded.
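
As a rough illustration of reading an interval average from sar, the sketch below extracts the busy percentage (100 minus %idle) from the "Average:" line. The helper name cpu_busy_from_sar and the sample output are hypothetical, and the sketch assumes the usual sysstat layout in which %idle is the last column of the "all" row.

```shell
#!/bin/sh
# Print the average CPU busy percentage (100 - %idle) from `sar -u` output on stdin.
# Assumes sysstat's layout: an "Average:" line for CPU "all" with %idle as the last column.
cpu_busy_from_sar() {
    LC_ALL=C awk '$1 == "Average:" && $2 == "all" { printf "%.2f\n", 100 - $NF }'
}

# Canned sample in the shape of `LC_ALL=C sar -u 120 1` output (values are made up).
sample='Linux 5.14.0 (examplehost)  01/15/25  _x86_64_  (4 CPU)

10:00:01        CPU     %user     %nice   %system   %iowait    %steal     %idle
10:02:01        all     12.50      0.00      3.25      0.75      0.00     83.50
Average:        all     12.50      0.00      3.25      0.75      0.00     83.50'

printf '%s\n' "$sample" | cpu_busy_from_sar

# Live use (assumes sar is installed): average over one 120-second interval.
# LC_ALL=C sar -u 120 1 | cpu_busy_from_sar
```

Running sar with LC_ALL=C keeps the timestamp in a single column, which makes the output easier to parse consistently.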

Splunk does not provide a reasonable way to set these values in the cpu_metric.sh script.
The only way to solve it is to copy the script and modify it to suit your needs.
However, the link to Splunk_TA_nix is then lost. What happens when Splunk_TA_nix is upgraded?

My preference is to enable CPU data collection by introducing the following stanza in our application (deployed via the deployment server) which is linked to Splunk_TA_nix.

[script://$SPLUNK_HOME/etc/apps/Splunk_TA_nix/bin/cpu_metric.sh]
disabled = false
index = unix_perfmon_metrics

But this method does not give us the possibility to set OPTIONS for sar.
It would be ideal if something like this could be done:

[script://./bin/my_cpu_metric.sh]
disabled = false
index = unix_perfmon_metrics

Contents of ./bin/my_cpu_metric.sh:
exec $SPLUNK_HOME/etc/apps/Splunk_TA_nix/bin/cpu_metric.sh 120 1

But this doesn't work, because the sar command is hard-coded in cpu_metric.sh.
cpu_metric.sh would need to be able to process input arguments and adjust its sar command accordingly.
The same could also be applied to other scripts in this TA.

If you have had similar experiences, feel free to share them. If my concerns are justified, this TA should be updated to give administrators the opportunity to set better metrics collection parameters.
What do you think?

PickleRick
SplunkTrust

This is a scripted input, so it doesn't have all the mechanics associated with modular inputs: you cannot pass parameters to it by setting config items in the input's config stanza. But it works on a UF, whereas modular inputs don't.

Anyway, the scripts in TA_nix are more like examples to tune and adjust to your needs than ready-for-production tools.

thebeno
Explorer

Hi Rick,

Thanks for the reply.
Many customers use this app as a finished product from Splunk.
We would like to make injecting options as easy as possible without breaking the link between Splunk_TA_nix and our custom app. You can see my example in the original post.

The only thing needed for this to work is a small change in the scripts.
Here is a quick and dirty example of how the script could be improved:

diff cpu_metric.sh cpu_metric_new.sh
4a5,11
> # OPTIONS
> if [ "$#" -eq 1 ]; then
> OPTIONS="$1"
> else
> OPTIONS="-P ALL 1 1"
> fi
>
24c31,32
< CMD='sar -P ALL 1 1'
---
> #CMD='sar -P ALL 1 1'
> CMD="sar $OPTIONS"
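
The diff above only uses a custom value when exactly one argument is passed, so a call such as cpu_metric.sh 120 1 (two arguments) would still fall back to the default. A slightly more forgiving variant of the same idea is sketched below. The function name build_sar_cmd is hypothetical and not part of Splunk_TA_nix, and the sketch only shows how the command string is chosen; whether the rest of cpu_metric.sh parses the resulting sar output correctly (for example without -P ALL) is an assumption that would have to be checked.

```shell
#!/bin/sh
# build_sar_cmd: hypothetical helper, not part of Splunk_TA_nix.
# Prints the sar command line to run: the caller's options if any were given,
# otherwise the default used by the original script.
build_sar_cmd() {
    if [ "$#" -ge 1 ]; then
        echo "sar $*"            # works for both "120 1" and 120 1
    else
        echo "sar -P ALL 1 1"    # default from the original cpu_metric.sh
    fi
}

build_sar_cmd                    # no arguments: default command
build_sar_cmd -P ALL 120 1       # options passed as separate arguments
build_sar_cmd "-P ALL 120 1"     # options passed as one quoted string
```

Inside the script, the result could then be assigned with CMD=$(build_sar_cmd "$@"), which keeps the default behaviour when no arguments are supplied.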

I hope I am not the only one who will appreciate this.
