Monitoring Splunk

How can I find which source running in Splunk is causing the Linux server's CPU utilization to spike?

Log_wrangler
Builder

I received a warning

Search peer ip-1-1-1-1.ec2.internal has the following message: Skipped indexing of internal audit event. Will keep dropping events until indexer congestion is remedied. Check disk space and other issues that may cause indexer to block.

uptime shows a very high load average on the server.

Is there some query against _internal I can use to see whether an app, alert, source, or scheduled task is causing this?

Thank you
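
For reference, the closest I've gotten is ranking noisy sources by indexed volume from metrics.log (field names assumed from per_source_thruput events); I'm not sure volume maps cleanly to CPU, though:

index=_internal source=*metrics.log group=per_source_thruput
| stats sum(kb) AS total_kb by series
| sort - total_kb
| head 20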


martin_mueller
SplunkTrust

Obvious side note: do upgrade your Splunk. There have been tons of improvements since 5.0, and 5.0 is out of support; end of life was reached a year ago.
The introspection endpoints used by the Monitoring Console (including the search that doesn't work for you) are just a tiny sliver of that pie.
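
That said, even on 5.x the _audit index is populated, so you can rank completed searches by runtime without the introspection endpoints. A rough sketch (field names assumed from audit search events; verify on your version):

index=_audit action=search info=completed
| stats count sum(total_run_time) AS total_run_time by user savedsearch_name
| sort - total_run_time
| head 20

Ad hoc searches show up with an empty savedsearch_name, so this also separates scheduled load from interactive load.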

sudosplunk
Motivator

If you've set up the Monitoring Console, you can use the Resource Usage tab, which provides information about resource usage across your deployment. Definitely a good starting point.

Here is the search Splunk uses to populate Resource Usage: Deployment. See if it works for you.

| rest splunk_server_group="*" /services/server/status/resource-usage/hostwide
| join type=outer splunk_server [
  | rest splunk_server_group="*" /services/server/status/resource-usage/iostats
  | eval iops = round(reads_ps + writes_ps)
  | eval iops_mountpoint = iops." (".mount_point.")"
  | eval cpupct_mountpoint = cpu_pct."% (".mount_point.")"
  | stats values(iops_mountpoint) as iops_mountpoint, values(cpupct_mountpoint) as cpupct_mountpoint by splunk_server]
| eventstats min(eval(if(isnull(normalized_load_avg_1min), "0", "1"))) as _load_avg_full_availability
| eval normalized_load_avg_1min = if(isnull(normalized_load_avg_1min), "N/A", normalized_load_avg_1min)
| eval core_info = if(isnull(cpu_count), "N/A", cpu_count)." / ".if(isnull(virtual_cpu_count), "N/A", virtual_cpu_count)
| eval cpu_usage = cpu_system_pct + cpu_user_pct
| eval mem_used_pct = round(mem_used / mem * 100 , 2)
| eval mem_used = round(mem_used, 0)
| eval mem = round(mem, 0)
| fields splunk_server, normalized_load_avg_1min, core_info, cpu_usage, mem, mem_used, mem_used_pct, iops_mountpoint, cpupct_mountpoint
| sort - cpu_usage, -mem_used
| rename splunk_server AS Instance, normalized_load_avg_1min AS "Load Average", core_info AS "CPU Cores (Physical / Virtual)", cpu_usage AS "CPU Usage (%)", mem AS "Physical Memory Capacity (MB)", mem_used AS "Physical Memory Usage (MB)", mem_used_pct AS "Physical Memory Usage (%)", iops_mountpoint as "I/O Operations per second (Mount Point)", cpupct_mountpoint as "I/O Bandwidth Utilization (Mount Point)"
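
If the resource-usage REST endpoints aren't available in your version, a cruder fallback is to rank scheduled searches by runtime from scheduler.log in _internal. A sketch (field names assumed; adjust for your version):

index=_internal sourcetype=scheduler status=success
| stats count avg(run_time) AS avg_run_time sum(run_time) AS total_run_time by app savedsearch_name
| sort - total_run_time

This won't show per-process CPU, but a scheduled search with a large total_run_time is usually the first suspect for sustained CPU spikes.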

Log_wrangler
Builder

Thank you for the reply. Unfortunately the indexer is on v5.x, and there is no Monitoring Console in that version.

I am afraid to run your query as it might overload the indexer.

But I will look at it... and give it a shot.


Log_wrangler
Builder

The query did not work on 5.x.


sudosplunk
Motivator

I will convert this to a comment so that others can help you in the meantime.


Log_wrangler
Builder

Your answer is correct; I just had an old version.

Please convert to answer and I will accept.
