Monitoring Splunk

data.elapsed duration for scheduled searches + multiple instances of a search: how do I read the results?

oliverja
Path Finder

I am investigating higher CPU usage on my indexers, and I am finding the cause hard to pinpoint.

I run the following search on my search head to break down resource consumption by search, but the results confuse me.

index=_introspection host=* source=*/resource_usage.log* component=PerProcess data.process_type="search"
| stats latest(data.pct_cpu) AS resource_usage_cpu latest(data.mem_used) AS resource_usage_mem by _time, data.search_props.type, data.search_props.mode, data.search_props.user, data.search_props.app, host, data.search_props.label, data.elapsed, data.search_props.search_head
| sort - resource_usage_cpu

| _time | data.search_props.type | data.search_props.mode | host | data.search_props.label | data.elapsed | data.search_props.search_head | resource_usage_cpu |
|---|---|---|---|---|---|---|---|
| 2022-11-01 10:23:54.338 | scheduled | historical batch | idx04-k | Process-Creation-Events-DomainController | 1431.6000 | sh02-g | 95.40 |
| 2022-11-01 10:23:52.815 | scheduled | historical batch | idx03-k | Process-Creation-Events-DomainController | 1430.0200 | sh02-g | 115.50 |
| 2022-11-01 10:23:50.738 | scheduled | historical batch | idx05-k | Process-Creation-Events-DomainController | 1427.9800 | sh02-g | 105.70 |
| 2022-11-01 10:23:46.748 | scheduled | historical batch | idx03-g | Process-Creation-Events-DomainController | 1424.0400 | sh02-g | 101.90 |
| 2022-11-01 10:23:45.081 | scheduled | historical batch | idx02-k | Process-Creation-Events-DomainController | 1422.3200 | sh02-g | 97.90 |

From this, I can see that the search:

1) Was triggered from sh02

2) Was executed across several of my indexers

3) Took ~1500 seconds to run

4) Consumed ~1 core on each instance
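A minimal sketch for sanity-checking readings 3) and 4) against the rows in the table. It rests on two assumptions that the post does not confirm: that data.elapsed is the number of seconds the search process had been running when the sample was taken, and that data.pct_cpu uses 100 to mean one fully busy core. The helper names `implied_start` and `cores` are made up for this illustration.

```python
from datetime import datetime, timedelta

# Rows copied from the table above: (host, _time, data.elapsed, data.pct_cpu)
ROWS = [
    ("idx04-k", "2022-11-01 10:23:54.338", 1431.60, 95.40),
    ("idx03-k", "2022-11-01 10:23:52.815", 1430.02, 115.50),
    ("idx05-k", "2022-11-01 10:23:50.738", 1427.98, 105.70),
    ("idx03-g", "2022-11-01 10:23:46.748", 1424.04, 101.90),
    ("idx02-k", "2022-11-01 10:23:45.081", 1422.32, 97.90),
]


def implied_start(sample_time: str, elapsed_s: float) -> datetime:
    """Sample time minus elapsed seconds.

    ASSUMPTION: data.elapsed counts seconds since the process started.
    """
    ts = datetime.strptime(sample_time, "%Y-%m-%d %H:%M:%S.%f")
    return ts - timedelta(seconds=elapsed_s)


def cores(pct_cpu: float) -> float:
    """Convert a CPU percentage to cores.

    ASSUMPTION: 100 in data.pct_cpu means one fully busy core.
    """
    return pct_cpu / 100.0


if __name__ == "__main__":
    for host, sample_time, elapsed, cpu in ROWS:
        start = implied_start(sample_time, elapsed)
        print(host, start.time(), f"{cores(cpu):.2f} cores")
```

Under those assumptions, all five rows give an implied start within a second of 10:00:02. That suggests each row is a reading of a process that had already been running for about 24 minutes, not a search that began at 10:23.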

BUT:

The search is scheduled to run once a day, and not at 10:23: it is scheduled for 11 (no schedule window).

There are dozens of "instances" of this search being executed on all 10 of my indexers, triggered by sh02, in the ~10:22 timeframe. One row per indexer in the table above might make sense, but this is far more than that.

What is happening here? How do I read these results to make a sane performance judgement about this situation?
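One possible way to read the many rows is sketched below. The query groups by _time and data.elapsed, so every logged sample of a process becomes its own row. If resource_usage.log holds periodic samples of each running search process (an assumption, not something the post confirms), then many rows per indexer could belong to a single run. The sketch collapses samples into one entry per host and implied start time. The function `collapse` and the `bucket_s` parameter are hypothetical, and the last two sample rows are invented purely to show the collapsing.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (host, _time, data.elapsed, data.pct_cpu)
# The first two rows come from the table in the post. The last two are
# HYPOTHETICAL later samples of the same processes, for illustration only.
SAMPLES = [
    ("idx04-k", "2022-11-01 10:23:54.338", 1431.60, 95.40),
    ("idx03-k", "2022-11-01 10:23:52.815", 1430.02, 115.50),
    ("idx04-k", "2022-11-01 10:24:04.338", 1441.60, 90.00),   # hypothetical
    ("idx03-k", "2022-11-01 10:24:02.815", 1440.02, 110.00),  # hypothetical
]


def collapse(samples, bucket_s=5):
    """Collapse samples into one entry per (host, implied start bucket).

    ASSUMPTION: data.elapsed is seconds since process start, so samples of
    the same process share (almost) the same implied start time. Starts are
    floored to a bucket_s-second boundary to absorb small jitter.
    """
    runs = defaultdict(lambda: {"samples": 0, "max_elapsed": 0.0, "peak_cpu": 0.0})
    for host, sample_time, elapsed, cpu in samples:
        start = datetime.strptime(sample_time, FMT) - timedelta(seconds=elapsed)
        bucket = start.replace(second=start.second - start.second % bucket_s,
                               microsecond=0)
        run = runs[(host, bucket)]
        run["samples"] += 1
        run["max_elapsed"] = max(run["max_elapsed"], elapsed)
        run["peak_cpu"] = max(run["peak_cpu"], cpu)
    return dict(runs)


if __name__ == "__main__":
    for (host, start), run in sorted(collapse(SAMPLES).items()):
        print(host, start.time(), run)
```

With this grouping, the four sample rows reduce to two runs, one per indexer. Each run carries its sample count, longest observed elapsed time, and peak CPU, which is closer to the "one row per indexer" view.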
