Splunk Search

How to improve my search to identify queries which consume a large amount of memory?

ddrillic
Ultra Champion

We recently had Search Heads crashing, and it seems that queries consuming 11-12 GB of memory cause the crashes.

We are trying the following search, but it returns 0 results for the past week:

index=_internal host="host name" sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
| stats max(data.mem_used) AS peak_mem_usage latest(data.search_props.*) AS * min(_time) AS min_time max(_time) AS max_time by data.search_props.sid
| sort 20 - peak_mem_usage
| fields data.search_props.sid peak_mem_usage mode type role app user min_time max_time
| convert ctime(min_time)
| convert ctime(max_time)
| rename data.search_props.sid AS SID peak_mem_usage AS "Peak Physical Memory Usage (MB)" min_time AS "First time seen" max_time AS "Last time seen"

Any ideas how to improve it?

DalJeanis
Legend

Okay, I've broken it up into two chunks and coded each step very specifically. Run the first chunk; if it gives you any output, add the second chunk. Let us know where the results drop out. If they never drop out, then one of the pieces I changed may work differently from how you think it does.

 index=_internal host="host name" sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
 | stats max(data.mem_used) AS peak_mem_usage, 
        latest(data.search_props.mode) AS mode, 
        latest(data.search_props.type) AS type, 
        latest(data.search_props.role) AS role, 
        latest(data.search_props.app) AS app, 
        latest(data.search_props.user) AS user, 
        min(_time) AS min_time,
        max(_time) AS max_time 
        by data.search_props.sid
 | sort - peak_mem_usage 
 | head 20

 | table data.search_props.sid peak_mem_usage mode type role app user min_time max_time
 | convert ctime(min_time)
 | convert ctime(max_time)
 | rename data.search_props.sid AS SID, 
    peak_mem_usage AS "Peak Physical Memory Usage (MB)",
    min_time AS "First time seen",
    max_time AS "Last time seen"

I suspect that ...

| stats ... latest(data.search_props.*) AS *, .... by data.search_props.sid

...might be an issue, since the by field is part of the *. I've also changed some other details so that each step is extremely specific. It may be less efficient, but if anything drops out here, you can add back one line at a time and the problem should turn out to be obvious (in retrospect).
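In the same one-step-at-a-time spirit, it may help to check the very first line before running either chunk: do any splunk_resource_usage events exist in the index being searched? The sketch below is an illustrative check, not part of the original answer. It uses tstats to count that sourcetype per index over whatever time range is selected in the time picker, and it assumes your role can read both _internal and _introspection:

 | tstats count where (index=_internal OR index=_introspection) sourcetype=splunk_resource_usage by index, sourcetype

If the counts appear only under one index, that is the index the base search should use; if nothing appears at all, the problem lies in data collection rather than in the stats, sort, or rename steps.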

aaraneta_splunk
Splunk Employee

@ddrillic - Did the answer provided by DalJeanis help provide a working solution to your question? If yes, please don't forget to resolve this post by clicking "Accept". If no, please leave a comment with more feedback. Thanks!

lim2
Communicator

Thanks, DalJeanis, for the useful query. FYI, the query worked for me after I changed index=_internal to index=_introspection.
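Combining lim2's index change with DalJeanis's query gives roughly the search below. Treat it as an untested sketch: host="host name" is still a placeholder for your search head, and the where peak_mem_usage > 10000 line is a hypothetical addition that keeps only searches peaking above roughly 10 GB (data.mem_used is in MB, matching the "Peak Physical Memory Usage (MB)" label above). Drop that line to see the top 20 regardless of size.

 index=_introspection host="host name" sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
 | stats max(data.mem_used) AS peak_mem_usage,
        latest(data.search_props.mode) AS mode,
        latest(data.search_props.type) AS type,
        latest(data.search_props.role) AS role,
        latest(data.search_props.app) AS app,
        latest(data.search_props.user) AS user,
        min(_time) AS min_time,
        max(_time) AS max_time
        by data.search_props.sid
 | where peak_mem_usage > 10000
 | sort - peak_mem_usage
 | head 20
 | table data.search_props.sid peak_mem_usage mode type role app user min_time max_time
 | convert ctime(min_time) ctime(max_time)
 | rename data.search_props.sid AS SID,
    peak_mem_usage AS "Peak Physical Memory Usage (MB)",
    min_time AS "First time seen",
    max_time AS "Last time seen"

The rest of the pipeline is unchanged from the accepted answer; only the index and the optional threshold differ, and the two convert calls are merged into one command.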
