Splunk Search

How can I improve my search to identify queries that consume a large amount of memory?

ddrillic
Ultra Champion

Our search heads have been crashing recently, and it seems that queries consuming 11-12 GB of memory are causing the crashes.

We are trying the following search, but it returns 0 results for the past week:

 index=_internal host="host name" sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
 | stats max(data.mem_used) AS peak_mem_usage latest(data.search_props.*) AS * min(_time) AS min_time max(_time) AS max_time by data.search_props.sid
 | sort 20 - peak_mem_usage
 | fields data.search_props.sid peak_mem_usage mode type role app user min_time max_time
 | convert ctime(min_time)
 | convert ctime(max_time)
 | rename data.search_props.sid AS SID peak_mem_usage AS "Peak Physical Memory Usage (MB)" min_time AS "First time seen" max_time AS "Last time seen"

Any ideas on how to improve it?
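A quick way to tell whether the 0 results come from the index rather than from the rest of the pipeline is to count splunk_resource_usage events per index over the same time range. This is a minimal check using tstats. index=_* is included because a plain index=* does not match internal indexes such as _internal and _introspection. On a default installation I would expect the counts to show up under _introspection rather than _internal, but that is worth confirming in your own environment.

 | tstats count where (index=* OR index=_*) sourcetype=splunk_resource_usage by index

If the only index listed is not the one in your search, changing the index= term is the first thing to try before debugging the stats/sort/rename steps.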


DalJeanis
Legend

Okay, I've broken it up into two chunks and spelled out each step explicitly. Run the first chunk; if it gives you any output, add the second chunk. Let us know at which step the results drop out. If they never drop out, then one of the pieces I changed may work differently from how you think it does.

 index=_internal host="host name" sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
 | stats max(data.mem_used) AS peak_mem_usage, 
        latest(data.search_props.mode) AS mode, 
        latest(data.search_props.type) AS type, 
        latest(data.search_props.role) AS role, 
        latest(data.search_props.app) AS app, 
        latest(data.search_props.user) AS user, 
        min(_time) AS min_time, 
        max(_time) AS max_time 
        by data.search_props.sid
 | sort - peak_mem_usage 
 | head 20

 | table data.search_props.sid peak_mem_usage mode type role app user min_time max_time
 | convert ctime(min_time)
 | convert ctime(max_time)
 | rename data.search_props.sid AS SID, 
    peak_mem_usage AS "Peak Physical Memory Usage (MB)",
    min_time AS "First time seen",
    max_time AS "Last time seen"

I suspect that ...

| stats ... latest(data.search_props.*) AS *, .... by data.search_props.sid

...might be an issue, since the BY field is also matched by the wildcard. I've also changed some other details so that each step is explicit. It may be less efficient, but if results drop out at some point, you can add back one line at a time and the problem should become obvious (in retrospect).
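If you would rather keep the wildcard than list each property, one way to rule out the collision described above is to rename the SID before stats, so that data.search_props.* no longer matches the BY field. This is an untested sketch; it also uses index=_introspection, where splunk_resource_usage events are normally indexed, instead of index=_internal.

 index=_introspection host="host name" sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
 | rename data.search_props.sid AS SID
 | stats max(data.mem_used) AS peak_mem_usage,
         latest(data.search_props.*) AS *,
         min(_time) AS min_time,
         max(_time) AS max_time
         by SID
 | sort - peak_mem_usage
 | head 20

The wildcard should produce fields named after the remaining search_props keys (mode, type, role, app, user, and so on), which can then go through the same table/convert/rename steps as the second chunk, minus the SID rename.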


aaraneta_splunk
Splunk Employee

@ddrillic - Did the answer provided by DalJeanis help provide a working solution to your question? If yes, please don't forget to resolve this post by clicking "Accept". If no, please leave a comment with more feedback. Thanks!


lim2
Communicator

Thanks, DalJeanis, for the useful query. FYI, the query worked for me after I changed index=_internal to index=_introspection.
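For reference, here is the accepted answer's query with that one change applied, which should return results as long as resource usage data is being collected for the host. It also adds a GB column and a where filter so that only searches at or above 10 GB are listed. The 10 GB cutoff is an arbitrary illustrative value chosen just under the 11-12 GB mentioned in the question, and the where line can be dropped to see the top 20 regardless. This assumes data.mem_used is reported in MB, as the original column label implies.

 index=_introspection host="host name" sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
 | stats max(data.mem_used) AS peak_mem_usage,
         latest(data.search_props.mode) AS mode,
         latest(data.search_props.type) AS type,
         latest(data.search_props.role) AS role,
         latest(data.search_props.app) AS app,
         latest(data.search_props.user) AS user,
         min(_time) AS min_time,
         max(_time) AS max_time
         by data.search_props.sid
 | eval peak_mem_gb = round(peak_mem_usage / 1024, 2)
 | where peak_mem_gb >= 10
 | sort - peak_mem_usage
 | head 20
 | table data.search_props.sid peak_mem_usage peak_mem_gb mode type role app user min_time max_time
 | convert ctime(min_time) ctime(max_time)
 | rename data.search_props.sid AS SID,
     peak_mem_usage AS "Peak Physical Memory Usage (MB)",
     peak_mem_gb AS "Peak Physical Memory Usage (GB)",
     min_time AS "First time seen",
     max_time AS "Last time seen"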
