Solved: How to improve my search to identify queries which consume a large amount of memory?

ddrillic · 02-14-2017

We recently had search heads crashing, and it seems that queries consuming 11-12 GB of memory are causing the crashes.

We are trying the following search, but it returns 0 results for the past week:

index=_internal host="host name" sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
| stats max(data.mem_used) AS peak_mem_usage latest(data.search_props.*) AS * min(_time) AS min_time max(_time) AS max_time by data.search_props.sid
| sort 20 - peak_mem_usage
| fields data.search_props.sid peak_mem_usage mode type role app user min_time max_time
| convert ctime(min_time)
| convert ctime(max_time)
| rename data.search_props.sid AS SID peak_mem_usage AS "Peak Physical Memory Usage (MB)" min_time AS "First time seen" max_time AS "Last time seen"

Any ideas how to improve it?

DalJeanis · 02-14-2017

Okay, I've broken it up into two chunks and coded it very specifically in steps. Run the first chunk; if it gives you any output, add the second chunk. Let us know where it drops out. If it doesn't drop out anywhere, then one of the pieces I changed may work differently from how you think it does.

 index=_internal host="host name" sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
 | stats max(data.mem_used) AS peak_mem_usage, 
        latest(data.search_props.mode) AS mode, 
        latest(data.search_props.type) AS type, 
        latest(data.search_props.role) AS role, 
        latest(data.search_props.app) AS app, 
        latest(data.search_props.user) AS user, 
        min(_time) AS min_time, 
        max(_time) AS max_time 
        by data.search_props.sid
 | sort - peak_mem_usage 
 | head 20

 | table data.search_props.sid peak_mem_usage mode type role app user min_time max_time
 | convert ctime(min_time)
 | convert ctime(max_time)
 | rename data.search_props.sid AS SID, 
    peak_mem_usage AS "Peak Physical Memory Usage (MB)",
    min_time AS "First time seen",
    max_time AS "Last time seen"

I suspect that ...

| stats ... latest(data.search_props.*) AS *, .... by data.search_props.sid

...might be an issue, since the by field (data.search_props.sid) also matches the wildcard. I've also changed some other details so that each step is extremely specific. It may be less efficient, but if anything drops out here, you can add one line at a time and the problem should turn out to be obvious (in retrospect).
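
The "one line at a time" approach can start even before the stats clause. The following is a minimal sketch, not part of the original answer: it reuses the same base search (with the same "host name" placeholder) and only counts the matching events.

```spl
index=_internal host="host name" sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
| stats count
```

If the count is 0 over the chosen time range, the problem is likely in the base search itself (index, host, sourcetype, or the field filter) rather than in the stats, sort, or rename steps that follow.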

aaraneta_splunk · 03-19-2017

@ddrillic - Did the answer provided by DalJeanis help provide a working solution to your question? If yes, please don't forget to resolve this post by clicking "Accept". If no, please leave a comment with more feedback. Thanks!

lim2 · 02-06-2019

Thanks, DalJeanis, for the useful query. FYI, the query worked for me after I changed index=_internal to index=_introspection.
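
For reference, here is a sketch of the step-by-step query from the accepted answer with only that index change applied. The host value is still the "host name" placeholder from the thread, and the field names are the ones used above; whether they are populated may depend on the Splunk version and on introspection being enabled, so treat this as a starting point rather than a verified search.

```spl
index=_introspection host="host name" sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
| stats max(data.mem_used) AS peak_mem_usage,
        latest(data.search_props.mode) AS mode,
        latest(data.search_props.type) AS type,
        latest(data.search_props.role) AS role,
        latest(data.search_props.app) AS app,
        latest(data.search_props.user) AS user,
        min(_time) AS min_time,
        max(_time) AS max_time
        by data.search_props.sid
| sort - peak_mem_usage
| head 20
| table data.search_props.sid peak_mem_usage mode type role app user min_time max_time
| convert ctime(min_time)
| convert ctime(max_time)
| rename data.search_props.sid AS SID,
    peak_mem_usage AS "Peak Physical Memory Usage (MB)",
    min_time AS "First time seen",
    max_time AS "Last time seen"
```

As in the original answer, it can be run in two stages: everything up to `| head 20` first, then the table, convert, and rename lines once the first stage returns rows.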
