We recently had search heads crashing, and it seems that queries consuming 11-12 GB of memory cause the crashes.
We are trying the following search, but it returns 0 results for the past week:
index=_internal host="host name" sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
| stats max(data.mem_used) AS peak_mem_usage latest(data.search_props.*) AS * min(_time) AS min_time max(_time) AS max_time by data.search_props.sid
| sort 20 - peak_mem_usage
| fields data.search_props.sid peak_mem_usage mode type role app user min_time max_time
| convert ctime(min_time)
| convert ctime(max_time)
| rename data.search_props.sid AS SID peak_mem_usage AS "Peak Physical Memory Usage (MB)" min_time AS "First time seen" max_time AS "Last time seen"
Any ideas how to improve it?
Okay, I've broken it up into two chunks and coded each step very explicitly. Run the first chunk; if it gives you any output, add the second chunk. Let us know where it breaks. If it doesn't break, then one of the pieces I changed may work differently from how you think it does.
index=_internal host="host name" sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
| stats max(data.mem_used) AS peak_mem_usage,
latest(data.search_props.mode) AS mode,
latest(data.search_props.type) AS type,
latest(data.search_props.role) AS role,
latest(data.search_props.app) AS app,
latest(data.search_props.user) AS user,
min(_time) AS min_time,
max(_time) AS max_time
by data.search_props.sid
| sort - peak_mem_usage
| head 20
| table data.search_props.sid peak_mem_usage mode type role app user min_time max_time
| convert ctime(min_time)
| convert ctime(max_time)
| rename data.search_props.sid AS SID,
peak_mem_usage AS "Peak Physical Memory Usage (MB)",
min_time AS "First time seen",
max_time AS "Last time seen"
I suspect that ...
| stats ... latest(data.search_props.*) AS *, .... by data.search_props.sid
...might be an issue, since the by field (data.search_props.sid) also matches the wildcard, but I've also changed some other details so that each step is extremely specific. It may be less efficient, but if anything drops out here, you can add one line at a time and the problem should turn out to be obvious (in retrospect).
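If even the base search returns nothing, it may help to check first where the sourcetype is actually indexed before debugging the stats step. A minimal sketch, assuming your role is allowed to search the internal indexes and using the same time range as the original search:

```
| tstats count where (index=* OR index=_*) sourcetype=splunk_resource_usage by index, host
```

If the index or host listed in the output differs from what the base search filters on, the filter is the problem, not the stats clause.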
@ddrillic - Did the answer provided by DalJeanis help provide a working solution to your question? If yes, please don't forget to resolve this post by clicking "Accept". If no, please leave a comment with more feedback. Thanks!
Thanks, DalJeanis, for the useful query. FYI, the query worked for me after I changed index=_internal to index=_introspection.
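For anyone landing here later, this is roughly what the query looks like with that change applied. It is a sketch rather than a tested fix for every environment: "host name" is still a placeholder for your search head, and merging the two convert steps into one is just a tidy-up, not something the change requires.

```
index=_introspection host="host name" sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
| stats max(data.mem_used) AS peak_mem_usage,
        latest(data.search_props.mode) AS mode,
        latest(data.search_props.type) AS type,
        latest(data.search_props.role) AS role,
        latest(data.search_props.app) AS app,
        latest(data.search_props.user) AS user,
        min(_time) AS min_time,
        max(_time) AS max_time
        by data.search_props.sid
| sort - peak_mem_usage
| head 20
| table data.search_props.sid peak_mem_usage mode type role app user min_time max_time
| convert ctime(min_time) ctime(max_time)
| rename data.search_props.sid AS SID,
         peak_mem_usage AS "Peak Physical Memory Usage (MB)",
         min_time AS "First time seen",
         max_time AS "Last time seen"
```

To focus on the searches suspected of causing the crashes, one option is to add a line such as | where peak_mem_usage > 10000 right after the stats step; the threshold here is only an illustration based on the 11-12 GB figure in the question.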