Solved: How to prevent a user from running a high memory-u...

Kieffer87 · 07-24-2017

We've recently had some users run searches that crashed our Splunk indexers. I'm looking for suggestions on how to (A) prevent a user from running a high memory usage search and (B) understand what it is about this search that consumed so much memory.

We have a cluster of 8 indexers running CentOS and Splunk 6.5.3. Each one has 128 GB of RAM and a 24-disk RAID 10 array. During normal usage the indexers consume roughly 6 GB of memory. The following search consumed just under 120 GB of RAM on each indexer and crashed one of them when it ran out of swap space.

sourcetype="infoblox:dns" NOT (query="*akamai*" OR query="*qwest*" OR query="*google*" OR query="*bing*"  OR query="*cloudfront*" OR query="*amazonaws.com" OR query="*microsoft*" OR query="*mcafee*" OR query="*deere*" OR query="*.arpa")  | eval domain= mvindex(split(query,"."),-3) + "." + mvindex(split(query,"."),-2) + "." + mvindex(split(query,"."),-1) | stats count by domain  | sort - count domain | table domain
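For anyone who wants to see what each stage of that search does, here is a rough Python emulation. It is a sketch, not Splunk itself: the sample hostnames are made up, lowercasing the value stands in for Splunk's case-insensitive field-value matching, and treating a name with fewer than three labels as null (and therefore dropped by stats) is an assumption about how mvindex and eval behave.

```python
from collections import Counter
from fnmatch import fnmatchcase

# The ten wildcard terms inside the original NOT ( ... OR ... ) clause.
EXCLUDE = ["*akamai*", "*qwest*", "*google*", "*bing*", "*cloudfront*",
           "*amazonaws.com", "*microsoft*", "*mcafee*", "*deere*", "*.arpa"]


def excluded(query: str) -> bool:
    """True if any wildcard term matches (lowercased to mimic case-insensitive values)."""
    q = query.lower()
    return any(fnmatchcase(q, pattern) for pattern in EXCLUDE)


def last_three_labels(query: str):
    """Mimic mvindex(split(query,"."),-3) + "." + ... + mvindex(...,-1).

    Assumption: a missing label makes the whole eval null (None here).
    """
    labels = query.split(".")
    if len(labels) < 3:
        return None
    return ".".join(labels[-3:])


def original_search(queries):
    counts = Counter()
    for q in queries:
        if excluded(q):
            continue  # dropped by NOT ( ... )
        domain = last_three_labels(q)
        if domain is not None:  # stats count by domain skips null domains
            counts[domain] += 1
    # sort - count domain: highest count first (tie order here is an assumption)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [domain for domain, _ in ranked]  # table domain keeps only the name


sample = ["www.google.com", "mail.example.com", "mail.example.com",
          "cdn.example.net", "1.0.0.10.in-addr.arpa", "s3.amazonaws.com",
          "example.org"]
print(original_search(sample))  # ['mail.example.com', 'cdn.example.net']
```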

The indicators I see that may have contributed to this are:

- Not specifying an index, although the sourcetype infoblox:dns belongs to only one index (not sure whether this matters)
- Using NOT rather than AND (I recall reading somewhere that NOT is more resource-intensive)
- Using wildcards in each of the NOT terms
- The eval statement

Is there a better way to formulate this search to reduce memory usage? What can I do in the future to prevent a user from repeating this and bringing down the system?

DalJeanis · 07-24-2017

All right, the first thing to do is to look at that sourcetype and see what values are in the query field.

Then take ONE of those query="blahblah" clauses and work out what comparisons a computer would have to make to determine whether a record should be included. The system is going to have to scan the entire record ten times, remembering all the places it might have to backtrack to.
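A rough Python model of that cost: each wildcard term is a separate match attempt against the value, so any record that ends up being kept has been tested against all ten terms. This is an illustration only; the left-to-right, stop-at-first-match order is an assumption, not a description of Splunk's internals, and the hostnames are hypothetical.

```python
from fnmatch import fnmatchcase

# Same ten wildcard terms as the original NOT ( ... OR ... ) clause.
EXCLUDE = ["*akamai*", "*qwest*", "*google*", "*bing*", "*cloudfront*",
           "*amazonaws.com", "*microsoft*", "*mcafee*", "*deere*", "*.arpa"]


def wildcard_checks(query: str) -> int:
    """Count how many wildcard terms get evaluated for one query value.

    Assumes the ORs are tried left to right and stop at the first match.
    """
    q = query.lower()
    for checked, pattern in enumerate(EXCLUDE, start=1):
        if fnmatchcase(q, pattern):
            return checked  # first match: the event is excluded, stop here
    return len(EXCLUDE)  # every kept event has failed all ten terms


print(wildcard_checks("a1.akamai.net"))     # 1
print(wildcard_checks("www.google.com"))    # 3
print(wildcard_checks("mail.example.com"))  # 10
```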

Try this -

  sourcetype="infoblox:dns" 
 | rex field=query "(?<rejectme>akamai|amazonaws\.com|bing|cloudfront|deere|google|mcafee|microsoft|qwest|\.arpa)"
 | where isnull(rejectme)

 | rename COMMENT as "Not sure whether the below is appropriate or not, without seeing sample data."
 | eval query = split(query,".")
 | eval domain= mvindex(query,-3) + "." + mvindex(query,-2) + "." + mvindex(query,-1) 
 | stats count by domain  
 | sort - count domain 
 | table domain
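To check offline that the single rex keeps the same records as the ten wildcard terms, here is a Python sketch of both filters side by side. It is an emulation under assumptions: Python's re module stands in for rex, lowercasing stands in for Splunk's case-insensitive field-value matching, and the hostnames are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# One alternation, as in the rex above (Python writes named groups as (?P<name>...)).
REJECT = re.compile(r"(?P<rejectme>akamai|amazonaws\.com|bing|cloudfront|deere"
                    r"|google|mcafee|microsoft|qwest|\.arpa)")

# The original ten wildcard terms, for comparison.
EXCLUDE = ["*akamai*", "*qwest*", "*google*", "*bing*", "*cloudfront*",
           "*amazonaws.com", "*microsoft*", "*mcafee*", "*deere*", "*.arpa"]


def keep_rex(query: str) -> bool:
    """rex + where isnull(rejectme): keep the event only if nothing matched."""
    return REJECT.search(query) is None


def keep_wildcard(query: str) -> bool:
    """NOT ( ... OR ... ) with the value lowercased to mimic case-insensitive matching."""
    q = query.lower()
    return not any(fnmatchcase(q, pattern) for pattern in EXCLUDE)


for name in ["mail.example.com", "www.google.com", "s3.amazonaws.com",
             "8.8.8.8.in-addr.arpa", "WWW.GOOGLE.COM", "amazonaws.com.example.net"]:
    print(f"{name:28} rex={keep_rex(name)!s:5} wildcard={keep_wildcard(name)}")
```

The two filters agree on the first four names. They disagree on the last two: this rex is case-sensitive, and *amazonaws.com only matches at the end of the value, whereas the unanchored alternation matches anywhere in it.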

I suspect that there is a better way to determine the domain, but I'd need sample data to have a clue what it might be.
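To see why a fixed "last three labels" rule is only a rough proxy for a domain, here is the same eval logic in Python applied to a few hypothetical names. Treating fewer than three labels as null is an assumption about how mvindex and eval handle an out-of-range index.

```python
def last_three_labels(query: str):
    """Python stand-in for mvindex(split(query,"."),-3) + "." + ... + mvindex(...,-1).

    Assumption: with fewer than three labels the eval result is null (None here).
    """
    labels = query.split(".")
    if len(labels) < 3:
        return None
    return ".".join(labels[-3:])


for name in ["www.example.com", "mail.example.com", "a.b.example.com",
             "example.com", "www.example.co.uk"]:
    print(name, "->", last_three_labels(name))
```

Hosts under the same domain (www and mail) land in separate buckets, a bare two-label name drops out of the counts, and www.example.co.uk comes out as example.co.uk only because its suffix happens to have two labels. A more robust split usually needs to know which suffixes are public (such as co.uk), which this sketch does not attempt.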

Kieffer87 · 07-25-2017

This seems to have done the trick. When I was testing yesterday, running the search over the past hour crashed one of my indexers. With your suggestion I was able to run a search over the past hour, and memory usage on the indexers didn't go above 6%. I'm also not seeing my search in the list of high memory usage searches 🙂

I did make one tweak: I added (?i) to the regex so that it isn't case sensitive.

| rex field=query "(?i)(?<rejectme>akamai|amazonaws\.com|bing|cloudfront|deere|google|mcafee|microsoft|qwest|\.arpa)"
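Python's re module treats a leading (?i) the same way, which makes the effect easy to check offline. A small sketch with a hypothetical upper-case hostname (Python needs the (?P<name>...) spelling for the named group):

```python
import re

TERMS = r"akamai|amazonaws\.com|bing|cloudfront|deere|google|mcafee|microsoft|qwest|\.arpa"

# Python spells the named group (?P<rejectme>...); the rex above uses (?<rejectme>...).
case_sensitive = re.compile(rf"(?P<rejectme>{TERMS})")
case_insensitive = re.compile(rf"(?i)(?P<rejectme>{TERMS})")  # leading (?i) = ignore case

print(case_sensitive.search("WWW.GOOGLE.COM"))                      # None
print(case_insensitive.search("WWW.GOOGLE.COM").group("rejectme"))  # GOOGLE
```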

DalJeanis · 07-25-2017

Good job!

Normally I'd set max_match on the rex, but in this case you don't care how many terms matched: any one match results in rejecting the record.
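The max_match point can be illustrated with a Python analogue: whether you stop at the first match or collect all of them, the keep/drop decision is the same, so the extra matches buy nothing here. This uses Python's re module, not Splunk's rex, and the hostname is hypothetical.

```python
import re

REJECT = re.compile(r"(?i)(akamai|amazonaws\.com|bing|cloudfront|deere"
                    r"|google|mcafee|microsoft|qwest|\.arpa)")


def rejected_first_match(query: str) -> bool:
    """Stop at the first hit, like rex with its default of a single match."""
    return REJECT.search(query) is not None


def rejected_all_matches(query: str) -> bool:
    """Collect every hit first, like a larger max_match, then decide."""
    return len(REJECT.findall(query)) > 0


name = "bing.microsoft.com"  # hypothetical name containing two excluded terms
print(REJECT.findall(name))                                    # ['bing', 'microsoft']
print(rejected_first_match(name), rejected_all_matches(name))  # True True
```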

Happy splunking!

ddrillic · 07-24-2017

What's the time frame of the search?

Write better searches says -
