Last week, someone ran a query with "index=*" over a one-week time range. This drove memory usage to 100% on half of our indexers, which then failed completely.
I would like to know if anybody has an idea of how to block such queries from running.
I searched all over the place without finding anything. I know that you can specify which indexes each role can see, but that's not what I want. The documentation also lists "wildcards" as something that can be filtered, but there is no example, and I suspect it would filter out all searches containing wildcards, which is not what I am looking for.
One more clarification: the "Restrict search terms" field that you can add to a role will not do, since it only ANDs an admin-defined filter onto the user's query. It does not filter out query terms that contain wildcards.
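For context, both of those role-level controls live in authorize.conf. A rough sketch of what they look like (the role and index names here are made up) shows why neither one rejects an index=* search outright:

[role_app_team]
# Limits which indexes the role is allowed to search
srchIndexesAllowed = app_logs;web_logs
# "Restrict search terms" in the UI: this filter is ANDed onto every search
# the user runs; it never rejects a search for containing a wildcard
srchFilter = sourcetype=app*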
After 1.5 years, still no answer!
Seeing this and the interest in this issue, I decided to share the next best solution, which I came up with shortly after posting this question. It's an alert that triggers whenever anyone without the admin role runs a query containing index=* over a period greater than 24 hours. Knowing this allows you to inform users of their mistake and ask them not to run such queries. The alert query is smart enough to detect the excessive time range in most cases. Since the answer is sometimes too difficult to figure out, the results show a qualifier of True, False, or Unknown for each case, leaving you to verify and decide whether you need to intervene.
Here is my INDEX_EQUAL_ASTERISK query:
index=_internal host=Your.Search.Head.Names search earliest NOT user=-
    NOT [search earliest=-65m latest=-15m index=_internal host=Your.Search.Head.Names search earliest NOT user=- | dedup sid | fields sid]
    NOT [| rest /services/authentication/users splunk_server=local | rename title as user | regex roles="admin*" | fields user]
| regex "index%3D\*%"
| regex _raw!="earliest(%3D-5m%)|(%3D-15m%)|(=-5m)|(=-15m)"
| rex "(earliest%3D(?<EarliestDate>([0-9]{2}%2F){2}[0-9]{4}(%3A[0-9]{1,2}+){3}))|(earliest(=|%3D)(?<Earliest>(-(?>[0-9]+)[^hms+])|(-(?>[2-9][5-9])h)|(-(?>[3-9][0-9])h)|(-(?>[0-9]{3,})h)|([0-9]+)))"
| rex "(latest%3D(?<LatestDate>([0-9]{2}%2F){2}[0-9]{4}(%3A[0-9]{1,2}+){3}))|(latest(=|%3D)(?<Latest>(-(?>[0-9]+)[^hms])|(-(?>[2-9][5-9])h)|(-(?>[3-9][0-9])h)|(-(?>[0-9]{3,})h)|([0-9]+)))"
| rex "sid=(?<SearchID>[0-9]+\.[0-9]+)"
| eval EarliestDate=replace(EarliestDate, "%2F", "/")
| eval EarliestDate=replace(EarliestDate, "%3A", ":")
| eval LatestDate=replace(LatestDate, "%2F", "/")
| eval LatestDate=replace(LatestDate, "%3A", ":")
| eval "> 1 day"=if(isNull(EarliestDate), if(like(Earliest,"-%") OR (tonumber(Latest)-tonumber(Earliest))>86400, "True", "False"), "Unknown")
| eval Earliest=if(isNull(EarliestDate), Earliest, EarliestDate)
| eval Latest=if(isNull(LatestDate), Latest, LatestDate)
| eval Latest=if(isNull(Latest), "now", Latest)
| where len(Earliest) > 2 AND len(Latest) > 2
| stats count as "Logs Count" by user, SearchID, Earliest, Latest, "> 1 day"
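If you want to test the detection logic on its own before scheduling anything, a stripped-down sketch of just the matching part (same placeholder hostnames as above, without the deduplication and admin-exclusion subsearches) would be something like:

index=_internal host=Your.Search.Head.Names search earliest NOT user=-
| regex "index%3D\*%"
| table _time, user, _raw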
Here is the scheduled search setup:
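If you prefer to manage it in configuration rather than through the UI, a rough savedsearches.conf equivalent might look like the stanza below. The cron schedule, dispatch window, and email address are assumptions (I picked an hourly run so that the -65m/-15m subsearch above can exclude results already reported by the previous run); adjust them to your environment.

[INDEX_EQUAL_ASTERISK]
enableSched = 1
# assumption: run once an hour, offset from the top of the hour
cron_schedule = 15 * * * *
dispatch.earliest_time = -65m
dispatch.latest_time = now
counttype = number of events
relation = greater than
quantity = 0
action.email = 1
# placeholder address
action.email.to = splunk-admins@example.com
search = <the INDEX_EQUAL_ASTERISK query above>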
And, finally, here is a sample alert email you will get when an alert is detected:
Not sure whether it's applicable in your case, but we associate each role with a certain index. So, in most cases, each user can access their own index only.
Sorry, but no, it's not applicable.
Although we have some roles with limited access, a large percentage of our users need access to almost all indexes.
Responding to this because we recently received the same question at Community Office Hours. In case anyone else is looking to do this, here's the expert guidance:
Hello @sansay! That's an elegant solution and I'm glad you were able to solve your problem. You are spot-on in your response: the key is to find problem users and educate them on best practices.
Have you seen the Search Activity app? It provides a granular view of how users are using Splunk. There are a lot of searches there that you can adapt in much the same way as above.
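Even without the app, a rough sketch against the audit index gives a similar (much less polished) view of who is searching what and over which time range; the search_et and search_lt fields come from audit.log search events, so verify they are populated on your version:

index=_audit action=search info=granted search=*
| table _time, user, search, search_et, search_lt
| sort - _time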
Thank you Tom 🙂
No, I didn't know about the Search Activity app. I must admit I don't spend any time browsing the apps to see whether there is something I could use. Too often we have so much to do that we never take the time to improve things by building on others' creative work. Thank you for bringing this app to my attention. I will certainly take a look at it.
Sorry for the terrible answers. Good luck.