A ticket has come across my desk today where a customer is getting different results from different search heads for a report.
After looking at the report, I see that there are easily 15 windows that are all running real-time historical searches.
What is your recommendation on how to teach this team and others to get current data from their reports without breaking the system?
The messages drop down on a given search head is constantly displaying this type of message:
The maximum number of historical concurrent system-wide searches has been reached. current=12 maximum=12
Now, we can increase the max search limitation on any given search head VM, but that can only scale so much before the resources of the VM are expended.
Thanks for any guidance.
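For reference, the ceiling in that message is derived from settings in limits.conf on the search head. As documented in the limits.conf spec, the system-wide historical limit is computed as max_searches_per_cpu x number_of_cpus + base_max_searches. The values below are illustrative assumptions, not a recommendation and not necessarily what this environment uses:

```ini
# limits.conf on the search head (values are illustrative assumptions)
[search]
# Added once to the computed total.
base_max_searches = 6
# Multiplied by the number of CPU cores on the search head.
max_searches_per_cpu = 1
# Resulting limit = max_searches_per_cpu x number_of_cpus + base_max_searches
# With these example values, a 6-core VM would give 1 x 6 + 6 = 12.
```

Raising either value only moves the ceiling; it does not add CPU or memory, which is why this approach stops scaling on a fixed-size VM.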
Agree with @twinspop and @raghav2384. I normally disable the rtsearch and scheduled-search capabilities for users. Then they come to me to get things scheduled or to use the rtsearch capability. That's when I review their search with them to make sure it is well formed and performant. Eventually they learn enough from me to be promoted to power user, where they can have those capabilities and do the same for their team's users.
For existing rtsearches, you can probably switch them to frequent scheduled searches (for example, search over the last 5 minutes, every 5 minutes), and most owners are satisfied with that.
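A minimal sketch of that "5 minutes every 5 minutes" pattern in savedsearches.conf might look like the following. The stanza name and the search string are hypothetical placeholders; the setting names are standard savedsearches.conf settings:

```ini
# savedsearches.conf (stanza name and search string are hypothetical)
[Errors - last 5 minutes]
search = index=main sourcetype=app_logs log_level=ERROR | stats count by host
# Turn on scheduling for this saved search.
enableSched = 1
# Run every 5 minutes.
cron_schedule = */5 * * * *
# Cover the previous 5 minutes, snapped to the minute.
dispatch.earliest_time = -5m@m
dispatch.latest_time = @m
```

Dashboard panels can then load the results of the scheduled search instead of each open window launching its own real-time search.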
I don't think we've addressed "a customer is getting different results from different search heads for a report." Would you elaborate on that?
Thanks for the response.
My assumption (I know, dangerous) is that the 9 SHs each have a different number of historical searches waiting in the queue at any given time, and that this is causing the differing results.
Basically, the report is full of 15 - 20 panels running RT searches over a window from 1 hour ago to now, which causes the SH to constantly have some searches in the queue.
I completely agree with @twinspop. Control the access by roles (no realtime, no scheduled realtime, can use only xyz MB, can run only xyz searches, etc.). You are correct, increasing the max historical concurrent limit will not work unless you get the VM specs bumped. I hope you have resource reservations on your VMs. If not, scale them horizontally or switch to bare metal.
Hope this helps!
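As a rough sketch of the role-based limits suggested above, the quotas can be set per role in authorize.conf. The role name and all numeric values here are hypothetical and would need tuning for the environment:

```ini
# authorize.conf (role name and values are hypothetical)
[role_dashboard_users]
importRoles = user
# No real-time searching, ad hoc or scheduled.
rtsearch = disabled
schedule_rtsearch = disabled
# Maximum disk space (MB) a user's search jobs may consume.
srchDiskQuota = 100
# Maximum concurrent historical searches per user.
srchJobsQuota = 3
# Maximum concurrent real-time searches per user.
rtSrchJobsQuota = 0
# Maximum search time window, in seconds (here: 24 hours).
srchTimeWin = 86400
```

Keep in mind that capabilities and quotas inherited through importRoles also apply, so the imported roles need to be checked as well.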
My solution was easy: I've removed rtsearch and schedule_rtsearch from their capabilities in the role's stanza:
[role_somerole]
<snip>
rtsearch = disabled
schedule_rtsearch = disabled
<snip>
Splunk axiom #67: Realtime searching will be abused at every opportunity.