Two of our users reported that they have not received any alerts from their real-time searches over the past week.
I created a new real-time search to test with, and it looks like it is not running either. Scheduled searches are running and alerting without any problems. I know real-time searches are resource hogs, and I have already suggested replacing them with searches scheduled every 5 minutes or so. But there are some use cases where the owners insist a real-time search is essential, and I definitely want to figure out what is going on here...
I first wanted to see how many real-time searches we have in our environment. I could not find any documentation on this, so I went to the job queue and filtered on status=Running. Ten of the jobs were real-time searches, but neither the two real-time searches my users reported nor my new test real-time search showed up in that list. That made me wonder whether we have hit the limit of concurrent real-time searches allowed, although I do not see any reference to a limit when searching index=_internal source=*scheduler.log. So as far as I can tell there is no easy way to get a list of searches configured to run in real time.
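One thing I am planning to try is the saved searches REST endpoint - if I understand correctly, a real-time search is configured with a dispatch window that starts with rt (e.g. rt-30s), so something like this run on the search head might list them (untested, and the field names are my assumption from the REST docs):

| rest /servicesNS/-/-/saved/searches splunk_server=local
| search dispatch.earliest_time=rt*
| table title eai:acl.owner eai:acl.app dispatch.earliest_time dispatch.latest_time disabled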
Based on the limits.conf documentation, I calculated the maximum number of real-time searches we are allowed. We have not adjusted any of the default settings in our environment. We have a Search Head Pool consisting of two VMs, each with 4 CPUs:
max_hist_searches = max_searches_per_cpu x number_of_cpus + base_max_searches
max_hist_searches = 1 x 4 + 6 = 10
max_rt_searches = max_rt_search_multiplier x max_hist_searches
max_rt_searches = 1 x 10 = 10
So it seems we should be able to have 10 real-time searches running at a time, which is exactly how many I see in the job queue... BUT since we have two Search Heads in a pool, shouldn't this number double to 20 max real-time searches?
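For reference, these are the documented defaults I plugged into the formula above (we have not overridden any of them; this is just the relevant piece of the stanza, not our actual file):

# limits.conf [search] stanza - documented defaults
[search]
base_max_searches = 6
max_searches_per_cpu = 1
max_rt_search_multiplier = 1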
I pulled up the job queue on each Search Head, and the same real-time searches appear on both. I know that is because of the pooling, but shouldn't the two search heads together allow 20 concurrent real-time searches?
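To compare what each search head is actually dispatching, I may also try the search jobs REST endpoint - my understanding is that running jobs expose an isRealTimeSearch flag, so something like this (untested) run locally on each SH might show its own real-time jobs:

| rest /services/search/jobs splunk_server=local
| search isRealTimeSearch=1 dispatchState=RUNNING
| table sid label eai:acl.owner dispatchState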
Also, in the S.o.S. app under Scheduler activity, the saved searches that are not running do not appear in the drop-down list of searches to filter on.
Our current Splunk infrastructure (with Search Head Pooling) has been around for several years, and we are planning to move to Search Head Clustering next spring when we stand up our new Splunk environment... but until then I still need to support our search head pooling.
I ran the search you provided, but I am not sure it will help me solve my problem. I need to find out why real-time searches seem not to be running (given the absence of alerts from them and the fact that they do not appear in the job queue). I have turned on debug logging for the scheduler, but that has not produced anything helpful. I also wanted to build a list of all configured real-time searches so I could try to persuade the owners to convert them to scheduled searches. The closest I have come is the search index=_internal "SavedSplunker" "realtime=1" - I assumed "realtime=1" would indicate a real-time search, but when I look at the settings of the returned searches, I see I was wrong: they are not all real-time searches (their start/finish times are not rt, and they are configured to run on a cron schedule).
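Along the same lines, I am planning to check whether the scheduler is skipping runs outright rather than never dispatching them - my understanding is that scheduler.log records a status and a reason for each scheduled run, so something like this (untested) might surface skipped searches and why:

index=_internal source=*scheduler.log* status=skipped
| stats count by savedsearch_name, reason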
Prior to this issue I had already seen the need for additional Search Heads, so I am in the process of standing up two more. I am just not sure this will address the real-time search/alert issue, since only 10 RT searches seem to be running currently across the two SHs, while my calculations show each SH should be able to handle 10 RT searches on its own. A search of the _internal logs does not show any reference to a search limit being reached, though.