When I run the health check, I get a warning that the skip ratio for scheduled searches is 96.4%. Digging further, the Search Activity: Instance dashboard shows a skip ratio of 96.3%. I ran the search
index=_internal source=*scheduler.log | stats count by user, app, savedsearch_name, status
and the results showed a large number of skipped scheduled searches from the Cisco Security Suite app. I changed max_searches_per_cpu in limits.conf to 35. Is there anything else I can do?
CPU: 16 physical / 32 virtual cores
Memory: 262 GB
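Here is a minimal sketch of the arithmetic behind a skip ratio. It assumes the ratio is simply skipped runs divided by all scheduled runs; the health check's exact definition may exclude some statuses. The code sums the count column of a CSV export of the stats search above, grouped by status. The function names and the sample numbers are hypothetical.

```python
import csv
from collections import Counter


def skip_ratio(rows):
    """Return skipped / total across rows that carry 'status' and 'count' keys.

    Assumption: every status counts toward the denominator.
    """
    by_status = Counter()
    for row in rows:
        by_status[row["status"]] += int(row["count"])
    total = sum(by_status.values())
    return by_status["skipped"] / total if total else 0.0


def skip_ratio_from_csv(path):
    """Hypothetical helper: read a CSV export of the stats results."""
    with open(path, newline="") as f:
        return skip_ratio(csv.DictReader(f))


if __name__ == "__main__":
    # Hypothetical sample, not real data from this thread.
    sample = [
        {"status": "success", "count": "36"},
        {"status": "skipped", "count": "964"},
    ]
    print(f"{skip_ratio(sample):.1%}")  # prints 96.4%
```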
What is the reason for the skipped searches? Try this search:
index=_internal sourcetype=scheduler status=skipped | stats count by app search_type reason
After running the search you provided, here is what it returned:
reason: The maximum number of concurrent auto-summarization searches on this instance has been reached
reason: The maximum number of concurrent historical scheduled searches on this instance has been reached
Over the last 24 hours, the first reason has a count of 173834 and the second a count of 27.
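To see how lopsided those two counts are, here is a quick calculation of each reason's share of all skips. The short reason labels are abbreviations of the messages above. Nearly all skips (about 99.98%) hit the auto-summarization limit. That is the concurrency cap for acceleration searches (report and data model acceleration), not for ordinary scheduled searches.

```python
def reason_shares(counts):
    """Map each skip reason to its fraction of all skips."""
    total = sum(counts.values())
    if total == 0:
        return {reason: 0.0 for reason in counts}
    return {reason: n / total for reason, n in counts.items()}


# Counts reported above for the last 24 hours.
skips = {
    "auto-summarization concurrency limit": 173834,
    "historical scheduled search concurrency limit": 27,
}

for reason, share in reason_shares(skips).items():
    print(f"{reason}: {share:.2%}")
```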
I updated max_searches_per_cpu to 25 and max_searches_perc from 50 to 60.
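For reference, the limits.conf specification gives the historical search concurrency limit as max_searches_per_cpu × number of CPUs + base_max_searches. max_searches_per_cpu and base_max_searches (default 6) belong in the [search] stanza. max_searches_perc and auto_summary_perc (both default 50) belong in [scheduler]. The scheduler may use max_searches_perc percent of the historical limit, and auto-summarization searches may use auto_summary_perc percent of the scheduler's share.

This sketch applies that chain to this host, both at the defaults and with the values above. Two things are assumptions: the floor rounding, and whether Splunk counts 16 or 32 CPUs here. Both CPU counts are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    historical: int    # concurrent historical searches on the instance
    scheduler: int     # share available to scheduled searches
    auto_summary: int  # share of the scheduler quota for acceleration searches


def concurrency_limits(cpus, max_searches_per_cpu=1, base_max_searches=6,
                       max_searches_perc=50, auto_summary_perc=50):
    historical = max_searches_per_cpu * cpus + base_max_searches
    # Assumption: percentages are applied with floor (integer) division.
    scheduler = historical * max_searches_perc // 100
    auto_summary = scheduler * auto_summary_perc // 100
    return Limits(historical, scheduler, auto_summary)


for cpus in (16, 32):
    print(cpus, "defaults:", concurrency_limits(cpus))
    print(cpus, "25 per CPU, 60%:",
          concurrency_limits(cpus, max_searches_per_cpu=25, max_searches_perc=60))
```

With the defaults, this arithmetic allows only about 5 to 9 concurrent acceleration searches. That would be consistent with the auto-summarization skip reason reported above.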
I would be very careful changing these settings in limits.conf and would talk to Splunk PS before doing so.
In any case, something seems very odd. I have installed that app many times on weaker indexers without any skipped-search issues. Is this a single Splunk instance? Do you run many real-time searches?
Do you have the Enterprise Security app installed as well?
You're right: my last Splunk instance was on a server with fewer resources, and I never noticed this. Since I use it to monitor the security of my network, there may be a lot of real-time searches running that I don't see. Also, this is the first time I am using the health check, because I'm now on 6.6; I didn't notice it on 6.5.3. I'll check with Splunk PS to see whether I messed something up by making those changes.
The Cisco Security Suite app is generating 80% of the accelerated searches that are causing the skipped searches.
I'm getting a 100% skip ratio on something I don't even know I need, "ITSI event grouping", and this is on the indexer. I couldn't care less, as the indexer's job isn't to produce notable events.
It's very frustrating trying to find good documentation on these skip ratios and on why Splunk apps ship with these settings out of the box.
I believe the bottom line is that the server needs enough resources that few or no searches are skipped. The ideal Splunk deployment has a separate indexer and search head. The search head runs the searches, and if it has enough resources, no searches are skipped, while the indexing load stays on the indexer. If you run the indexer and search head on one server, it needs a lot of resources, because the search limit is counted per physical core, not per virtual core. The more cores and memory you have, the less likely searches are to be skipped. If that is not possible and there is no way around the CPU and memory constraints, what I have found is that you can modify limits.conf. Splunk Support does not recommend it, but it helps. Here are the settings I changed:
max_searches_perc = 50 (default)
max_searches_per_cpu = 1 (default)
I raised max_searches_perc to 60 and max_searches_per_cpu to 10 to see whether the skip ratio would drop back to 0.00%. When it did, I slowly lowered max_searches_per_cpu until I found a point where only a small percentage of searches, or none at all, were skipped. At that point I also set max_searches_perc back to 50 and kept watching. With all the apps I have and what I need the server to do, max_searches_per_cpu is currently at 5, and I am good with that. I may eventually need another server to act as a dedicated search head.
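The same limits.conf arithmetic can compare the values stepped through above. This is a sketch with four assumptions: a hypothetical 16-CPU host, the default base_max_searches of 6, the default auto_summary_perc of 50, and floor rounding of the percentages. It returns the historical, scheduler, and auto-summarization limits for each step.

```python
def concurrency_limits(cpus, max_searches_per_cpu, max_searches_perc,
                       base_max_searches=6, auto_summary_perc=50):
    """Return (historical, scheduler, auto_summary) concurrency limits."""
    historical = max_searches_per_cpu * cpus + base_max_searches
    # Assumption: percentages are applied with floor (integer) division.
    scheduler = historical * max_searches_perc // 100
    auto_summary = scheduler * auto_summary_perc // 100
    return historical, scheduler, auto_summary


CPUS = 16  # hypothetical core count, for illustration only

steps = [
    ("defaults", 1, 50),
    ("raised", 10, 60),
    ("settled", 5, 50),
]
for label, per_cpu, perc in steps:
    print(label, concurrency_limits(CPUS, per_cpu, perc))
```

In this example, even the settled value of 5 gives the scheduler about four times its default quota (43 versus 11).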
I hope this helps.