After a recent upgrade to 7.1, my Search Head (not a SH Cluster) no longer seems to be running saved searches. _internal shows all searches being skipped:
05-02-2018 18:00:02.391 +1000 INFO SavedSplunker - Skip search "Test Search" during searchable rolling process with nextRunTime=1525248000
04-28-2018 09:59:02.103 +1000 INFO SavedSplunker - Skip search "test2" during searchable rolling process with nextRunTime=1524873540
And so on; those are just two examples from the last few days since the "upgrade."
The Search Head has distributed search enabled and talks to three cluster masters; the saved searches fail regardless of where the data lives. Some of the cluster indexers are on 7.0.2, but my test searches run against indexers that are on 7.1 and work fine interactively. It's only the scheduled execution in the background, and the actions it should trigger (email alerts, for example), that fails.
I can run the searches manually and get results, and dashboards load, but once a search is saved, its scheduled runs and triggered actions fail.
And this isn't the list_settings permissions issue from 6.6; that capability was already granted, and there don't seem to be any errors related to sending mail. It's as if the scheduler simply isn't running.
I should add, the console itself has messages along the lines of:
Indexer Clustering: Search SavedSearch created by username on the fubar app was skipped during the searchable rolling restart or upgrade.
Obviously, the name of the search, the user, and the app vary, but these messages have been constant. It's almost as if the search head believes it can't run any saved searches because of the cluster's status, even though ad-hoc queries work fine and no rolling restarts actually appear to be taking place.
Any thoughts appreciated!
7.0.2, and indeed that is related to the failure, but only partially. It turns out that 7.1 wouldn't even run saved searches against 7.1 distributed peers, and no rolling restarts or upgrades were taking place. For reasons that are unclear, setting the log level for DistributedPeerManager to DEBUG would actually FIX the problem we were having; at any level other than DEBUG, no saved searches would run. We have since gone back to 7.0.2 on our servers and will skip 7.1 entirely. It was not worth the hassle.
Setting the log level to DEBUG for DistributedPeerManager resulted in saved searches running as normal. 7.1 as a search head does not play well with any version other than 7.1; even talking to non-clustered peers was unstable and did not result in saved searches running. We moved back to 7.0.2 and will avoid 7.1.
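For anyone needing the same workaround: a log category can be overridden persistently by creating log-local.cfg next to log.cfg, rather than editing the shipped file. This is a sketch assuming a default install path; check your own log.cfg for the exact category name, and note that a splunkd restart is needed to pick up the change (the UI route, Settings > Server settings > Server logging, applies at runtime only):

```
# $SPLUNK_HOME/etc/log-local.cfg -- entries here override log.cfg
[splunkd]
category.DistributedPeerManager=DEBUG
```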
This can happen when the search head is also part of indexer clusters and needs to talk to multiple cluster masters running a lower version than the search head.
For example, a higher-versioned search head (7.1) may not be able to talk to 7.0.2 cluster masters as expected.
The search head thinks a rolling restart is happening because the messages it receives from the 7.0.2 cluster masters are missing crucial data.
We will work on improving the error message the search head reports ("…skipped during the searchable rolling restart or upgrade") so that the cause is clearer.
Splunk Enterprise version compatibility
Interoperability between the various types of cluster nodes is subject to strict compatibility requirements. In brief:
The master node must run the same version as, or a later version than, the peer nodes and search heads.
The search heads must run the same version as, or a later version than, the peer nodes.
The peer nodes must all run exactly the same version, down to the maintenance level.
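Stated as code, those three rules are easy to sanity-check against an inventory of node versions. This is a hypothetical helper, not part of any Splunk API; version strings are assumed to be simple dotted triples like "7.0.2":

```python
def parse(v):
    """Parse a dotted version string like '7.0.2' into a comparable tuple."""
    return tuple(int(x) for x in v.split("."))

def check_compat(master, search_heads, peers):
    """Return a list of human-readable violations of the compatibility rules."""
    problems = []
    m = parse(master)
    peer_vs = [parse(p) for p in peers]
    # Rule 1: the master must run the same or a later version than
    # every peer node and every search head.
    for v in search_heads + peers:
        if m < parse(v):
            problems.append(f"master {master} is older than node {v}")
    # Rule 2: each search head must run the same or a later version
    # than every peer node.
    for sh in search_heads:
        if any(parse(sh) < pv for pv in peer_vs):
            problems.append(f"search head {sh} is older than a peer")
    # Rule 3: all peers must run exactly the same version,
    # down to the maintenance level.
    if len(set(peer_vs)) > 1:
        problems.append(f"peers differ: {sorted(set(peers))}")
    return problems
```

Run against the setup described in the question (a 7.1 search head behind 7.0.2 cluster masters), rule 1 flags the mismatch immediately, which matches the explanation above.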