When using SSO with clustered search heads, users who lose SSO access leave behind knowledge objects and directories on the file system. I'm doing some work to clean these up. To query the Splunk API for the full set of information, I have to re-create the user so that Splunk will see and return their private knowledge objects. While doing this, I noticed the following problem:
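For context, the re-create-then-query sequence looks roughly like this. This is only a sketch: the host, port, user name, and helper names are placeholders, and I'm just building the request URLs with the standard library rather than using any Splunk SDK.

```python
from urllib.parse import quote, urlencode

# Placeholder management host/port; substitute your search head.
BASE = "https://localhost:8089"

def recreate_user_request(username, roles):
    """Build the POST that re-creates a deleted SSO user.
    /services/authentication/users is Splunk's user-management endpoint."""
    url = f"{BASE}/services/authentication/users"
    body = urlencode({"name": username, "roles": roles})
    return url, body

def saved_searches_url(username, app="search"):
    """Build the GET URL for the user's saved searches in a given app."""
    return f"{BASE}/servicesNS/{quote(username)}/{app}/saved/searches"
```

The URL from `saved_searches_url` is the one that intermittently 404s right after the user is re-created.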
I suspected that maybe the search head cluster needed to sync some configuration, so I hit /services/shcluster/status while step (2) was failing repeatedly, to get some information about the search head cluster state. None of the search heads reported being out of sync, and the last conf replication time had reset (indicating a configuration replication had happened), yet the API was still returning 404s on saved searches for a few seconds.
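What I was doing amounts to polling until the conf replication timestamp advances. A sketch of that loop, with the HTTP call factored out so the logic is visible: `fetch_status` is a placeholder for a real authenticated GET against /services/shcluster/status, and I'm assuming it returns a dict containing the last-conf-replication timestamp I was watching (the key name here is my own).

```python
import time

def wait_for_conf_replication(fetch_status, baseline, timeout=30.0, interval=1.0):
    """Poll a status-fetching callable until its reported replication
    timestamp advances past `baseline`, or raise after `timeout` seconds.
    `fetch_status` stands in for a GET on /services/shcluster/status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("last_conf_replication", 0) > baseline:
            return status
        time.sleep(interval)
    raise TimeoutError("conf replication did not advance in time")
```

Even when this loop reported that replication had advanced, the saved-search endpoint still 404'd briefly, which is the puzzling part.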
Is there any way to know when it's "safe" to request information pertaining to a user? Is the /directory endpoint potentially affected by this? Are there other endpoints that may be affected in the same way?
One other thing I tested was querying /servicesNS/-/search/saved/searches/SEARCH_NAME, but it exhibited the same behavior. Not all users behave this way; the particular user in question had a couple of knowledge objects of type "props-extract". It seems likely that re-adding those to the system takes longer, and that the added delay somehow affects when their saved searches show up.
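The workaround I've ended up with is simply retrying until the endpoint stops returning 404. A minimal sketch of that retry loop: `get` is a placeholder for the actual authenticated GET against /servicesNS/&lt;user&gt;/search/saved/searches, assumed to return a (status code, body) pair.

```python
import time

def get_with_retry(get, retries=10, delay=0.5):
    """Call `get` until it returns something other than 404, retrying up to
    `retries` times with a fixed delay between attempts. Mirrors the manual
    retrying described above: right after the user is re-created, the first
    few calls 404 before the saved searches become visible."""
    last = None
    for _ in range(retries):
        status, body = get()
        if status != 404:
            return status, body
        last = (status, body)
        time.sleep(delay)
    return last  # give up and return the final 404
```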
You should be able to hit /debug/refresh first, then the endpoint you're looking for. In an SHC, if you're aiming directly at one server rather than querying a load balancer VIP, you shouldn't have to worry about propagation.
All my testing was against a single host (no load balancer involved) from localhost, which is why I'm puzzled. By "propagation", I meant that perhaps once the user is enabled, some system process needs to atomically update the cluster (because of the props-extract configs, which none of the other users that work fine have) before the user "activates", much like a cluster bundle push, and that I was hitting the endpoints before that finished.
I'll see if /debug/refresh makes a difference tomorrow, since the problem seems to be consistently reproducible with this one particular user.
/debug/refresh is a web UI thing, not a REST API thing. I found the comment below this answer: https://answers.splunk.com/answering/661816/view.html and gave it a try, but the same behavior is still exhibited (I still have to retry multiple times to get the expected data).