Scenario:
Anyone using a SHC (Search Head Cluster) implements apps from the Deployer. The deployer collapses the local and default config directories into default and pushes the config to the SHC members.
After normal usage, some of the knowledge objects in the app have evolved (like a savedsearch or a macro has been modified).
Eventually a new version of the app comes out and it has a butt kickin' nice new version of that very knowledge object. So, I stage the new version of the app on the deployer and push it out.
Unfortunately, the local edit of the knowledge object still takes precedence, and the sweet new version (sitting in the default directory on the SHC members) is ignored.
How do we eliminate our version of the config and revert to the one in the default directory?
Since we can't delete the edited version of the knowledge object from the UI, and we can't manually edit the conf file, what is the recommended way to address this?
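To make the precedence problem concrete, here is a minimal Python sketch of how a local stanza shadows the default one, setting by setting. It is an illustration only, not Splunk's implementation: the parser handles just a tiny subset of .conf syntax, and the macro values (`args`, both `definition` strings) are made up for the example.

```python
def parse_conf(text):
    """Parse a very small subset of .conf syntax into {stanza: {key: value}}."""
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = stanzas.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return stanzas


def effective_config(default, local):
    """Layer local over default, key by key within each stanza."""
    merged = {name: dict(settings) for name, settings in default.items()}
    for name, settings in local.items():
        merged.setdefault(name, {}).update(settings)
    return merged


# Hypothetical contents: the vendor's new default and the leftover UI edit.
DEFAULT_V2 = """
[aws-accesslog-sourcetype(1)]
args = x
definition = vendor definition v2
"""
LOCAL = """
[aws-accesslog-sourcetype(1)]
definition = my edited definition
"""

MACRO = "aws-accesslog-sourcetype(1)"
with_local = effective_config(parse_conf(DEFAULT_V2), parse_conf(LOCAL))
without_local = effective_config(parse_conf(DEFAULT_V2), {})

print(with_local[MACRO]["definition"])
print(without_local[MACRO]["definition"])
```

As long as the local stanza exists, the first lookup returns the edited definition no matter what the new app version ships in default. Only once the local entry is gone does the vendor's definition become effective.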
More detail:
If you use the Deployer to send splunk_app_aws to your Search Head Cluster, you then have a bunch of cool knowledge objects that you can edit. Let's pretend I want to edit the macro aws-accesslog-sourcetype(1) to adjust it for my environment. Before the edit, this config lives ONLY in $SPLUNK_HOME/etc/apps/splunk_app_aws/default/macros.conf on the Search Head Cluster members. I make my change in the UI, and the UI now shows my new definition of blah but offers no way to delete it or revert back. There is now a corresponding version of this macro on the Search Head Cluster members in $SPLUNK_HOME/etc/apps/splunk_app_aws/local/macros.conf, defined as blah.
Now let's pretend that after some time, I want to remove my change and go back to the version provided in $SPLUNK_HOME/etc/apps/splunk_app_aws/default/macros.conf. With a single search head OR a search head pool, I can simply remove the corresponding stanza in $SPLUNK_HOME/etc/apps/splunk_app_aws/local/macros.conf with a text editor and restart the instance, thereby allowing the version in $SPLUNK_HOME/etc/apps/splunk_app_aws/default/macros.conf to take effect.
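For the single search head case, the text-editor step amounts to dropping one stanza from the local file. A minimal Python sketch of that edit follows; `remove_stanza` is a hypothetical helper and the file contents are invented. It only transforms text, so backing up the file, writing the result back, and restarting the instance are still up to you.

```python
def remove_stanza(conf_text, stanza):
    """Return conf_text without the named stanza and the lines that belong to it."""
    header = f"[{stanza}]"
    kept = []
    skipping = False
    for line in conf_text.splitlines(keepends=True):
        stripped = line.strip()
        # Every stanza header decides whether the following lines are dropped.
        if stripped.startswith("[") and stripped.endswith("]"):
            skipping = stripped == header
        if not skipping:
            kept.append(line)
    return "".join(kept)


# Hypothetical local/macros.conf contents.
LOCAL_MACROS = (
    "[aws-accesslog-sourcetype(1)]\n"
    "definition = my edited definition\n"
    "\n"
    "[other-macro]\n"
    "definition = keep me\n"
)

cleaned = remove_stanza(LOCAL_MACROS, "aws-accesslog-sourcetype(1)")
print(cleaned)
```

Other stanzas in the same local file are left untouched, so unrelated local edits survive.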
Unfortunately, you cannot make manual edits to configuration in a Search Head Cluster. So is there a parallel way to remove your $SPLUNK_HOME/etc/apps/splunk_app_aws/local/macros.conf version in a Search Head Cluster?
This thread was helpful, particularly thanks to @jkat54. I had a similar problem at a client (we're both Splunk PS) where we could not completely remove a legacy app from the SHC. The order of steps and the resolution:
A further step which may be useful: I encountered the same issue, but the above steps were NOT successful in removing the associated app. The exact reason is still unclear. The only hint I have found is an event logged during the bundle push, which you can find with the following search:
index=_internal sourcetype=splunkd_conf component=confdeployment "data.task"=downloadDeployableApps
The event is a JSON object containing an element for each app. Built-in apps that do not get pushed by the deployer have the value data.apps{}.action=preserved; apps that are pushed have data.apps{}.action=matched.
In this case, the app showed as preserved, despite not being one of the built-in defaults. Cluster members would not delete it even though all local configs were gone and the app had been removed from the deployer's bundle, and neither debug/refresh nor a rolling restart helped.
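If you export one of these events as raw JSON, a small Python sketch like the one below can group apps by action so that an unexpected "preserved" app stands out. The nesting (`data` containing `apps`, each with an `action`) is inferred from the field paths above. The `name` key and the sample event are assumptions for illustration, so check them against a real event.

```python
import json
from collections import defaultdict


def apps_by_action(raw_event):
    """Group app names by their 'action' value in a confdeployment event."""
    event = json.loads(raw_event)
    grouped = defaultdict(list)
    for app in event.get("data", {}).get("apps", []):
        # "name" is an assumed key; adjust it to whatever your events contain.
        grouped[app.get("action", "unknown")].append(app.get("name", "<unnamed>"))
    return dict(grouped)


# Invented sample event shaped after the field paths data.task and data.apps{}.action.
SAMPLE_EVENT = json.dumps({
    "data": {
        "task": "downloadDeployableApps",
        "apps": [
            {"name": "search", "action": "preserved"},
            {"name": "splunk_app_aws", "action": "matched"},
            {"name": "legacy_app", "action": "preserved"},
        ],
    }
})

result = apps_by_action(SAMPLE_EVENT)
print(result)
```

Any app listed under "preserved" that is not a built-in default is a candidate for the problem described here.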
The eventual solution was:
Glad I could help, what was "RR" though?
RR = Rolling Restart of the SHC.
We took the approach of:
debug refresh > Verify > Rolling Restart.
Obviously a RR is less desirable, so we tried to avoid one and made sure the SHC remained searchable throughout. If faced with the problem again, we would delete the local, default.oldxxxx, and <user> local files first, and only then do the debug refresh, verify, then RR.
You can use deployer_push_mode = full | merge_to_default | local_only | default_only
You have to add it under the [shclustering] stanza in app.conf of the app you are deploying. I think you need Splunk 7.3.0 or higher.
Here's the doc.
https://docs.splunk.com/Documentation/Splunk/8.0.1/DistSearch/PropagateSHCconfigurationchanges
What we do is disallow (by policy mostly) editing the production version of the apps. Users can work in personal space but not promote anything into the production apps on the production Search Head. All development is done on a dev search head, and we have a packaging script that does stuff like set the version number in app.conf to YYYY.MM.DD, enable scheduled searches that are disabled but have a "ready for production" string in the description section, remove backups from the Lookup File Editor, etc. So we dev in dev, extract from dev, productionize with a script, then push out from the deployer.
Hmm. That sounds more about how to handle source control. Any nuance to that approach for how to remove SHC config? Like the private stuff?
We don't care about private stuff. If you bork your Splunk, that's your problem. If you bork SOMEBODY ELSE'S Splunk, then we have a problem.
he he.....
Since the KO sync happens after you press save in the UI, you can delete the KO manually at any time from each search head and then hit the debug/refresh endpoint on each SH after the change.
I’ve done this many times without issue.
The debug/refresh trick doesn't seem to work for <app>/local/savedsearches.conf deletions.
At least not for me on 8.1.1.
I assume speed is important here. If you are slow to modify each SHC member, I would expect sync issues to percolate. This only matters because the SHs are running and talking to the captain and each other.
If the instances are offline that won't be of concern.
But yes, this approach makes sense as working so thanks for sharing!
No, timing doesn't really matter.
You're deleting the local copy from the filesystem. The setting is still referenced in memory, and removing the file doesn't trigger replication, because the change is on the filesystem, not in memory.
So if you delete the local copy on all your search heads and then hit debug/refresh, no replication occurs. You haven't made any edits or created new config files that would trigger a replication event.
If you need to see for yourself, read it here:
https://docs.splunk.com/Documentation/Splunk/8.0.0/DistSearch/HowconfrepoworksinSHC
"The cluster does not replicate any configuration changes that you make manually, such as direct edits to configuration files."
Yes. I was able to do this quickly, as we have only three search heads at this time. As we grow, this will be more difficult.
Thanks. As I posted above, I ended up stopping Splunk and manually deleting. I'll try debug/refresh next time.
This is always a problem with Splunk, because the Deployer pushes to "default" on the SH members. What we do is:
- Strict control for end users by process and roles: they cannot create dashboards/config elements in apps, only in their personal space. Anything to be deployed to an app has to come to the Splunk Platform team and is deployed via the deployer.
- If you do have a local entry, the Splunk engineer should merge it after carefully checking source control and the deployer. Then sh-deploy and wait for it to finish. Then STOP all SH members, delete the entry from local on all of them at the same time, and clear raft. Restart and redeploy from the deployer to be 100% sure.
But if you have this problem already, the only reliable workaround I've found is:
- STOP all the search heads at approximately the same time.
- Take a backup of the files in the "local" directory, i.e. /opt/splunk/etc/apps//local/.conf
- Ensure all search heads are stopped. (Yes, I know this is impacting.. but hey.. it's SHC pain)
- Remove the local file itself from all search heads. Don't edit it.
- Start the search heads and do a redeploy from the deployer.
I don't want to downvote this, but then I do. There's no need to stop the entire SHC to perform this maintenance. Instead, please see my answer here on this same question.
Please note, this was written in 2017, when it was Splunk 6.2.x. At that time, if you fixed one server at a time, the replicated settings would come back and undo your work by the time you restarted the next one. Hence we had to stop them all.
I'm not sure how it is with Splunk 7.x or 8.x.
You don't have to stop them all in any version of Splunk with SHC.
I've used my method since SHC was GA.
I didn't downvote, but your answer causes undue harm if anyone follows it. That's like the biggest criterion for a downvote, but you're koshyk and I'd rather give you the opportunity to remove or modify it.
The ways I can think of would be to either:
1. Stop all SHC members, remove the local folder for the app, start them up again.
2. Move the app off the SHC deployer, let the SHC delete the app from themselves, redeploy.
3. I've never tried this, but maybe it could work: instead of using the SHC deployer, deploy everything to the SHC captain from a regular deployment server or manually and see if it'll replicate everything across the rest of the members anyways.