Deployment Architecture

What is the best way to remove local config in a Search Head Cluster?

Ultra Champion

Scenario:
Anyone using an SHC (Search Head Cluster) deploys apps from the deployer. The deployer merges an app's local and default config directories into default and pushes the result to the SHC members.

After normal usage, some of the knowledge objects in the app have evolved (for example, a saved search or a macro has been modified).

Eventually a new version of the app comes out and it has a butt kickin' nice new version of that very knowledge object. So, I stage the new version of the app on the deployer and push it out.

Unfortunately, the local-folder edit of the knowledge object still takes precedence, and the sweet new version (sitting in the default directory on the SHC members) is ignored.

How do we eliminate our version of the config and revert to the one in the default directory?

Since we can't delete the edited version of the knowledge object from the UI, and we can't manually edit the conf file, what is the recommended way to address this?

More detail:
If you use the deployer to send splunk_app_aws to your Search Head Cluster, you'll then have a bunch of cool knowledge objects that you can edit. Let's pretend I want to edit aws-accesslog-sourcetype(1) to adjust it for my environment. Before the edit, this config lives ONLY in $SPLUNK_HOME/etc/apps/splunk_app_aws/default/macros.conf on the Search Head Cluster members. I make my change in the UI and the result is this:
[Screenshot: the edited macro in Splunk Web, showing the new definition with no option to delete or revert it]
Notice my new definition of blah but no way to delete or revert. There is now a corresponding version of this macro on the Search Head Cluster members, in $SPLUNK_HOME/etc/apps/splunk_app_aws/local/macros.conf, defined as blah.

Now let's pretend that after some time, I want to remove my change and go back to the version provided in $SPLUNK_HOME/etc/apps/splunk_app_aws/default/macros.conf. With a single search head OR a search head pool, I can simply remove the corresponding stanza from $SPLUNK_HOME/etc/apps/splunk_app_aws/local/macros.conf with a text editor and restart the instance, thereby allowing the version in $SPLUNK_HOME/etc/apps/splunk_app_aws/default/macros.conf to take effect.
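For illustration only (assuming the stanza name from the example above), the standalone fix is roughly:

    # Standalone search head or search head pool only -- NOT an SHC member.
    # Remove the overriding stanza from the app's local config...
    vi $SPLUNK_HOME/etc/apps/splunk_app_aws/local/macros.conf
    #   (delete the [aws-accesslog-sourcetype(1)] stanza)
    # ...then restart so the stanza in default/macros.conf takes effect again.
    $SPLUNK_HOME/bin/splunk restart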

Unfortunately, you cannot make manual edits to configuration in a Search Head Cluster. So is there a parallel way to remove your $SPLUNK_HOME/etc/apps/splunk_app_aws/local/macros.conf version in a Search Head Cluster?

Esteemed Legend

What we do is disallow (mostly by policy) editing the production version of the apps. Users can work in their personal space but cannot promote anything into the production apps on the production search heads. All development is done on a dev search head, and we have a packaging script that does things like set the version number in app.conf to YYYY.MM.DD, enable scheduled searches that are disabled but have a "ready for production" string in the description, remove backups from the Lookup File Editor, etc. So we dev in dev, extract from dev, productionize with a script, then push out from the deployer.
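For illustration only -- this is not the actual script, and the app name, marker handling, and backup path are assumptions -- a productionizing step along those lines might look roughly like this:

    #!/bin/bash
    # Hypothetical sketch of a "productionize" packaging step for an app
    # exported from the dev search head into ./myapp.
    APP=myapp
    VERSION=$(date +%Y.%m.%d)

    # Stamp the version in app.conf
    sed -i "s/^version = .*/version = $VERSION/" "$APP/default/app.conf"

    # Flag scheduled searches that are disabled but marked ready for production
    # (a real script would flip disabled = 1 to 0 stanza by stanza).
    grep -B20 "ready for production" "$APP/default/savedsearches.conf" | grep "^\[" || true

    # Drop Lookup File Editor backups (directory name is an assumption;
    # adjust for your Lookup File Editor version).
    rm -rf "$APP/lookup_file_backups"

    # Package for the deployer
    tar -czf "$APP-$VERSION.tgz" "$APP"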

Ultra Champion

Hmm. That sounds like it's more about how to handle source control. Any nuance to that approach for how to remove SHC config? Like the private stuff?

Esteemed Legend

We don't care about private stuff. If you bork your Splunk, that's your problem. If you bork SOMEBODY ELSE'S Splunk, then we have a problem.

Ultra Champion

he he.....

SplunkTrust

Since the KO sync happens only when you press Save in the UI, you can delete the KO manually from each search head at any time and hit the debug/refresh endpoint on each SH after the change.

I’ve done this many times without issue.
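A minimal sketch of that, assuming the macro override and app from the question above:

    # On EACH SHC member: remove the overriding stanza (or the whole local file
    # if nothing else in it needs to be kept)...
    vi $SPLUNK_HOME/etc/apps/splunk_app_aws/local/macros.conf
    #   (delete the [aws-accesslog-sourcetype(1)] stanza)
    # ...then, while logged in to Splunk Web on that member, load the
    # debug/refresh page to reload config without a restart, e.g.:
    #   https://<member>:8000/en-US/debug/refresh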

Ultra Champion

I assume speed is important here. If you are slow to modify each SHC member, I would expect sync issues to percolate. This only matters because the SHs are running and talking to the captain and each other.
If the instances are offline, that won't be a concern.
But yes, this approach sounds workable, so thanks for sharing!

SplunkTrust

No, timing doesn't really matter.

You're deleting the local copy, which is still referenced in memory; that doesn't trigger a replication, because the file is only missing on the filesystem, not in memory.

So if you delete the local copy on all your search heads and then proceed to hit debug/refresh, no replication occurs. You haven't made any edits or created new config files that would cause a replication event to be triggered.

If you need to see for yourself, read it here:

https://docs.splunk.com/Documentation/Splunk/8.0.0/DistSearch/HowconfrepoworksinSHC

"The cluster does not replicate any configuration changes that you make manually, such as direct edits to configuration files."

Engager

Yes. I was able to do this quickly, as we have only three search heads at this time. As we grow, this will be more difficult.

Engager

Thanks. As I posted above, I ended up stopping Splunk and manually deleting. I'll try debug/refresh next time.

Super Champion

This is a longstanding problem with Splunk, in that the deployer pushes into "default" on the SH members. What we do:
- Strict control for end users via process and roles: they cannot create dashboards or config elements in apps, only in their personal space. Anything that needs to go into an app has to come to the Splunk platform team and be deployed via the deployer.
- If you do have a local entry, a Splunk engineer should merge it after carefully reviewing source control and the deployer copy. Then push from the deployer and wait for it to finish. Then STOP all SH members, delete the entry from local on all of them at the same time, clear raft, restart, and redeploy from the deployer to make 100% sure.

But if you have this problem already, the only reliable workaround I've found is (rough commands below):
- STOP all the search heads at approximately the same time.
- Take a backup of the files in the "local" directory, i.e. /opt/splunk/etc/apps/<app>/local/<file>.conf.
- Ensure all search heads are stopped. (Yes, I know this is impacting... but hey, it's SHC pain.)
- Remove the local file itself from all search heads. Don't edit it.
- Start the search heads and do a redeploy from the deployer.
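A rough sketch of that sequence (app and file names are illustrative; the removal steps run on every member):

    # 1. Stop Splunk on EVERY SHC member (run on each member at roughly the same time)
    $SPLUNK_HOME/bin/splunk stop

    # 2. Back up, then remove, the offending local file on each member (don't edit it)
    cp $SPLUNK_HOME/etc/apps/splunk_app_aws/local/macros.conf /tmp/macros.conf.bak
    rm $SPLUNK_HOME/etc/apps/splunk_app_aws/local/macros.conf

    # 3. Start the members again
    $SPLUNK_HOME/bin/splunk start

    # 4. Redeploy the staged bundle from the deployer
    $SPLUNK_HOME/bin/splunk apply shcluster-bundle -target https://<any_member>:8089 -auth admin:<password>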

SplunkTrust

I don't want to downvote this, but then I do. There's no need to stop the entire SHC to perform this maintenance. Instead, please see my answer here on this same question.

Super Champion

Please note, this was written in 2017 when it was Splunk 6.2.x. At that time, if you fixed one server at a time, by the time you restarted the next one the replicated settings would come back and reset the work you had done. Hence we had to stop them all.
I'm not sure how it is with Splunk 7.x or 8.x.

SplunkTrust

You don't have to stop them all in any version of Splunk with SHC.

I've used my method since SHC was GA.

I didn't downvote, but your answer causes undue harm if anyone follows it. That's like the biggest criterion for a downvote, but you're koshyk and I'd rather give you the opportunity to remove or modify it.

Contributor

The ways I can think of would be to:
1. Stop all SHC members, remove the local folder for the app, and start them up again.
2. Move the app off the SHC deployer, let the SHC members delete the app from themselves, then redeploy (rough commands below).
3. I've never tried this, but maybe it could work: instead of using the SHC deployer, deploy everything to the SHC captain from a regular deployment server (or manually) and see if it replicates everything across the rest of the members anyway.
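For option 2, a rough sketch (app name and paths are illustrative; note the caveat in the reply below about preserving members' local knowledge objects first):

    # On the deployer: remove the app from the staging area and push, which
    # makes the members delete their copy of the app...
    mv $SPLUNK_HOME/etc/shcluster/apps/splunk_app_aws /tmp/
    $SPLUNK_HOME/bin/splunk apply shcluster-bundle -target https://<any_member>:8089 -auth admin:<password>

    # ...then stage the app again (fresh or restored copy) and push once more.
    mv /tmp/splunk_app_aws $SPLUNK_HOME/etc/shcluster/apps/
    $SPLUNK_HOME/bin/splunk apply shcluster-bundle -target https://<any_member>:8089 -auth admin:<password>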

Ultra Champion

Yea, but I don't think any of us are really satisfied with those, right?

  1. I have a feeling this is likely to produce a sync issue with RAFT and would not really be supported.
  2. Definitely would work, but before you move the app off the deployer, you'd have to copy the SHC members' local folder into the version that gets re-deployed from the deployer; otherwise you'd lose all the knowledge objects folks created.
  3. This sounds like bad-news-bears, and I'm sure any problems would not be supported.

Essentially, option 2 is the best, but I've also submitted a feature request for this. The non-SHC usage of Splunk allowed this scenario to be addressed trivially, so no deliberate feature was ever needed to support it.

Engager

@SloshBurch, which option did you choose? I'm having the same issue and am not sure how to proceed. I am tempted to stop all SHC members and delete the KOs. However, I worry about the sync issues you mentioned above. Thanks.

Path Finder

We made some modifications directly in metadata/local.meta on all of our SHC members. Afterwards we issued a rolling restart. There were no sync issues.
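A minimal sketch of that approach (the app name is illustrative; the edit happens on each member, and the rolling restart is issued from the captain):

    # On each SHC member, adjust the app's metadata by hand, e.g.:
    vi $SPLUNK_HOME/etc/apps/splunk_app_aws/metadata/local.meta
    # Then, from the cluster captain, issue a rolling restart of the members:
    $SPLUNK_HOME/bin/splunk rolling-restart shcluster-members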

Engager

Thanks. I stopped Splunk on each search head, removed my KOs, and restarted. It seems to have worked.

Ultra Champion

Phew. I bet it's gotten more stable over the years. Many can't afford to shut down all the SHs to do this, but glad to hear it worked.

FWIW, there are new features for pushing config that could also be used to clear some local config by pushing deactivated or empty stanzas... I think. To be honest, I haven't tried it yet.

Learn more in the docs at Choose a deployer push mode
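For reference, a hedged sketch of what that looks like (the setting and mode names come from that docs page; the app name is illustrative, and whether any particular mode actually clears member-side local config for the app should be verified there first):

    # On the deployer, the push mode is set per app in that app's app.conf
    # within the staging area (recent Splunk versions; check the docs page above):
    printf '[shclustering]\ndeployer_push_mode = full\n' >> $SPLUNK_HOME/etc/shcluster/apps/splunk_app_aws/local/app.conf
    # Other documented modes include merge_to_default (the classic behavior),
    # default_only, and local_only. Then push as usual:
    $SPLUNK_HOME/bin/splunk apply shcluster-bundle -target https://<any_member>:8089 -auth admin:<password>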

Engager

Thanks for the link.
