Deployment Architecture

What is the best way to remove local config in a search head cluster?

sloshburch
Splunk Employee

Scenario:
Anyone using a SHC (Search Head Cluster) implements apps from the Deployer. The deployer collapses the local and default config directories into default and pushes the config to the SHC members.

After normal usage, some of the knowledge objects in the app have evolved (like a savedsearch or a macro has been modified).

Eventually a new version of the app comes out and it has a butt kickin' nice new version of that very knowledge object. So, I stage the new version of the app on the deployer and push it out.

Unfortunately, the local folder edit of the knowledge object still takes precedence, and the sweet new version (sitting in the default directory on the SHC members) is ignored.

How do we eliminate our version of the config and revert back to the one in the default directory?

Since we can't delete the edited version of the knowledge object from the UI, and we can't manually edit the conf file, what is the recommended way to address this?

More detail:
If you use the Deployer to send splunk_app_aws to your Search Head Cluster, you'll then have a bunch of cool knowledge objects that you can edit. Let's pretend I want to edit aws-accesslog-sourcetype(1) to adjust it for my environment. Before the edit, this config lives ONLY in $SPLUNK_HOME/etc/apps/splunk_app_aws/default/macros.conf on the Search Head Cluster Members. I make my change in the UI and the result is this:
[screenshot of the edited macro in the UI]
Notice my new definition of blah but no way to delete or revert back. There is now a corresponding version of this macro on the Search Head Cluster Members in $SPLUNK_HOME/etc/apps/splunk_app_aws/local/macros.conf defined as blah.

Now let's pretend that after some time, I want to remove my change and go back to the version provided in $SPLUNK_HOME/etc/apps/splunk_app_aws/default/macros.conf. With a single search head OR a search head pool, I can simply remove the corresponding stanza in $SPLUNK_HOME/etc/apps/splunk_app_aws/local/macros.conf with a text editor and restart the instance, thereby allowing the version in $SPLUNK_HOME/etc/apps/splunk_app_aws/default/macros.conf to take effect.

Unfortunately, you cannot make manual edits to configuration in a Search Head Cluster. So is there a parallel way to remove your $SPLUNK_HOME/etc/apps/splunk_app_aws/local/macros.conf version in a Search Head Cluster?
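
The precedence problem described above can be illustrated with a minimal sketch. It models each conf layer as a plain dictionary and merges local over default, attribute by attribute. The function name and the "shipped definition" value are placeholders for illustration, not anything taken from the app or from Splunk itself.

```python
def effective_config(default, local):
    """Merge two {stanza: {attribute: value}} layers; local wins per attribute."""
    merged = {stanza: dict(attrs) for stanza, attrs in default.items()}
    for stanza, attrs in local.items():
        merged.setdefault(stanza, {}).update(attrs)
    return merged


# Placeholder for whatever the app ships in default/macros.conf.
default_macros = {"aws-accesslog-sourcetype(1)": {"definition": "shipped definition"}}
# The UI edit from the question lands in local/macros.conf.
local_macros = {"aws-accesslog-sourcetype(1)": {"definition": "blah"}}

with_override = effective_config(default_macros, local_macros)
without_override = effective_config(default_macros, {})
```

As long as the local stanza exists, the merged result carries blah regardless of what a new app version puts in default. Once the local stanza is gone, the default value is what remains.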


NullZero
Path Finder

This thread was helpful, particularly thanks to @jkat54. I had a similar problem at a client (we're both Splunk PS): we could not completely remove a legacy app from the SHC. The order of steps and the resolution:

  • Removed the app from the Deployer (the app remained on the SHC despite a bundle push).
  • Removed a legacy 'local' folder from the 3x SHC members (via CLI): /opt/splunk/etc/apps/<legacy_app_name>/local. The app remained after debug refresh and after RR.
  • Removed the 'default.oldxxxxx' folder from the 3x SHC members (via CLI): /opt/splunk/etc/apps/<legacy_app_name>/default.oldxxxxx. The app remained after debug refresh and after RR.
  • The app still showed up in the menu, albeit with no 'static' icons; it turned out that users had local KOs in the path '/opt/splunk/etc/users/<user>/<legacy_app_name>'.
  • Removed these KOs and the <legacy_app_name> folder via CLI for the users in question.
  • After debug refresh the app remained; RR fixed it. All gone.
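
The step that finally mattered above was finding the per-user copies of the app. A small sketch of how one might list those directories on a member before deleting anything is shown here. The function name is made up for illustration, and the only layout it assumes is the etc/users/<user>/<legacy_app_name> path quoted in the steps.

```python
from pathlib import Path


def find_user_app_dirs(splunk_home, app_name):
    """Return every etc/users/<user>/<app_name> directory under splunk_home."""
    users_root = Path(splunk_home) / "etc" / "users"
    if not users_root.is_dir():
        return []
    # One directory level per user, then the app name.
    return sorted(p for p in users_root.glob(f"*/{app_name}") if p.is_dir())
```

For example, find_user_app_dirs("/opt/splunk", "legacy_app") would return the matching paths for review. The sketch only lists and deletes nothing.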

jdsl
Loves-to-Learn

A further step which may be useful: I encountered the same issue, but the above steps were NOT successful in removing the associated app. The exact reason is still unclear. The only hint I have found is an event logged during the bundle push, which you can find with the following search:

index=_internal sourcetype=splunkd_conf component=confdeployment "data.task"=downloadDeployableApps

The event is a JSON object containing an element for each app. Built-in apps that do not get pushed by the deployer will have the value data.apps{}.action=preserved; apps that are pushed will have data.apps{}.action=matched.
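
To make that event shape concrete, here is a hedged sketch that groups apps by their action value. Only data.task and the action field come from the description above. The "name" key and the sample event are assumptions made for illustration, so check a real event for the actual field names.

```python
import json
from collections import defaultdict


def apps_by_action(raw_event):
    """Group app names by the 'action' value found under data.apps."""
    event = json.loads(raw_event)
    grouped = defaultdict(list)
    for app in event.get("data", {}).get("apps", []):
        # "name" is a hypothetical key; the real event may label apps differently.
        grouped[app.get("action", "unknown")].append(app.get("name", "<unnamed>"))
    return dict(grouped)


# Hand-written sample, not a captured log line.
sample = json.dumps({
    "data": {
        "task": "downloadDeployableApps",
        "apps": [
            {"name": "search", "action": "preserved"},
            {"name": "splunk_app_aws", "action": "matched"},
            {"name": "legacy_app", "action": "preserved"},
        ],
    }
})
```

An app that was removed from the deployer but still shows up under preserved is the situation described in this post.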

In this case, the app showed as preserved, despite not being one of the built-in defaults. Cluster members would not delete it even though all local configs were gone and the app had been removed from the deployer's bundle. Neither debug/refresh nor a rolling restart helped.

The eventual solution was:

  • create a dummy app with the same name - literally an empty directory - on the deployer
  • push the bundle with this directory
  • this made the SHC members move the existing configs into a default.old.<date>-<time> directory
  • Repeated the above post's steps for deleting this dir and local configs, just to be sure
  • Pushed bundle again - app successfully deleted

jkat54
SplunkTrust

Glad I could help, what was "RR" though?


NullZero
Path Finder

RR = Rolling Restart of the SHC.

We took the approach of:

debug refresh > Verify > Rolling Restart.

Obviously it is less desirable to carry out an RR, so we tried to avoid it and ensured the SHC remained searchable throughout. If faced with the problem again, we would delete the local, default.oldxxxx, and <user> local files first, and only then undertake a debug refresh, verify, then RR.


anwarmian
Communicator

You can use deployer_push_mode = full | merge_to_default | local_only | default_only

You have to add it under the [shclustering] stanza in app.conf of the app you are deploying. I think you need Splunk 7.3.0 or higher.
Here's the doc.
https://docs.splunk.com/Documentation/Splunk/8.0.1/DistSearch/PropagateSHCconfigurationchanges
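
As a rough sketch of what that app.conf stanza looks like, the snippet below renders it with Python's configparser and rejects any value outside the four modes listed above. The function name is made up for illustration, and what each mode actually does is left to the linked doc.

```python
import configparser
import io

# The four values named in this answer.
PUSH_MODES = ("full", "merge_to_default", "local_only", "default_only")


def render_shclustering_stanza(push_mode):
    """Return app.conf text containing a [shclustering] stanza for push_mode."""
    if push_mode not in PUSH_MODES:
        raise ValueError(f"unknown deployer_push_mode: {push_mode!r}")
    parser = configparser.ConfigParser()
    parser["shclustering"] = {"deployer_push_mode": push_mode}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

Calling render_shclustering_stanza("default_only") yields a [shclustering] header followed by the line deployer_push_mode = default_only, which would go into the app's app.conf on the deployer.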


woodcock
Esteemed Legend

What we do is disallow (by policy mostly) editing the production version of the apps. Users can work in personal space but not promote anything into the production apps on the production Search Head. All development is done on a dev search head, and we have a packaging script that does stuff like set the version number in app.conf to YYYY.MM.DD, enable scheduled searches that are disabled but have a "ready for production" string in the description section, remove backups from the Lookup File Editor, etc. So we dev in dev, extract from dev, productionize with a script, then push out from the deployer.

sloshburch
Splunk Employee

Hmm. That sounds more about how to handle source control. Any nuance to that approach for how to remove SHC config? Like the private stuff?

woodcock
Esteemed Legend

We don't care about private stuff. If you bork your Splunk, that's your problem. If you bork SOMEBODY ELSE'S Splunk, then we have a problem.

sloshburch
Splunk Employee

he he.....


jkat54
SplunkTrust

Since the KO sync happens after you press save in the UI, you can delete the KO manually at any time from each search head and hit debug/refresh endpoint on each SH after the change.

I’ve done this many times without issue.
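
A minimal sketch of the manual deletion step, assuming you want to drop a single stanza rather than the whole file: the helper below removes one stanza from conf text and leaves the rest untouched. The helper name is illustrative and it is not a Splunk tool. It also ignores backslash line continuations, so review the result before writing it back on each search head and hitting debug/refresh.

```python
def remove_stanza(conf_text, stanza_name):
    """Return conf_text without the [stanza_name] header and its attribute lines."""
    header = f"[{stanza_name}]"
    kept = []
    skipping = False
    for line in conf_text.splitlines(keepends=True):
        stripped = line.strip()
        # Any stanza header ends the previous stanza and decides whether to skip.
        if stripped.startswith("[") and stripped.endswith("]"):
            skipping = stripped == header
        if not skipping:
            kept.append(line)
    return "".join(kept)
```

Applied to a local/macros.conf that contains [aws-accesslog-sourcetype(1)], this would leave every other stanza in place so the default version of that macro can take over.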

jmackie
Engager

The debug/refresh trick doesn't seem to work for <app>/local/savedsearches.conf deletions.

At least not for me on 8.1.1.


sloshburch
Splunk Employee

I assume speed is important here. If you are slow to modify each SHC member, I would expect sync issues to percolate. This only matters because the SHs are running and talking to the captain and each other.
If the instances are offline, that won't be a concern.
But yes, it makes sense that this approach works, so thanks for sharing!


jkat54
SplunkTrust

No, timing doesn't really matter.

You're deleting the local copy on disk. The object is still referenced in memory, and the deletion doesn't trigger replication, because the change is on the filesystem, not in memory.

So if you delete the local copy on all your search heads and then hit debug/refresh, no replication occurs. You haven't made any edits or created new config files that would trigger a replication event.

If you need to see for yourself, read it here:

https://docs.splunk.com/Documentation/Splunk/8.0.0/DistSearch/HowconfrepoworksinSHC

"The cluster does not replicate any configuration changes that you make manually, such as direct edits to configuration files."

ndgold
Explorer

Yes. I was able to do this quickly, as we have only three search heads at this time. As we grow, this will be more difficult.

ndgold
Explorer

Thanks. As I posted above, I ended up stopping Splunk and manually deleting. I'll try debug/refresh next time.

koshyk
Super Champion

This is always a problem with Splunk, because the Deployer pushes to "default" on the SH members. What we do is
- Strict control for end users by process and roles: they cannot create dashboards/config elements in apps, only in personal space. Anything to be deployed to an app should come to the Splunk Platform team and will be deployed via the deployer.
- If you have a local entry, the Splunk engineer should merge it after carefully looking into source control + deployer. Then sh-deploy and wait for it to finish. Then STOP all SH members, delete the entry from local on all of them at the same time, and clear raft. Restart and redeploy again from the deployer to make it 100% sure.

But if you have this problem already, the only reliable workaround I've found is
- STOP all the search heads at approximately the same time.
- Take a backup of the files in the "local" directory, i.e. /opt/splunk/etc/apps//local/.conf
- Ensure all search heads are stopped. (Yes, I know this is impacting.. but hey.. it's SHC pain)
- Remove the local file itself from all search heads. Don't edit it.
- Start the search heads and do a redeploy from the deployer.
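
The backup-then-remove steps in the workaround above could be sketched like this for a single file on one member. The function name and the backup location are assumptions for illustration. The sketch does not stop or start Splunk and would need to be run per search head.

```python
import shutil
from pathlib import Path


def backup_and_remove(conf_path, backup_dir):
    """Copy conf_path into backup_dir, then delete the original file."""
    conf_path = Path(conf_path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup_path = backup_dir / conf_path.name
    # copy2 also preserves timestamps, which helps when comparing later.
    shutil.copy2(conf_path, backup_path)
    conf_path.unlink()
    return backup_path
```

Copying before unlinking means a failed copy raises before anything is deleted, so the original file is only removed once the backup exists.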

jkat54
SplunkTrust

I don't want to downvote this, but then I do. There's no need to stop the entire SHC to perform this maintenance. Instead, please see my answer here on this same question.


koshyk
Super Champion

Please note, this was written in 2017, when it was Splunk 6.2.x. At that time, if you fixed one server at a time, then by the time you restarted the next one, the replicated settings would come back and undo the work you had done. Hence we had to stop them all.
I'm not sure how it is with Splunk 7.x or 8.x.


jkat54
SplunkTrust

You don't have to stop them all in any version of Splunk with SHC.

I've used my method since SHC was GA.

I didn't downvote, but your answer causes undue harm if anyone follows it. That's like the biggest criterion for a downvote, but you're koshyk and I'd rather give you the opportunity to remove or modify it.

goodsellt
Contributor

The ways I can think of would be to either:
1. Stop all SHC members, remove the local folder for the app, start them up again.
2. Move the app off the SHC deployer, let the SHC delete the app from themselves, redeploy.
3. I've never tried this, but maybe it could work: instead of using the SHC deployer, deploy everything to the SHC captain from a regular deployment server or manually and see if it'll replicate everything across the rest of the members anyways.
