As I install more apps into my search head cluster, I'm finding more and more app-specific quirks. Some of these can be worked around with little tricks (e.g. dbx, latest SA-ldapsearch), while others are totally broken.
Is it possible to get a flag, or some other mechanism on each app's own page, so devs can indicate its SHC status? The compatibility flag isn't quite enough.
Splunk Supporting Add-on for Active Directory
1. Configurations created through the GUI are stored LOCALLY on each member.
Work around : Connect to each member and create the config, OR create it once, copy it back to the deployer and push it out.
2. Passwords are kept internally within each Splunk instance.
Work around : None. You need to connect to each box to get the passwords stored correctly.
Splunk App for Windows Infrastructure
Initial setup will run on EACH box regardless of whether it is configured elsewhere. I assume app.conf [install] is_configured = 1 isn't being honored for some reason on first start. Users will see this on first app run (bad) if you haven't manually connected to each node and checked it after an install.
Work around (maybe) : set is_configured on the deployer and see if that bypasses the setup run (I don't think it does).
Initial lookup build wizard needs to run on EACH box. I'm unsure whether the scheduled jobs fix this after a while, however. I did have some nodes that didn't have copies of the lookups a few hours after install, but this could be a separate issue.
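For the is_configured workaround mentioned above, a quick way to see what each copy of the app actually says is to read app.conf from both the default and local directories. The following is only a minimal sketch: `read_is_configured` is a hypothetical helper, it assumes the usual `default/` and `local/` layout inside the app directory, and it assumes Python's `configparser` can parse the file (true for a simple app.conf, not guaranteed for every Splunk conf file).

```python
import configparser
from pathlib import Path

# Strings treated as "true" for this sketch (an assumption, not Splunk's full rule set).
TRUTHY = {"1", "true", "t", "yes", "y"}


def read_is_configured(app_dir):
    """Return the effective [install] is_configured value for one app directory.

    local/app.conf is read after default/app.conf, so a local value wins.
    Returns None when neither file sets the key.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    for layer in ("default", "local"):
        conf = Path(app_dir) / layer / "app.conf"
        if conf.is_file():
            parser.read(conf, encoding="utf-8")
    if not parser.has_option("install", "is_configured"):
        return None
    return parser.get("install", "is_configured").strip().lower() in TRUTHY
```

Running this against the app directory on the deployer and on each member would at least show whether the flag arrived; it says nothing about whether the app honors it on first start.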
Splunk DB Connect
Most of the issues are replication based, due to how the app functions. The localisation of these files creates inconsistencies across the various search head cluster members. As such, you need to be VERY VERY careful about what gets touched and by what methods.
1. When configurations are put in via the GUI they don't replicate to other nodes. I've tried adding the endpoint so that databases.conf gets replicated around, but it doesn't.
Work around : create the entry in the GUI. Copy this back to the deployer and apply the bundle again so they all grab the full copy from the captain. Delete the original local copy.
2. Password hashes are different per machine, so you can't use the deployer to push a unified database.conf to members. Normally for search head pooling you use the dbx_shpinst.py method to set search head context specific hashed passwords.
Work around : reinstall Splunk fresh on every member utilising the same splunk.secret file. The same hashed passwords, no longer member specific, can then be used everywhere.
3. Internal app links point to manager pages. Because these are locked off by the replication protection button, the pages break. This isn't obvious to users.
Work around : unlock the page by clicking the settings button.
4. User created databases won't replicate.
Work around : none, other than making them system wide ones created by an admin. Educate users that this isn't supported, I suppose.
Thanks for the update and I appreciate the information. I understand and I agree that there should be some sort of certification process or stamp of approval marker on an app that has been tested and validated to function properly on a SHC.
I have seen the problems with SA_LDAPSearch and I appreciate the explanation on DBConnect. Do you have any details on what functionality does not work for Win-infra? If you have the workarounds, great, but at least knowing what issues you have found will mean that I and others will not have to stumble across them. 🙂
See the post below (I had to split the reply due to character limits).
As for certification etc., I'd just like it so I don't have to do it all myself. SHC is a hugely touted feature and it would be great to know which apps work "out of the box", which "may require a little more finesse" and which downright don't work.
Do you have a list of the apps that are not working with clustering? Even an answer to this question with a simple list of things known to work/not work would be helpful. We are deploying search head clustering now and haven't seen any obvious issues.
The apps I am working on right now are Splunk app for CEF, Splunk app for Stream and Splunk app for ThreatConnect, for starters.
Where are the conf files?
1). Many of the issues stem from devs building on standalone instances and forgetting that .conf files get consolidated into the app's default directory, to ensure user created objects in the app's local directory are not overwritten on the next bundle push by the master. Some apps rely on index, props and transforms .conf files that need to reside on the indexer(s), assuming you're implementing index clustering as well, or on heavy forwarders if you are doing any pre-parsing of logs. This is the easiest of the issues to fix: replicate the conf files and use the deployment server to push them out. It may lead to a larger, more complicated issue, which I will get to a bit later in issue #3.
Where should the data/logs go?
2). Your outputs.conf on any search head in the cluster should have been set up to not index and to forward logs to your index cluster using the [tcpout:] stanza by default, but the app's outputs.conf may or may not be overridden, depending on the type of data in the app or whether there is any associated output at all. In that case, the inputs.conf in the app needs to be routed to the appropriate outputs stanza using the "_TCP_ROUTING = " arg in the stanza. Changing this input stanza may lead to further issues, discussed in #3.
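To find which input stanzas in an app still lack explicit routing, something along these lines could be used. It is a rough sketch only: `stanzas_missing_routing` is a hypothetical helper, the sample stanzas and the group name `primary_indexers` are made up for illustration, and it assumes the file is simple enough for Python's `configparser` to read. A `_TCP_ROUTING` set under `[default]` is treated as inherited by every stanza.

```python
import configparser

# Illustrative inputs.conf content; stanza names and the group name are invented.
SAMPLE = """
[monitor:///var/log/app.log]
_TCP_ROUTING = primary_indexers
index = main

[script://./bin/poll.py]
interval = 60
"""


def stanzas_missing_routing(conf_text):
    """List input stanzas that do not set _TCP_ROUTING, directly or via [default]."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(conf_text)
    inherited = parser.has_option("default", "_TCP_ROUTING")
    missing = []
    for stanza in parser.sections():
        if stanza == "default":
            continue
        if not inherited and not parser.has_option(stanza, "_TCP_ROUTING"):
            missing.append(stanza)
    return missing
```

With the sample above, only the scripted input is reported, since the monitor stanza already names a routing group.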
-Add-on for Active directory
-almost all of them
Python, that is a snake right?
3). This is where it gets fun. As seen in issue #1, the location of the conf files is different in a SHCluster, and many devs will handle custom APIs and scripted inputs using, what else, their own custom Python scripts. This can create a situation where a script relies on a static location for a file (#1) and/or, more specifically, a static stanza within the conf file (#2). Unfortunately both the location and the stanza structure have changed, breaking the script. This needs to be resolved by editing the custom scripts associated with the app.
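As an illustration of the kind of edit involved, a script can look in both the default and local directories rather than assuming a setting lives in exactly one file. This is a sketch under assumptions, not how any particular app does it: `load_stanza` is a hypothetical helper, the caller supplies the app root, and it assumes `configparser` can parse the conf file in question.

```python
import configparser
from pathlib import Path


def load_stanza(app_root, conf_name, stanza):
    """Merge one stanza from default/<conf_name> and local/<conf_name>.

    Keys found in local/ override those from default/. Missing files or a
    missing stanza simply contribute nothing, so the result may be empty.
    """
    merged = {}
    for layer in ("default", "local"):
        conf = Path(app_root) / layer / conf_name
        if not conf.is_file():
            continue
        parser = configparser.ConfigParser(
            interpolation=None, strict=False, delimiters=("=",)
        )
        parser.optionxform = str  # keep key names case-sensitive
        parser.read(conf, encoding="utf-8")
        if parser.has_section(stanza):
            merged.update(parser.items(stanza))
    return merged
```

A script living in the app's bin directory could derive the app root from its own location (for example `Path(__file__).resolve().parent.parent`) instead of a fixed absolute path, so it keeps working wherever the bundle lands.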
Did I just write an App?
4). Yes you did! To further this issue, I have seen scripts that call the OS Python at start and then depend on Splunk's own Python libs, causing all kinds of dependency issues. Or vice versa, where the proper imports are not called when using the Splunk specific Python. Splunk's best practice says to always use Splunk's prepackaged Python to run all scripts and to import the environment settings needed at run time. Splunk has been pretty good at keeping Python current and its libs robust. But not everyone does this, as these change depending on standalone and clustered environments. Bottom line: it takes a lot of debug cycles and Python experience to fix these types of issues.
To follow up: you didn't exactly write an app, but you sure deserve an author's mention, if not co-author. I hope this explains to a few people the difficulties in making apps "SHCluster" compliant. The dev's work is never finished and I feel for them. By the same token, I believe that most apps that are associated with other "paid for" appliances/applications, and/or are paid for apps themselves, are Enterprise customer focused and thus should be developed for a clustered environment from the get-go.
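One cheap guard against the interpreter mix-up described here is for a script to check, at start, whether it is running under a Python that lives inside the Splunk install. The sketch below assumes the `SPLUNK_HOME` environment variable is set for the process; `runs_under_splunk_python` is a hypothetical helper and the paths in the usage note are only examples.

```python
import os
import sys
from pathlib import Path


def runs_under_splunk_python(executable=None, splunk_home=None):
    """Return True if the interpreter path sits somewhere under SPLUNK_HOME.

    Both arguments default to the running interpreter and the SPLUNK_HOME
    environment variable. Returns False when SPLUNK_HOME is unknown.
    """
    splunk_home = splunk_home or os.environ.get("SPLUNK_HOME")
    if not splunk_home:
        return False
    exe = Path(executable or sys.executable).resolve()
    home = Path(splunk_home).resolve()
    return home in exe.parents
```

For example, `runs_under_splunk_python("/opt/splunk/bin/python3", "/opt/splunk")` is true while the same call with `/usr/bin/python3` is not; a script could log a warning or exit early in the second case instead of failing later on a missing import.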
With this understanding I believe it would be nice to have a tag on the apps for users to know what they are getting into when implementing.
Well, I haven't tested every app, as this would be impossible given I'm actually supposed to be doing work, not testing other people's apps 😉 but I've listed a couple in the OP.
The most obvious big ones my customers have reported, and that I've verified have issues, are :
Splunk app for Windows Infra (latest)
Splunk Supporting Add-on for Active directory.
Quite a few of these issues are to do with replication and localisation of files not being supported on clustering. Hence my original request for a tested/compliant flag ... hell, even a docs.splunk page listing the clustering limitations would save me HOURS of reproducing the issue, figuring out a workaround and logging support tickets for supported apps.