Deployment Architecture

distsearch.conf is overridden after updating through the GUI, upon restarting Splunk

nmohammed
Contributor

We have a search head cluster (SHC) environment and are seeing the following error:

"Gave up waiting for the captain to establish a common bundle version across all search peers; using most recent bundles on all peers instead"

After some research and looking through the Answers site, it seems this could be due to an inconsistent distsearch.conf on some of the search heads in the cluster. So I removed all the values from the servers key in distsearch.conf on every search head in the cluster and restarted Splunk, but immediately after the restart the changes were overridden and the old distsearch.conf was restored. We are not deploying this file with these changes using the deployer.

The following was done (multiple times) on each search head in the cluster (IPs masked for security purposes) -

  1. cat /opt/splunk/etc/system/local/distsearch.conf
    [distributedSearch]
    servers = https://10.xxx.36.000:8089,https://10.xxx.46.00:8089,https://10.xxx.46.00:8089,https://10.xxx.46.00:...

  2. Changed distsearch.conf to

    [distributedSearch]
    servers =

  3. Restarted splunk
  4. Checked the distsearch.conf file to find contents restored

We even tried deleting the distsearch.conf file on all the search heads in the cluster and then restarting all the members, but the distsearch.conf file gets recreated.

Below is the output of the btool command for distsearch from one of the affected search heads in the cluster. I have checked for any monitoring/CM tool, but we don't have any that manages the Splunk process.

[spnksvc@ep3vmnspk199 bin]$ ./splunk cmd btool distsearch list --debug
/opt/splunk/etc/system/default/distsearch.conf [bundleEnforcerBlacklist]
/opt/splunk/etc/system/default/distsearch.conf [bundleEnforcerWhitelist]
/opt/splunk/etc/apps/splunk_dist_conf/default/distsearch.conf [distributedSearch]
/opt/splunk/etc/system/default/distsearch.conf authTokenConnectionTimeout = 5
/opt/splunk/etc/system/default/distsearch.conf authTokenReceiveTimeout = 10
/opt/splunk/etc/system/default/distsearch.conf authTokenSendTimeout = 10
/opt/splunk/etc/system/default/distsearch.conf bestEffortSearch = false
/opt/splunk/etc/system/default/distsearch.conf connectionTimeout = 10
/opt/splunk/etc/system/default/distsearch.conf defaultUriScheme = https
/opt/splunk/etc/apps/splunk_dist_conf/default/distsearch.conf disabled = 0
/opt/splunk/etc/system/default/distsearch.conf receiveTimeout = 600
/opt/splunk/etc/system/default/distsearch.conf sendTimeout = 30
/opt/splunk/etc/apps/splunk_dist_conf/default/distsearch.conf serverTimeout = 900
/opt/splunk/etc/system/local/distsearch.conf servers = https://10.xxx.36.000:8089,https://10.xxx.46.00:8089,https://10.xxx.46.00:8089,https://10.xxx.46.00:...
/opt/splunk/etc/system/default/distsearch.conf shareBundles = true
/opt/splunk/etc/apps/splunk_dist_conf/default/distsearch.conf statusTimeout = 900
/opt/splunk/etc/system/default/distsearch.conf useSHPBundleReplication = true
/opt/splunk/etc/apps/Splunk_TA_windows/default/distsearch.conf [replicationBlacklist]
/opt/splunk/etc/apps/splunk_app_windows_infrastructure/default/distsearch.conf MSAD_lookups = .../splunk_app_windows_infrastructure/lookups/(tHostInfo|tSessions).csv$
/opt/splunk/etc/system/default/distsearch.conf conf = (system|(apps/))/(default|local)/server.conf
/opt/splunk/etc/system/default/distsearch.conf framework = apps/framework/...
/opt/splunk/etc/system/default/distsearch.conf lookupindexfiles = (system|apps/|users(/reserved)?//)/lookups/.(tmp$|index($|/...))
/opt/splunk/etc/apps/splunk_dist_conf/default/distsearch.conf noBinDir = (.../bin/)
/opt/splunk/etc/apps/Splunk_TA_windows/default/distsearch.conf nontsyslogmappings = ...ntsyslog_mappings.csv
/opt/splunk/etc/system/default/distsearch.conf sampleapp = apps/sample_app/...
/opt/splunk/etc/system/default/distsearch.conf user_specific_meta = users(/_reserved)?///metadata/local.meta
/opt/splunk/etc/apps/splunk_dist_conf/default/distsearch.conf [replicationSettings]
/opt/splunk/etc/system/default/distsearch.conf allowDeltaUpload = true
/opt/splunk/etc/system/default/distsearch.conf allowSkipEncoding = true
/opt/splunk/etc/system/default/distsearch.conf allowStreamUpload = auto
/opt/splunk/etc/system/default/distsearch.conf concerningReplicatedFileSize = 500
/opt/splunk/etc/system/default/distsearch.conf connectionTimeout = 60
/opt/splunk/etc/system/default/distsearch.conf excludeReplicatedLookupSize = 0
/opt/splunk/etc/apps/splunk_dist_conf/default/distsearch.conf maxBundleSize = 14438892420
/opt/splunk/etc/system/default/distsearch.conf maxMemoryBundleSize = 10
/opt/splunk/etc/apps/splunk_dist_conf/default/distsearch.conf replicationThreads = 8
/opt/splunk/etc/system/default/distsearch.conf sanitizeMetaFiles = true
/opt/splunk/etc/system/default/distsearch.conf sendRcvTimeout = 60
/opt/splunk/etc/system/default/distsearch.conf [replicationSettings:refineConf]
/opt/splunk/etc/system/default/distsearch.conf replicate.app = true
/opt/splunk/etc/system/default/distsearch.conf replicate.authorize = true
/opt/splunk/etc/system/default/distsearch.conf replicate.collections = true
/opt/splunk/etc/system/default/distsearch.conf replicate.commands = true
/opt/splunk/etc/system/default/distsearch.conf replicate.eventtypes = true
/opt/splunk/etc/system/default/distsearch.conf replicate.fields = true
/opt/splunk/etc/system/default/distsearch.conf replicate.literals = true
/opt/splunk/etc/system/default/distsearch.conf replicate.lookups = true
/opt/splunk/etc/system/default/distsearch.conf replicate.multikv = true
/opt/splunk/etc/system/default/distsearch.conf replicate.props = true
/opt/splunk/etc/system/default/distsearch.conf replicate.segmenters = true
/opt/splunk/etc/system/default/distsearch.conf replicate.tags = true
/opt/splunk/etc/system/default/distsearch.conf replicate.transactiontypes = true
/opt/splunk/etc/system/default/distsearch.conf replicate.transforms = true
/opt/splunk/etc/system/default/distsearch.conf [replicationWhitelist]
/opt/splunk/etc/system/default/distsearch.conf kvstore = kvstore/...
/opt/splunk/etc/system/default/distsearch.conf other = (system|(apps/(?!pdfserver))|users(/_reserved)?//)/(bin|lookups)/...
/opt/splunk/etc/system/default/distsearch.conf refine.conf = (system|(apps/)|users(/_reserved)?//)/(default|local)/.conf
/opt/splunk/etc/system/default/distsearch.conf refine.metadata = (system|(apps/)|users(/_reserved)?//)/metadata/.meta
/opt/splunk/etc/system/default/distsearch.conf searchscripts = searchscripts/...
/opt/splunk/etc/system/default/distsearch.conf [tokenExchKeys]
/opt/splunk/etc/system/default/distsearch.conf certDir = $SPLUNK_HOME/etc/auth/distServerKeys
/opt/splunk/etc/system/default/distsearch.conf genKeyScript = $SPLUNK_HOME/bin/splunk, createssl, audit-keys
/opt/splunk/etc/system/default/distsearch.conf privateKey = private.pem
/opt/splunk/etc/system/default/distsearch.conf publicKey = trusted.pem
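
When reading `btool ... --debug` output like the listing above, it can help to isolate which file supplies a given setting. The following is a minimal Python sketch, not a Splunk tool: it assumes every line has the form `<path> [stanza]` or `<path> <key> = <value>` and that paths contain no spaces. The function name `find_setting_sources` is hypothetical, and the sample is an abbreviated copy of a few lines from the listing.

```python
def find_setting_sources(btool_output, key):
    """Return (stanza, path, value) for every btool --debug line that sets `key`.

    Assumes each line looks like '<path> [stanza]' or '<path> <key> = <value>'
    and that paths contain no spaces.
    """
    matches = []
    stanza = None
    for line in btool_output.splitlines():
        # The first space separates the source file from the rest of the line.
        path, _, rest = line.strip().partition(" ")
        rest = rest.strip()
        if rest.startswith("[") and rest.endswith("]"):
            stanza = rest[1:-1]  # remember the stanza we are currently in
            continue
        name, sep, value = rest.partition("=")
        if sep and name.strip() == key:
            matches.append((stanza, path, value.strip()))
    return matches


# Abbreviated sample taken from the listing (server list shortened).
sample = """\
/opt/splunk/etc/apps/splunk_dist_conf/default/distsearch.conf [distributedSearch]
/opt/splunk/etc/system/default/distsearch.conf connectionTimeout = 10
/opt/splunk/etc/system/local/distsearch.conf servers = https://10.xxx.36.000:8089,https://10.xxx.46.00:8089
/opt/splunk/etc/apps/splunk_dist_conf/default/distsearch.conf [replicationSettings]
/opt/splunk/etc/system/default/distsearch.conf connectionTimeout = 60
"""

for stanza, path, value in find_setting_sources(sample, "servers"):
    print(stanza, path, value)
```

In the listing, the only `servers` line comes from /opt/splunk/etc/system/local/distsearch.conf, which is the file that keeps being rewritten after restart.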


burwell
SplunkTrust

Hi, so this couldn't be some automation like Chef putting the file back for you?


nmohammed
Contributor

hi @burwell
We don't have any automation or CM tools monitoring file systems that would restore the file.

The file is created by the user that runs Splunk on the server. We tried deleting the file and restarting Splunk, but it gets restored again.


jawaharas
Motivator

Try the approach below.

  1. Modify the config file on the captain
  2. Run the command below on the search head members: $SPLUNK_HOME/bin/splunk resync shcluster-replicated-config
  3. Then run the command below on the captain to restart all search head members (including the captain): $SPLUNK_HOME/bin/splunk rolling-restart shcluster-members

nmohammed
Contributor

Thanks @jawaharas

I don't see the file on the captain now. Should I create a file with the contents on the captain and then run steps 2 and 3?


jawaharas
Motivator

Yep. Go ahead.


nmohammed
Contributor

Just tried the approach.

  1. Created a distsearch.conf file with the following contents on the captain -

[distributedSearch]
servers =

  2. Ran $SPLUNK_HOME/bin/splunk resync shcluster-replicated-config

  3. Rolling restart of SH members

I checked a couple of members where the restart had completed and found that distsearch.conf had been overridden again with the old contents.

[distributedSearch]
servers = https://10.xxx.36.000:8089,https://10.xxx.46.00:8089,https://10.xxx.46.00:8089,https://10.xxx.46.00:...

Update -

Found that the older search heads in the cluster (including the captain) were overridden with the old distsearch.conf; we added 4 new search heads this week and they seem to be okay.


jawaharas
Motivator

Did you run the command below on the search head members (not on the captain) and verify the config file content before the restart?

$SPLUNK_HOME/bin/splunk resync shcluster-replicated-config


nmohammed
Contributor

Yes. Ran it across all SH members except the captain, then verified the config file contents on all the members before the restart, but we are still seeing the issue.


jawaharas
Motivator

I hope you are using clustered indexers.

Can you check whether the shclustering stanza in the '$SPLUNK_HOME/etc/system/local/server.conf' file is consistent across all search head members?

Also, can you share 'shclustering' stanza content from your search-head's 'server.conf' (after masking sensitive data)?
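
One way to run that consistency check is to collect each member's server.conf and diff the stanza programmatically. The sketch below is only an illustration using Python's standard `configparser`, which handles simple Splunk .conf files but is not a full Splunk parser. The helper names, the member labels `sh-a`/`sh-b`, and the `.example` hostnames are hypothetical. Ignoring `mgmt_uri` assumes that setting is expected to differ per member.

```python
import configparser


def read_stanza(conf_text, stanza):
    """Parse Splunk-style .conf text and return one stanza as a dict."""
    parser = configparser.ConfigParser(
        interpolation=None, delimiters=("=",), strict=False
    )
    parser.optionxform = str  # keep key case exactly as written
    parser.read_string(conf_text)
    return dict(parser[stanza]) if parser.has_section(stanza) else {}


def stanza_differences(stanzas_by_member, ignore=()):
    """Return {key: {member: value}} for keys whose values differ across members."""
    keys = set()
    for stanza in stanzas_by_member.values():
        keys.update(stanza)
    diffs = {}
    for key in sorted(keys - set(ignore)):
        values = {m: s.get(key) for m, s in stanzas_by_member.items()}
        if len(set(values.values())) > 1:
            diffs[key] = values
    return diffs


# Hypothetical contents gathered from two members (hostnames are made up,
# except the deployer URL format, which follows the thread).
member_a = """\
[shclustering]
conf_deploy_fetch_url = https://eo1vmsk555.lema:8089
disabled = 0
mgmt_uri = https://sh-a.example:8089
"""
member_b = """\
[shclustering]
conf_deploy_fetch_url = https://other-deployer.example:8089
disabled = 0
mgmt_uri = https://sh-b.example:8089
"""

stanzas = {
    "sh-a": read_stanza(member_a, "shclustering"),
    "sh-b": read_stanza(member_b, "shclustering"),
}
# mgmt_uri is assumed to be member-specific, so it is excluded from the diff.
print(stanza_differences(stanzas, ignore=("mgmt_uri",)))
```

An empty result would mean the compared keys match on every member; any key that appears in the result is worth a closer look.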


nmohammed
Contributor

hi @jawaharas

Yes, we're using indexer clustering. I tried deleting distsearch.conf again today and restarted Splunk on the search heads, and found it was re-created on all except one search head in the cluster.

[sslConfig]
sslKeysfilePassword = $1$EDkhKG6tJRyF
sslPassword = $1$EDkhKG6tJRyF

[lmpool:auto_generated_pool_download-trial]
description = auto_generated_pool_download-trial
quota = MAX
slaves = *
stack_id = download-trial

[lmpool:auto_generated_pool_forwarder]
description = auto_generated_pool_forwarder
quota = MAX
slaves = *
stack_id = forwarder

[lmpool:auto_generated_pool_free]
description = auto_generated_pool_free
quota = MAX
slaves = *
stack_id = free

[general]
pass4SymmKey = $1$EXktLS6/MxP38oI=
serverName = eo1vmsk099.lema

[license]
master_uri = https://eo1vmsk444.lema:8089

[replication_port://8090]

[raft_statemachine]
disabled = false

[shclustering]
conf_deploy_fetch_url = https://eo1vmsk555.lema:8089
disabled = 0
mgmt_uri = https://10.XXX.XX.XXX:8089
id = 013107EC-FC15-4338-A045-75942E648CB7

[clustering]
master_uri = clustermaster:eo1vmsk555.lema:8089
mode = searchhead

[clustermaster:eo1vmsk555.lema:8089]
master_uri = https://eo1vmsk555.lema:8089
multisite = 0
site = default
pass4SymmKey = $1$EXktLS6/MxP38oI=


jawaharas
Motivator

The search head members fetch the configuration bundle from the deployer (the host specified in the 'conf_deploy_fetch_url' parameter).

Do you have connectivity between the search head (where you have the issue) and the deployer (https://eo1vmsk555.lema:8089)?
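
A quick way to answer the connectivity question from the affected search head is a plain TCP check against the deployer's management port. This is a minimal sketch with a hypothetical helper name, `can_reach`; it only confirms that the port accepts a TCP connection and says nothing about TLS, authentication, or whether the deployer itself is healthy.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example, to be run on the affected search head:
#   can_reach("eo1vmsk555.lema", 8089)
```

A `False` result points at DNS, routing, or firewall problems between the member and the deployer; a `True` result means the cause is more likely elsewhere.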


esix_splunk
Splunk Employee

The best practice is either to edit this from the GUI or to create an app on the deployer and push it to the SHC. Editing config files directly does not trigger a replication task across the SHC, so when you edit or delete the file on one host, the other members are not aware of it, which can cause problems.


burwell
SplunkTrust

Hi. What version of Splunk is this happening on?


nmohammed
Contributor

@burwell - it's 7.1.1
