
Why am I seeing "DistributedPeerManagerHeartbeat - Unable to get server info from peer... due to connection reset" on my cluster master log?

I have seen a few similar questions, but none match this one exactly, and their solutions do not work for me.

In my cluster master log, I am seeing the following error repeatedly:

  01-08-2016 23:37:42.853 +0000 WARN  DistributedPeerManagerHeartbeat - Unable to get server info from peer: http://<indexer ip>:8089 due to: Connection reset by peer

On the indexer, I see the following:

  08-02-2014 18:11:42.033 -0700 WARN  HttpListener - Socket error from <cmaster ip> while idling: error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request

The indexer is connecting to the master: I can see it on the master's indexer clustering Peers and Indexes tabs.

This appears to be an SSL issue, but I cannot figure out what is wrong. The indexer log reports an http request, but I would expect the connection to use https. Where is this set? The indexer connects to the master with the following server.conf stanza:

[clustering]
master_uri = https://<master ip>:8089
mode = slave
pass4SymmKey = <master password>

I verified that all passwords are correct.
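
For context on the indexer warning above: OpenSSL reports the "http request" reason when a listener that expects a TLS ClientHello instead receives bytes that look like a plaintext HTTP request. The Python sketch below illustrates that kind of check. The function name classify_first_bytes is hypothetical, and this is not Splunk's or OpenSSL's actual code.

```python
# Sketch: tell a TLS ClientHello apart from a plaintext HTTP request by
# looking at the first bytes a listener receives. Illustration only.

HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ")


def classify_first_bytes(data: bytes) -> str:
    """Return 'tls', 'http', or 'unknown' for the start of a connection."""
    # A TLS record carrying a handshake starts with content type 0x16
    # followed by the major protocol version 0x03.
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return "tls"
    # A plaintext HTTP request starts with a method name and a space.
    if data.startswith(HTTP_METHODS):
        return "http"
    # Anything else (including SSLv2-style hellos) is not classified here.
    return "unknown"


if __name__ == "__main__":
    print(classify_first_bytes(b"GET /services/server/info HTTP/1.1\r\n"))
    print(classify_first_bytes(bytes([0x16, 0x03, 0x01, 0x02, 0x00])))
```

Read this way, the indexer warning suggests that something sent a plain http:// request to the indexer's SSL-enabled port 8089, which matches the http:// peer URI in the master's warning.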


Re: Why am I seeing "DistributedPeerManagerHeartbeat - Unable to get server info from peer... due to connection reset" on my cluster master log?

SplunkTrust

The SSL configuration for indexer-to-master communication is the same as for other SSL configurations.

Have a look at the SSL section of server.conf:

############################################################################
# SSL Configuration details
############################################################################

[sslConfig]
* Set SSL for communications on Splunk back-end under this stanza name.
  * NOTE: To set SSL (eg HTTPS) for Splunk Web and the browser, use
          web.conf.
* Follow this stanza name with any number of the following attribute/value
  pairs.
* If you do not specify an entry for each attribute, Splunk will use the
  default value.

enableSplunkdSSL = true|false
* Enables/disables SSL on the splunkd management port (8089) and KV store
  port (8191).
* Defaults to true.
* Note: Running splunkd without SSL is not generally recommended.
* Distributed search will often perform better with SSL enabled.

useClientSSLCompression = true|false
* Turns on HTTP client compression.
* Server-side compression is turned on by default; setting this on the
  client side enables compression between server and client.
* Enabling this potentially gives you much faster distributed searches
  across multiple Splunk instances.
* Defaults to true.

useSplunkdClientSSLCompression = true|false
* Controls whether SSL compression would be used when splunkd is acting as
  an HTTP client, usually during certificate exchange, bundle replication,
  remote calls etc.
* NOTE: this setting is effective if, and only if, useClientSSLCompression
        is set to true
* NOTE: splunkd is not involved in data transfer in distributed search, the
        search in a separate process is.
* Defaults to true.

sslVersions = <versions_list>
* Comma-separated list of SSL versions to support
* The versions available are "ssl2", "ssl3", "tls1.0", "tls1.1",
  and "tls1.2"
* The special version "*" selects all supported versions.  The version "tls"
  selects all versions tls1.0 or newer
* If a version is prefixed with "-" it is removed from the list
* When configured in FIPS mode ssl2 and ssl3 are always disabled regardless
  of this configuration
* Defaults to "*,-ssl2".  (anything newer than SSLv2)

supportSSLV3Only = true|false
* DEPRECATED.  SSLv2 is now always disabled by default.  The exact set of
  SSL versions allowed is now configurable via the "sslVersions" setting
  above

sslVerifyServerCert = true|false
* Used by distributed search: when making a search request to another
  server in the search cluster.
* Used by distributed deployment clients: when polling a deployment
  server.
* If this is set to true, you should make sure that the server that is
  being connected to is a valid one (authenticated).  Both the common
  name and the alternate name of the server are then checked for a
  match if they are specified in this configuration file.  A
  certificate is considered verified if either is matched.
* Default is false.

sslCommonNameToCheck = <commonName>
* If this value is set, and 'sslVerifyServerCert' is set to true,
  splunkd will limit most outbound HTTPS connections to hosts which use
  a cert with this common name.
* 'sslCommonNameList' is a multivalue extension of this setting, certs
  which match 'sslCommonNameList' or 'sslCommonNameToCheck' will be
  accepted.
* The most important scenario is distributed search.
* This feature does not work with the deployment server and client
  communication over SSL.
* Optional.  Defaults to no common name checking.

sslCommonNameList = <commonName1>, <commonName2>, ...
* If this value is set, and 'sslVerifyServerCert' is set to true,
  splunkd will limit most outbound HTTPS connections to hosts which use
  a cert with one of the listed common names.
* The most important scenario is distributed search.
* Optional.  Defaults to no common name checking.

sslAltNameToCheck = <alternateName1>, <alternateName2>, ...
* If this value is set, and 'sslVerifyServerCert' is set to true,
  splunkd will also be willing to verify certificates which have a
  so-called "Subject Alternate Name" that matches any of the alternate
  names in this list.
  * Subject Alternate Names are effectively extended descriptive
    fields in SSL certs beyond the commonName.  A common practice for
    HTTPS certs is to use these values to store additional valid
    hostnames or domains where the cert should be considered valid.
* Accepts a comma-separated list of Subject Alternate Names to consider
  valid.
* Items in this list are never validated against the SSL Common Name.
* This feature does not work with the deployment server and client
  communication over SSL.
* Optional.  Defaults to no alternate name checking

requireClientCert = true|false
* Requires that any HTTPS client that connects to splunkd internal HTTPS
  server has a certificate that was signed by our CA (certificate
  authority).
* Used by distributed search: Splunk indexing instances must be
  authenticated to connect to another splunk indexing instance.
* Used by distributed deployment: the deployment server requires that
  deployment clients are authenticated before allowing them to poll for new
  configurations/applications.
* If true, a client can connect ONLY if a certificate created by our
  certificate authority was used on that client.
* Default is false.

cipherSuite = <cipher suite string>
* If set, Splunk uses the specified cipher string for the HTTP server.
* If not set, Splunk uses the default cipher string provided by OpenSSL.
  This is used to ensure that the server does not accept connections using
  weak encryption protocols.

ecdhCurveName = <string>
* ECDH curve to use for ECDH key negotiation
* We only support named curves specified by their SHORT name.
* The list of valid named curves by their short/long names can be obtained
  by executing this command:
  $SPLUNK_HOME/bin/splunk cmd openssl ecparam -list_curves
* Default is empty string.

sslKeysfile = <filename>
* Server certificate file.
* Certificates are auto-generated by splunkd upon starting Splunk.
* You may replace the default cert with your own PEM format file.
* Certs are stored in caPath (see below).
* Default is server.pem.

sslKeysfilePassword = <password>
* Server certificate password.
* Default is password.

caCertFile = <filename>
* Public key of the signing authority.
* Default is cacert.pem.

dhFile = <filename>
* PEM format Diffie-Hellman parameter file name.
* DH group size should be no less than 2048bits.
* This file is required in order to enable any Diffie-Hellman ciphers.

caPath = <path>
* Path where all these certs are stored.
* Default is $SPLUNK_HOME/etc/auth.

certCreateScript = <script name>
* Creation script for generating certs on startup of Splunk.

sendStrictTransportSecurityHeader = true|false
* If set to true, the REST interface will send a "Strict-Transport-Security"
  header with all responses to requests made over SSL.
* This can help avoid a client being tricked later by a Man-In-The-Middle
  attack to accept a non-SSL request.  However, this requires a commitment that
  no non-SSL web hosts will ever be run on this hostname on any port.  For
  example, if splunkweb is in default non-SSL mode this can break the
  ability of browser to connect to it.  Enable with caution.
* Defaults to false

allowSslCompression = true|false
* If set to true, the server will allow clients to negotiate
  SSL-layer data compression.
* Defaults to true.

allowSslRenegotiation = true|false
* In the SSL protocol, a client may request renegotiation of the connection
  settings from time to time.
* Setting this to false causes the server to reject all renegotiation
  attempts, breaking the connection.  This limits the amount of CPU a
  single TCP connection can use, but it can cause connectivity problems
  especially for long-lived connections.
* Defaults to true.

http://docs.splunk.com/Documentation/Splunk/6.2.0/Security/AboutsecuringyourSplunkconfigurationwithS...
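
As a rough illustration of the sslVersions rules quoted above (comma-separated names, "*" for all versions, "tls" for tls1.0 or newer, and a "-" prefix to remove an entry), here is a minimal Python sketch. The function name resolve_ssl_versions and the left-to-right processing order are assumptions for illustration; Splunk's own parser may behave differently, and FIPS mode is not modeled.

```python
# Sketch of the sslVersions list semantics described in the spec text.
# Assumption: entries are applied in order, left to right.

ALL_VERSIONS = ["ssl2", "ssl3", "tls1.0", "tls1.1", "tls1.2"]
TLS_VERSIONS = [v for v in ALL_VERSIONS if v.startswith("tls")]


def resolve_ssl_versions(setting: str) -> list:
    """Expand an sslVersions string into an ordered list of version names."""
    selected = set()
    for raw in setting.split(","):
        item = raw.strip()
        if not item:
            continue
        remove = item.startswith("-")
        name = item[1:] if remove else item
        if name == "*":
            names = ALL_VERSIONS          # every supported version
        elif name == "tls":
            names = TLS_VERSIONS          # tls1.0 or newer
        elif name in ALL_VERSIONS:
            names = [name]
        else:
            raise ValueError(f"unknown SSL version: {name!r}")
        if remove:
            selected.difference_update(names)
        else:
            selected.update(names)
    # Return the result in oldest-to-newest order.
    return [v for v in ALL_VERSIONS if v in selected]


if __name__ == "__main__":
    print(resolve_ssl_versions("*,-ssl2"))
    print(resolve_ssl_versions("tls, -tls1.0"))
```

Under these assumptions, "*,-ssl2" resolves to ssl3 plus every TLS version, while "tls, -tls1.0" leaves only tls1.1 and tls1.2.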


Re: Why am I seeing "DistributedPeerManagerHeartbeat - Unable to get server info from peer... due to connection reset" on my cluster master log?

Yes, I have looked through that and tried a few things but nothing works. Why would I need to change anything from the default anyway? Which of those options do you think I need to change to fix this?


Re: Why am I seeing "DistributedPeerManagerHeartbeat - Unable to get server info from peer... due to connection reset" on my cluster master log?

SplunkTrust

By default, do you mean that you are using the default Splunk certificates? A sample server.conf should look like this:

[sslConfig]
caCertFile = cacert.crt
caPath = <your cert directory, for example $SPLUNK_HOME/etc/auth>
sslKeysfile = splunk-server.pem
sslKeysfilePassword = <password of cert>
sslVersions = tls, -tls1.0
sslVerifyServerCert = true
sslCommonNameList = <optional, but increases security>

Re: Why am I seeing "DistributedPeerManagerHeartbeat - Unable to get server info from peer... due to connection reset" on my cluster master log?

Thanks for your continued help. I did not have every line from your example conf because I was using the Splunk defaults, but I went ahead and added every sslConfig line and made sure each was correct. I also modified sslVersions to remove tls1.0, as in your example, in case that was the cause. But I still have the same issue, and I cannot figure out what is wrong or what is different.

Here is my server.conf on the indexer. The one on the cluster master is similar but with the extra clustering info.

[sslConfig]
sslKeysfilePassword = ...
allowSslCompression = true
allowSslRenegotiation = true
caCertFile = cacert.pem
caPath = $SPLUNK_HOME/etc/auth
certCreateScript = $SPLUNK_HOME/bin/splunk, createssl, server-cert
cipherSuite = TLSv1+HIGH:TLSv1.2+HIGH:@STRENGTH
enableSplunkdSSL = true
sendStrictTransportSecurityHeader = false
sslKeysfile = server.pem
sslVersions = tls, -tls1.0
useClientSSLCompression = true
useSplunkdClientSSLCompression = true
sslVerifyServerCert = true

[general]
pass4SymmKey = ...
serverName = ...
site = site1

[license]
master_uri = ...

[replication_port://9100]

[clustering]
master_uri = ...
mode = slave
pass4SymmKey = ...

Even with the extra ssl config params, I am still getting the following on the indexer:

01-11-2016 22:06:06.089 +0000 WARN  HttpListener - Socket error from  while idling: error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request

and the following on the cluster master:

01-11-2016 22:06:06.057 +0000 ERROR HttpClientRequest - HTTP client error: Read Timeout (while accessing https://:8089/services/server/info)
01-11-2016 22:06:06.121 +0000 ERROR HttpClientRequest - HTTP client error: Connection reset by peer (while accessing http://:8089/services/server/info)
01-11-2016 22:06:06.121 +0000 WARN  DistributedPeerManagerHeartbeat - Unable to get server info from peer: http://:8089 due to: Connection reset by peer
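
One way to read the master log lines above is to group the HttpClientRequest errors by URL scheme. The Python sketch below does that with a regular expression. The function name errors_by_scheme is hypothetical, and the pattern is fitted only to the two ERROR lines shown here.

```python
import re

# Matches lines such as:
#   ... HTTP client error: Read Timeout (while accessing https://...)
LINE_RE = re.compile(
    r"HTTP client error: (?P<error>.+?) \(while accessing (?P<scheme>https?)://"
)


def errors_by_scheme(lines):
    """Map each URL scheme to the list of client errors logged for it."""
    grouped = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            grouped.setdefault(match.group("scheme"), []).append(
                match.group("error")
            )
    return grouped


if __name__ == "__main__":
    sample = [
        "01-11-2016 22:06:06.057 +0000 ERROR HttpClientRequest - HTTP client error: "
        "Read Timeout (while accessing https://:8089/services/server/info)",
        "01-11-2016 22:06:06.121 +0000 ERROR HttpClientRequest - HTTP client error: "
        "Connection reset by peer (while accessing http://:8089/services/server/info)",
    ]
    print(errors_by_scheme(sample))
```

On the two ERROR lines above, this groups a Read Timeout under https and a Connection reset by peer under http.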

Re: Why am I seeing "DistributedPeerManagerHeartbeat - Unable to get server info from peer... due to connection reset" on my cluster master log?

SplunkTrust

The error above can have several causes, including a wrong server URL, a wrong certificate, or a missing certificate. In the cluster master log, the hostname is missing; I hope that is because you masked it. I would suggest checking the certificates on both the indexer and the master to make sure they are correct, and running Splunk in debug mode to get more information. Also troubleshoot with openssl s_client -connect commands. If that does not help, I'm afraid you will have to contact Splunk Support, unless someone here can identify the problem from the error message itself.
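
If openssl is not handy, a similar handshake check can be sketched in Python with the standard ssl module. This is a rough stand-in for openssl s_client -connect, not a replacement. The function name probe_tls is hypothetical, and certificate verification is deliberately disabled because the goal is only to see whether the port completes a TLS handshake.

```python
import socket
import ssl


def probe_tls(host: str, port: int = 8089, timeout: float = 5.0) -> dict:
    """Attempt a TLS handshake and report the negotiated version and cipher."""
    # Verification is switched off on purpose: this only checks whether
    # the peer speaks TLS at all, not whether its certificate is trusted.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return {
                    "ok": True,
                    "version": tls.version(),
                    "cipher": tls.cipher()[0],
                }
    except OSError as exc:
        # ssl.SSLError, timeouts, refusals, and resets are all OSError subclasses.
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

Calling probe_tls with the indexer's address and port 8089 from the master (and the reverse from the indexer) shows whether each management port accepts a TLS handshake. Note that recent Python versions may refuse TLS versions older than 1.2 by default, so a failure here does not by itself prove that splunkd is misconfigured.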
