Why are we getting "SSL clause is not found or servercert not provided" in forwarder splunkd.log, causing data not to be sent?

One of my forwarders abruptly stopped sending logs to my indexer one morning last week, and I did not notice until Monday (yesterday) afternoon. I asked the admin there to restart splunkd; when that did not correct the issue, I asked him to send me the logs. Several things in them do not look good. The first was the second of these two back-to-back entries in the forwarder's splunkd.log:

09-15-2014 16:30:31.998 -0700 WARN  DeploymentClient - Phonehome thread is now started.
09-15-2014 16:30:31.998 -0700 WARN  DeploymentClient - Unable to send handshake 

The second was in the part of the forwarder splunkd logs where you normally see the TCP ports being initialized:

09-15-2014 16:30:33.165 -0700 INFO  TcpInputConfig - SSL clause not found or servercert not provided - SSL ports will not be available

And then on the indexer I found this in the splunkd.log:

09-15-2014 15:12:40.669 -0700 ERROR TcpInputProc - Error encountered for connection from src=128.200.xxx.xxx:49266.error:140760FC:SSL routines:SSL23_GET_CLIENT_HELLO:unknown protocol
09-15-2014 15:18:17.960 -0700 ERROR TcpInputProc - Error encountered for connection from src=128.200.xxx.xxx:52373.error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request
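
For context, both indexer errors come from OpenSSL's handshake code and typically mean the SSL listener received something other than an SSL ClientHello (plaintext, or an HTTP request). A minimal sketch for tallying these errors by source address is below; the function name and the log path are assumptions for illustration, not anything from this thread.

```shell
# Hypothetical helper: count SSL handshake errors per source address.
# Reads splunkd.log lines on stdin and prints "<count> <src>" per address.
ssl_hello_errors_by_src() {
  { grep 'SSL23_GET_CLIENT_HELLO' || true; } \
    | sed -n 's/.*src=\([^:]*\):.*/\1/p' \
    | sort | uniq -c | sort -rn \
    | awk '{print $1, $2}'
}

# Assumed log location on the indexer; adjust SPLUNK_HOME for your install.
LOG="${SPLUNK_HOME:-/opt/splunk}/var/log/splunk/splunkd.log"
if [ -r "$LOG" ]; then
  ssl_hello_errors_by_src < "$LOG"
fi
```

If many sources show up at once, that points at something common to all of them rather than one broken forwarder.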

Any ideas why this has suddenly started happening?

Re: Why are we getting "SSL clause is not found or servercert not provided" in forwarder splunkd.log, causing data not to be sent?

Perhaps the internal SSL certificate has expired; that is the first thing that springs to mind. How long has the installation been in place?
See "About securing Splunk with SSL" in the Splunk documentation.

Re: Why are we getting "SSL clause is not found or servercert not provided" in forwarder splunkd.log, causing data not to be sent?

Good idea, but I updated the SSL cert on my indexer and all my forwarders just last May. They all have the same server cert, so an expired cert would not explain why only this one forwarder is failing. I just got the forwarder's version from the admin who runs it:

# cat /opt/splunk/etc/splunk.version
VERSION=5.0.3
BUILD=163460
PRODUCT=splunk
PLATFORM=SunOS-sparcv9

I also had him run 'splunk cmd btool outputs list --debug', and all I saw in the email he sent back were lines from /opt/splunk/etc/system/default/outputs.conf, with nothing at all from the deployment app's outputs.conf. That is very bizarre.
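
A minimal sketch of that check, assuming the /opt/splunk install path used in this thread. With --debug, btool prefixes each setting with the file it came from, so filtering out the system/default lines leaves only what apps or system/local contribute. The helper name is hypothetical.

```shell
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"

# Hypothetical helper: drop settings that come only from system/default.
# Reads "btool ... --debug" output on stdin.
non_default_settings() {
  grep -v '/etc/system/default/' || true
}

# Only runs where a Splunk binary is actually installed.
if [ -x "$SPLUNK_HOME/bin/splunk" ]; then
  "$SPLUNK_HOME/bin/splunk" cmd btool outputs list --debug | non_default_settings
fi
```

An empty result matches what was seen here: no app or local outputs.conf is being read at all.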

Re: Why are we getting "SSL clause is not found or servercert not provided" in forwarder splunkd.log, causing data not to be sent?

Well, it turns out that the /opt/splunk/etc/apps/OITOUTPUT9998 app went missing for some reason, and that app holds the server.conf and outputs.conf in its default directory. The Splunk forwarder apparently crashed at some point, and when it came back up those files were not there for it to read. The question is, why did the app go away? That's a mystery. It is specified in the global section of the serverclass.conf file, so every forwarder described there should have a copy of it. #mystery

Re: Why are we getting "SSL clause is not found or servercert not provided" in forwarder splunkd.log, causing data not to be sent?

The mystery deepens. There are no crash files and no evidence of a system crash or reboot. No configuration changes were made to the deployment application for either of these systems, so nothing should have made the Splunk forwarder restart. Yet it must have restarted in order to notice that the server.conf and outputs.conf files were missing from /opt/splunk/etc/apps/OITOUTPUT9998/default.

Re: Why are we getting "SSL clause is not found or servercert not provided" in forwarder splunkd.log, causing data not to be sent?

More on this problem

@sowings proposes that the power cut to the Deployment Server and the subsequent power restore acted like a "reload deploy-server" command with unanticipated edits to serverclass.conf (or to the application bundles). It turns out that some twenty forwarders were affected, each missing various parts of its deployed configuration. Port 9997 is not defined on any of these forwarders; only 9998 is. Because 9998 is defined in the deployed application, losing outputs.conf left the forwarder unable to establish a connection with the deployment server/indexer. The solution was to delete whatever was left in /opt/splunk/var/run and /opt/splunk/etc/apps and then define the outputs in /opt/splunk/etc/system/local/outputs.conf.

As soon as the Splunk forwarder was restarted, it contacted the DS, got its bundle, and everything was fine.
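
For anyone hitting the same thing, here is a rough sketch of what a fallback /opt/splunk/etc/system/local/outputs.conf could look like for an SSL-enabled indexer listening on 9998. The group name, hostname, certificate paths, and password are placeholders and assumptions, not the values used in this thread; substitute your own.

```
# Fallback forwarding config (all values below are placeholders).
[tcpout]
defaultGroup = primary_indexer

[tcpout:primary_indexer]
# Hypothetical indexer hostname; 9998 is the SSL receiving port from this thread.
server = indexer.example.com:9998
# Assumed certificate locations; point these at your own server cert and CA.
sslCertPath = $SPLUNK_HOME/etc/auth/server.pem
sslRootCAPath = $SPLUNK_HOME/etc/auth/cacert.pem
# Placeholder; use the password that protects your certificate's private key.
sslPassword = changeme
sslVerifyServerCert = false
```

Keeping a copy like this in system/local means the forwarder still has an output defined if a deployed app disappears again.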