
Collectd throwing ssl errors from one server, but not others

Explorer

I have installed and configured collectd on 3 servers so far, and on 2 of them it works fine. These servers show up in the Splunk App for Infrastructure correctly; I can view all metrics as expected. The third server is throwing two errors consistently, as shown below.

Oct 16 10:37:54 <hostname> collectd[13731]: plugin_read_thread: read-function of the `flush/write_splunk' plugin took 90.018 seconds, which is above its read interval (90.000 seconds). You might want to adjust the `Interval' or `ReadThreads' settings.
Oct 16 10:38:54 <hostname> collectd[13731]: write splunk plugin: curl_easy_perform failed to connect to <HEC VIP>:8088 with status 35: SSL connect error

The collectd.conf is literally copy-pasted between all three servers, and only the Hostname option has been updated.

The HEC VIP is a load balancer pointing to all 4 indexers in our indexer cluster, if that makes a difference.


Splunk Employee

Check the box for Enable SSL in HEC Global settings.



Engager

I got here because I was getting errors from my one and only server, using collectd and Splunk App for Infrastructure.

SSL must be enabled in the HEC Global Settings. If it is not, you will get the SSL error mentioned above. Then, on the host running collectd, ensure that the write_splunk stanza in /etc/collectd.conf says "ssl true". If it says false, you'll see the following error:

write splunk plugin: curl_easy_perform failed to connect to your_indexer_host:8088 with status 56: Failure when receiving data from the peer

So, as of these versions (Splunk 7.3.3 with Universal Forwarder 8.0.1 on my client, and Splunk App for Infrastructure 2.1.0), ensure that /etc/collectd.conf is correct and that SSL is enabled globally (go to Settings -> Data, and click the Global Settings button in the upper right-hand corner).
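To confirm what a host is actually configured with, the ssl setting can be read straight out of the config file. This is a minimal sketch, assuming the write_splunk options sit inside a standard collectd `<Plugin write_splunk>` block; the function name `write_splunk_ssl` is made up for illustration.

```shell
# Print the value of the "ssl" option found inside the
# <Plugin write_splunk> block of a collectd config file.
# Assumption: the stanza uses the usual collectd <Plugin ...> block syntax.
write_splunk_ssl() {
    conf="${1:-/etc/collectd.conf}"
    awk '
        /^[ \t]*#/ { next }                                   # skip comment lines
        /<Plugin[ \t]+"?write_splunk"?>/ { in_block = 1; next }
        /<\/Plugin>/ { in_block = 0 }
        in_block && tolower($1) == "ssl" { print tolower($2) }
    ' "$conf"
}
```

Running `write_splunk_ssl /etc/collectd.conf` on each host makes it easy to compare the value against the Enable SSL checkbox on the HEC side.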


Splunk Employee

Are you sending data to HEC over HTTP with SSL enabled in collectd.conf? If so, disable SSL in collectd.conf by setting ssl to false in the write_splunk plug-in.


Splunk Employee
  1. Which Linux distro are you running?
  2. Can you also try this from the non-working server and check whether you get "Success"? Reference: https://docs.splunk.com/Documentation/Splunk/7.3.1/Metrics/GetMetricsInOther
     Use curl to send some fake data in. Update the server, port, and token:

curl -k https://localhost:8088/services/collector -H "Authorization: Splunk b0221cd8-c4b4-465a-9a3c-273e3a75aa29" -d '{"time": 1486683865.000,"event":"metric","source":"disk","host":"host99","fields":{"region":"us-west-1","datacenter":"us-west-1a","rack":"63","os":"Ubuntu16.10","arch":"x64","team":"LON","service":"6","serviceversion":"0","serviceenvironment":"test","path":"/dev/sda1","fstype":"ext3","value":1099511627776,"metric_name":"total"}}'


Explorer

CentOS Linux release 7.4.1708 (Core)

I would need to replace localhost with the HEC VIP and b0221cd8-c4b4-465a-9a3c-273e3a75aa29 with my HEC token, no?


Splunk Employee

Yes, update both of them.
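One way to make that substitution less error-prone is to pass the host and token as arguments instead of editing the command by hand. The sketch below is only an illustration: the function names, the default port, and the trimmed-down payload are assumptions, and the full payload from the earlier curl command works just as well.

```shell
# Build a small test metric payload, a reduced version of the one
# in the curl command posted earlier in this thread.
hec_test_payload() {
    printf '{"time": %s, "event": "metric", "source": "disk", "host": "%s", "fields": {"metric_name": "total", "value": 1}}' \
        "$(date +%s)" "${1:-host99}"
}

# Send the payload to HEC.
# $1 = HEC host or VIP, $2 = HEC token, $3 = port (defaults to 8088).
send_test_metric() {
    curl -k -v --max-time 30 \
        "https://$1:${3:-8088}/services/collector" \
        -H "Authorization: Splunk $2" \
        -d "$(hec_test_payload)"
}
```

Calling `send_test_metric <HEC VIP> <your token>` from both a working and the non-working server gives a like-for-like comparison; `-v` prints the connection and TLS details, and `--max-time 30` keeps a stalled connection from hanging indefinitely.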


Explorer

I ran the curl command you posted (with -v added) and got the error below:

* About to connect() to <HEC VIP> port 8088 (#0)
*   Trying <HEC VIP>...
* Connected to <HEC VIP> (<HEC VIP>) port 8088 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* NSS error -5938 (PR_END_OF_FILE_ERROR)
* Encountered end of file
* Closing connection 0
curl: (35) Encountered end of file

This took 1m0.013s to return.
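For reference, the status numbers in this thread are libcurl exit codes: 35 is CURLE_SSL_CONNECT_ERROR and 56 is CURLE_RECV_ERROR. A small helper like the one below (the name `explain_curl_status` is made up for this sketch) can label them when scripting the check:

```shell
# Translate the curl exit codes seen in this thread into a short description.
explain_curl_status() {
    case "$1" in
        0)  echo "ok" ;;
        35) echo "TLS handshake failed (CURLE_SSL_CONNECT_ERROR)" ;;
        56) echo "failure receiving data from the peer (CURLE_RECV_ERROR)" ;;
        *)  echo "other curl error $1; see the libcurl error code list" ;;
    esac
}
```

It would be used right after a curl call, for example `explain_curl_status $?`.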


Splunk Employee

It looks like you need to update curl or NSS.
Can you update them and try the same command again?


Splunk Employee

This curl command should return "Success" from the servers that are working for you. Try it out.
I think you need to fix this NSS error before collectd will work on that server.
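To compare the working and non-working servers, it may help to see which TLS library each curl build links against. The sketch below pulls that out of the first line of `curl --version`; the function name `tls_backend` and the list of backends it looks for are assumptions made for illustration.

```shell
# Extract the TLS backend and its version (for example NSS/x.y.z)
# from the first line of `curl --version` output, passed as $1.
tls_backend() {
    printf '%s\n' "$1" | grep -oE '(NSS|OpenSSL|GnuTLS|LibreSSL)/[^ ]+' | head -n 1
}
```

Run it as `tls_backend "$(curl --version | head -n 1)"` on each server and compare the results. On CentOS, `rpm -q curl nss` should also list the installed package versions.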


Explorer

I'm working with my sysadmin team to get the NSS issue sorted and will report back once that is done and another curl test has been run.
