Forwarder Management not listing all clients

guimilare
Communicator

Hello splunkers.

We have about 50 clients registered in our Forwarder Management.
However, since a couple of days ago, only 18 of them are listed there.

The command $SPLUNK_HOME/bin/splunk list deploy-clients returns the same 18 clients.

However, I can see data from the other 32 forwarders being indexed in Splunk.
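One way to pin down exactly which forwarders are missing is to diff the hosts seen in indexed data against the output of splunk list deploy-clients. Below is a minimal Python sketch. It assumes you have already exported both host lists into plain Python lists, and that the indexed host value matches the name the DS reports, which is not always the case. The function name is hypothetical.

```python
def missing_deployment_clients(indexing_hosts, deploy_clients):
    """Return hosts that send data but are absent from the deploy-clients list.

    Hypothetical helper: assumes both inputs are iterables of hostnames and
    that the indexed `host` value matches the name the DS reports.
    """
    # Compare case-insensitively; hostnames are not case-sensitive
    listed = {h.lower() for h in deploy_clients}
    seen = {h.lower() for h in indexing_hosts}
    return sorted(seen - listed)


if __name__ == "__main__":
    indexing = ["uf01", "uf02", "UF03"]
    registered = ["uf01"]
    print(missing_deployment_clients(indexing, registered))  # ['uf02', 'uf03']
```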

As a test, I ran $SPLUNK_HOME/bin/splunk set deploy-poll IP:8089 -auth user:passwd on one of the clients that is not listed in Forwarder Management, received the "Configuration updated" message, and restarted the forwarder.
However, the client is still not listed in Forwarder Management, and I need to push new apps to it.

Any ideas?

Regards,
GMA

guimilare
Communicator

Fixed it.
The UF certificate had expired.

When upgrading the UF, I received the following messages:

_It seems that the Splunk default certificates are being used. If certificate validation is turned on using the default certificates (not-recommended), this may result in loss of communication in mixed-version Splunk environments after upgrade.

"/opt/splunkforwarder/etc/auth/ca.pem": certificate renewed
"/opt/splunkforwarder/etc/auth/cacert.pem": certificate renewed
"/opt/splunkforwarder/etc/auth/server.pem": certificate renewed_

After that, the clients were listed in Forwarder Management again.
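To check ahead of time whether a UF certificate has expired, you can read its end date with openssl x509 -enddate -noout -in /opt/splunkforwarder/etc/auth/server.pem, which prints a line starting with notAfter=. The Python sketch below parses that line with the standard library's ssl.cert_time_to_seconds and compares it to the current time. The helper names are hypothetical. It only detects the problem; in this thread the actual renewal happened during the UF upgrade.

```python
import ssl
import time


def parse_openssl_enddate(line):
    """Strip the 'notAfter=' prefix printed by `openssl x509 -enddate -noout`."""
    line = line.strip()
    prefix = "notAfter="
    return line[len(prefix):] if line.startswith(prefix) else line


def cert_is_expired(not_after, now=None):
    """Return True if a notAfter date like 'Jan  5 09:34:43 2018 GMT' is in the past."""
    # cert_time_to_seconds expects the "%b %d %H:%M:%S %Y GMT" format (UTC)
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return expires_at <= now


if __name__ == "__main__":
    sample = "notAfter=Jan  5 09:34:43 2018 GMT"  # made-up example value
    print(cert_is_expired(parse_openssl_enddate(sample)))  # True
```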

markbarber21
Path Finder

I had the same problem, and I discovered that multiple deployment clients were sending the same GUID. This happened because we build our forwarders from AWS AMIs, and the file holding the ID is part of the common image.

I had to update our install scripts to remove the /opt/splunkforwarder/etc/instance.cfg file. When Splunk starts up, it is recreated automatically.
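If you suspect the same cloned-image problem, one way to confirm it is to collect instance.cfg from each forwarder and look for repeated GUIDs. Below is a minimal Python sketch. It assumes instance.cfg uses an INI-style [general] stanza with a guid = ... line (check this against your own files) and that you have already gathered each host's file contents into a dict. The helper names are hypothetical.

```python
import configparser
from collections import defaultdict


def read_guid(instance_cfg_text):
    """Return the guid from instance.cfg text, or None if it is absent.

    Assumes an INI-style layout with a [general] stanza and a `guid = ...` line.
    """
    parser = configparser.ConfigParser()
    parser.read_string(instance_cfg_text)
    return parser.get("general", "guid", fallback=None)


def duplicate_guids(cfg_by_host):
    """Map each GUID shared by more than one host to the sorted list of those hosts."""
    hosts_by_guid = defaultdict(list)
    for host, text in cfg_by_host.items():
        guid = read_guid(text)
        if guid:
            # Normalize so case or stray whitespace differences don't hide a clash
            hosts_by_guid[guid.strip().upper()].append(host)
    return {guid: sorted(hosts) for guid, hosts in hosts_by_guid.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Made-up contents standing in for files collected from three forwarders
    cloned = "[general]\nguid = AAAA-1111\n"
    collected = {"uf01": cloned, "uf02": cloned, "uf03": "[general]\nguid = BBBB-2222\n"}
    print(duplicate_guids(collected))  # {'AAAA-1111': ['uf01', 'uf02']}
```

Any host that shows up in the result would need the fix described above: remove instance.cfg and restart Splunk so the file is recreated.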

See also: https://answers.splunk.com/answers/542872/what-do-i-look-at-in-splunkdlog-to-troubleshoot-de.html

mattymo
Splunk Employee

Hi guimilare!

The UF logs should point you to the reason the forwarders are not contacting the deployment server (DS).

First, double-check that those forwarders can reach the DS by using telnet on port 8089.
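If telnet is not installed on the forwarder, the same TCP reachability test can be scripted. Below is a minimal Python sketch using socket.create_connection; the function name is hypothetical. A successful connection only proves the port is open. In this thread, telnet succeeded even though the forwarder could not register until its expired certificate was renewed.

```python
import socket


def can_reach(host, port=8089, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds, like a telnet test.

    This does not perform the TLS handshake, so certificate problems go undetected.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures
        return False
```

Run it from an affected forwarder with the DS address from your targetUri, for example can_reach("<DS IP>", 8089).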

Then you can check on the UF itself: navigate to $SPLUNK_HOME/var/log/splunk and run tail -f splunkd.log, or tail -f splunkd.log | grep HttpPubSubConnection

Here is a working UF phoning home to the DS:

06-29-2017 14:27:44.875 +0000 INFO HttpPubSubConnection - Running phone...

Or, from the search UI, if you are receiving _internal logs from the affected UFs: index=_internal HttpPubSubConnection
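If you copy splunkd.log off a forwarder, a small script can do the same filtering as grep HttpPubSubConnection and tally the outcomes. Below is a minimal Python sketch. It assumes the line layout shown in the log excerpts in this thread (timestamp, level, component, message). The message prefixes it counts are taken from those excerpts and are not an exhaustive list of Splunk messages. The function names are hypothetical.

```python
import re

# Layout assumed from the excerpts in this thread, e.g.
# 06-29-2017 14:27:44.875 +0000 INFO HttpPubSubConnection - Running phone...
PUBSUB_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"(?P<level>[A-Z]+) +HttpPubSubConnection - (?P<msg>.*)$"
)


def pubsub_events(lines):
    """Yield (timestamp, level, message) for HttpPubSubConnection lines only."""
    for line in lines:
        match = PUBSUB_RE.match(line.rstrip("\n"))
        if match:
            yield match.group("ts"), match.group("level"), match.group("msg")


def summarize(lines):
    """Count phone-home successes and the two failure messages seen in this thread."""
    counts = {"phoned_home": 0, "connection_failed": 0, "parse_errors": 0}
    for _, _, msg in pubsub_events(lines):
        if msg.startswith("Running phone"):
            counts["phoned_home"] += 1
        elif msg.startswith("Could not obtain connection"):
            counts["connection_failed"] += 1
        elif msg.startswith("Unable to parse message"):
            counts["parse_errors"] += 1
    return counts


if __name__ == "__main__":
    # Real usage: summarize(open("/opt/splunkforwarder/var/log/splunk/splunkd.log"))
    sample = [
        "06-29-2017 14:27:44.875 +0000 INFO HttpPubSubConnection - Running phone...",
        "06-29-2017 18:01:00.075 +0000 WARN HttpPubSubConnection - Unable to parse message from PubSubSvr:",
        "06-29-2017 18:01:00.075 +0000 INFO HttpPubSubConnection - Could not obtain connection, will retry after=79 seconds.",
    ]
    print(summarize(sample))  # {'phoned_home': 1, 'connection_failed': 1, 'parse_errors': 1}
```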

Then you can check from the deployment server's side, either in Splunk with index=_internal source=*splunkd.log pubsubsvr OR deploymentserver, or again directly in splunkd.log on the DS.

Finally, btool is your friend! Double-check your UF configs:

./splunk btool deploymentclient list --debug

- MattyMo

guimilare
Communicator

For the hosts that are not listed in Forwarder Management, the search index=_internal HttpPubSubConnection returns:

06-29-2017 18:01:00.075 +0000 WARN HttpPubSubConnection - Unable to parse message from PubSubSvr:
06-29-2017 18:01:00.075 +0000 INFO HttpPubSubConnection - Could not obtain connection, will retry after=79 seconds.

mattymo
Splunk Employee

Hmm, can you telnet to port 8089?

- MattyMo

guimilare
Communicator

Yes, I can telnet from the UF to the DS on port 8089.

mattymo
Splunk Employee

Can you check btool output on the UF?

./splunk btool deploymentclient list --debug

Need to make sure the phone home URI is correct.

- MattyMo

guimilare
Communicator

This is the result:

$ splunk btool deploymentclient list --debug
/opt/splunkforwarder/etc/system/local/deploymentclient.conf [target-broker:deploymentServer]
/opt/splunkforwarder/etc/system/local/deploymentclient.conf targetUri = 10.217.XX.XXX:8089

The IP is correct, this is the DS IP.

mattymo
Splunk Employee

Mine looks like this, fwiw:

[splunker@n00b-splkufw-01 bin]$ ./splunk btool deploymentclient list --debug
/opt/splunkforwarder/etc/apps/n00blab_all_forwarder_deploymentclient/local/deploymentclient.conf [deployment-client]
/opt/splunkforwarder/etc/apps/n00blab_all_forwarder_deploymentclient/local/deploymentclient.conf [target-broker:deploymentServer]
/opt/splunkforwarder/etc/apps/n00blab_all_forwarder_deploymentclient/local/deploymentclient.conf targetUri = 10.10.x.x:8089

Not sure if you just omitted the [deployment-client] stanza from your paste.

Can we compare the output to one of the UFs that is properly phoning home to the DS?

Also, what do the pubsubsvr or DeploymentServer internal logs show you?

- MattyMo