Deployment Architecture

Why are UFs disappearing from Forwarder Management?

Gene
Path Finder

Dear Splunkers,

Can you please assist with the following problem:

We have more than 20 UFs installed on Windows machines. All of them have a deployment server configured and were visible in Forwarder Management (FM), but at some point they all disappeared from FM, and now they only reappear there from time to time.

I have tried deleting $SPLUNK_HOME/etc/instance.cfg on several forwarders and restarting them, but that did not fix the problem.

Any ideas on how to fix this, and what could cause such strange behavior?

Regards,

Eugene

Accepted solution

Gene
Path Finder

Thank you all for the help. The problem was with the SSL keys. I don't know what happened or how the forwarders managed to connect the first time, but after I created new keys and published them to the forwarders, the problem disappeared.

BTW: there were no SSL-related errors in the logs.

teunlaan
Contributor

Please check whether your deployment server is restarting or crashing.

Right after a restart, the deployment server doesn't show any UF clients. Each UF only pops up again after it has phoned home.

Gene
Path Finder

Thanks for the suggestion, but the strangest thing is that all the forwarders are sending data as expected to the indexer, which is also configured as the DS. They just don't show up in the console.

isoutamo
SplunkTrust
Polling from the DC to the DS uses port 8089, while sending data uses 9997 (or 9998 for TLS) by default.
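Because both connections start on the UF, one quick way to compare the phonehome path (8089) with the data path (9997) is to test both from the UF host itself. The sketch below is a minimal Python 3 check, assuming Python is available on the machine you test from; port_reachable is a hypothetical helper name, and it only proves that a TCP connection opens, not that the TLS handshake or Splunk authentication succeeds.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be opened.

    Run it on the UF (deployment client) host: the UF opens both the
    phonehome connection to the DS management port and the data
    connection to the receiving port, so testing from the DS proves little.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures all raise
        # OSError subclasses, so they all count as "not reachable".
        return False
```

For example, compare port_reachable("ds.example.com", 8089) with port_reachable("ds.example.com", 9997), where ds.example.com is a placeholder for your DS/indexer FQDN. If 9997 answers but 8089 does not, the problem is specific to the management port.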

Gene
Path Finder

Yes, correct. That is why I suspect that some firewall rules are blocking this connection, maybe rules that detect beaconing traffic...

teunlaan
Contributor

So what do the UFs' internal logs say? Do they try to connect and fail, or don't they even try?

You say they connect once and then it stops, which is strange.

Check that you aren't pushing a deploymentclient.conf.

And does the config you're pushing restart the forwarder? If so, try changing your server class so that it does not restart the UF, and see whether it keeps the connection (maybe something fails at restart).

Gene
Path Finder

I have checked the logs and still can't find what's wrong:

01-11-2022 14:20:18.073 +0200 INFO DC:HandshakeReplyHandler [13276 HttpClientPollingThread_85591AD8-9097-47F4-B73E-4F63150ACA4D] - Handshake done

01-11-2022 14:21:30.273 +0200 INFO DS_DC_Common [5620 MainThread] - Initializing the PubSub system.
01-11-2022 14:21:30.273 +0200 INFO DS_DC_Common [5620 MainThread] - Initializing core facilities of PubSub system.
01-11-2022 14:21:30.335 +0200 WARN HTTPAuthManager [5620 MainThread] - pass4SymmKey length is too short. See pass4SymmKey_minLength under the general stanza in server.conf.
01-11-2022 14:21:30.335 +0200 INFO HttpPubSubConnection [872 HttpClientPollingThread_85591AD8-9097-47F4-B73E-4F63150ACA4D] - Initial attempt to obtain connection will try after=37.475 seconds.
01-11-2022 14:21:30.335 +0200 INFO DC:DeploymentClient [5620 MainThread] - Starting phonehome thread.
01-11-2022 14:21:30.335 +0200 INFO DS_DC_Common [5620 MainThread] - Deployment Client initialized.
01-11-2022 14:21:30.335 +0200 INFO ServerRoles [5620 MainThread] - Declared role=deployment_client.
01-11-2022 14:21:30.335 +0200 INFO DS_DC_Common [5620 MainThread] - Deployment Server not available on a dedicated forwarder.
01-11-2022 14:21:30.335 +0200 INFO DC:PhonehomeThread [6308 PhonehomeThread] - Phonehome thread start, intervals: handshakeRetry=12.0 phonehome=60.0.
01-11-2022 14:21:30.335 +0200 INFO ClusteringMgr [5620 MainThread] - initing clustering with: ht=60.000 rf=3 sf=2 ct=60.000 st=60.000 rt=60.000 rct=5.000 rst=5.000 rrt=10.000 rmst=600.000 rmrt=600.000 icps=25 sfrt=600.000 pe=1 im=0 ip=0 mob=5 mor=5 mosr=5 pb=5 rep_port= pptr=10 pptrl=100 fznb=10 Empty/Default cluster pass4symmkey=false allow Empty/Default cluster pass4symmkey=true rrt=restart dft=180 abt=600 sbs=1
01-11-2022 14:21:30.335 +0200 INFO DC:DeploymentClient [6308 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected

01-11-2022 14:21:42.338 +0200 INFO DC:DeploymentClient [6308 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
01-11-2022 14:21:54.338 +0200 INFO DC:DeploymentClient [6308 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
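The excerpt above shows the client retrying the handshake roughly every 12 seconds, which matches handshakeRetry=12.0 in the PhonehomeThread line. A minimal Python sketch for pulling those retries out of a UF's splunkd.log and measuring the gaps between them might look like this; the regex is written only against the line format quoted in this thread, and handshake_retries and retry_gaps are hypothetical names.

```python
import re
from datetime import datetime

# Written against the lines quoted in this thread, e.g.
# "01-11-2022 14:21:42.338 +0200 INFO DC:DeploymentClient [...] - channel=... Will retry ..."
RETRY_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r".*DC:DeploymentClient .*Will retry sending handshake message to DS; "
    r"err=(?P<err>\w+)"
)


def handshake_retries(lines):
    """Return a list of (timestamp, err) tuples for failed DS handshakes."""
    retries = []
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            # splunkd.log timestamps are month-day-year with a UTC offset.
            ts = datetime.strptime(match.group("ts"), "%m-%d-%Y %H:%M:%S.%f %z")
            retries.append((ts, match.group("err")))
    return retries


def retry_gaps(retries):
    """Seconds between consecutive handshake retries, rounded to ms."""
    return [
        round((later[0] - earlier[0]).total_seconds(), 3)
        for earlier, later in zip(retries, retries[1:])
    ]
```

You could feed it an open file, e.g. handshake_retries(open(path, encoding="utf-8", errors="replace")) with path pointing at $SPLUNK_HOME/var/log/splunk/splunkd.log. A steady run of ~12 s gaps with err=not_connected means the client keeps retrying without ever establishing the connection, as in the excerpt above.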

SinghK
Builder

Why are the logs complaining about pass4SymmKey?

Gene
Path Finder

Actually, I don't know. I checked the server.conf files on both the DC and the DS and didn't find pass4SymmKey there.

Gene
Path Finder

The DS is not restarting or crashing. The thing is that the clients connect only once, and after that they can't reach the DS. I assume the problem is with firewall rules, but for now our client is checking this.

isoutamo
SplunkTrust
If the DC can connect to the DS once after startup, it's hard to believe that the issue is the firewall. Of course, if you have an L7/next-generation firewall, then it's possible...
Could it be that your DC polling interval is too long?

Gene
Path Finder

Actually, all settings are default; we didn't touch the polling interval. I will try adjusting that too, thanks.

isoutamo
SplunkTrust

Hi

Have they disappeared permanently, or only from time to time? If the latter, remember that when you restart the DS or make other configuration changes on the DS side, the DCs (UFs) must phone home again before you can see them there.

Do you have the Monitoring Console (MC) in place, and are they listed there under Forwarders? (Enable forwarder monitoring first.)

r. Ismo

Gene
Path Finder

Hi, and thank you for the response.

But the actual situation is as follows:

We set up the forwarders => set the DS => they appear in the console => they disappear from the console => some of them occasionally reappear, but not for long.

I suppose this could be related to some firewall settings, but the client assures us that they can't find any blocked connections during that time. Also, the UF logs show:
INFO DC:DeploymentClient - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected

SinghK
Builder

Not necessarily. If you can get a copy of the UF logs from after they have connected to the DS once, that could shed more light on this. Once, an app caused this as well, so there are any number of possibilities. Only the logs can tell what's happening.

isoutamo
SplunkTrust
And you have only one DS, not several behind a load balancer (LB)?

Gene
Path Finder

Yes, correct.

isoutamo
SplunkTrust
And is your DS's Splunk version higher than or equal to that of every other node (UFs and servers)?

What does "splunk btool deploymentclient list --debug" show? Are those values, and the files they come from, what you expect? And what about "splunk show deploy-poll"?
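"splunk btool deploymentclient list --debug" is the authoritative view, because it merges every configuration layer and shows which file each value comes from. If you also want to sanity-check one specific deploymentclient.conf (for example, one an app might be pushing, as suggested earlier in the thread), a rough Python sketch using the standard configparser module could look like this. It assumes the file uses plain [stanza] and key = value syntax, and deploy_client_settings is a hypothetical helper; it does not replicate btool's layering.

```python
import configparser


def deploy_client_settings(conf_text: str) -> dict:
    """Extract DS targets and the phonehome interval from ONE
    deploymentclient.conf file (no merging across apps or layers)."""
    cfg = configparser.ConfigParser(interpolation=None, strict=False)
    cfg.optionxform = str  # keep Splunk's camelCase setting names as written
    cfg.read_string(conf_text)
    targets = {
        # "[target-broker:deploymentServer]" -> "deploymentServer"
        section.split(":", 1)[1]: cfg.get(section, "targetUri", fallback=None)
        for section in cfg.sections()
        if section.startswith("target-broker:")
    }
    interval = cfg.get("deployment-client", "phoneHomeIntervalInSecs", fallback=None)
    return {"targets": targets, "phoneHomeIntervalInSecs": interval}
```

Running it over every deploymentclient.conf found under $SPLUNK_HOME/etc can reveal a second file pointing at a different targetUri, which btool would then merge with your intended settings.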

I assume that telnet/curl from the DC to DS:8089 (or whatever your management port is) works?

Are those DCs in the same data center, or behind a VPN?

Gene
Path Finder

Versions are the same.

btool and show deploy-poll both show the correct values.

telnet: I'm checking with the client, because I don't have access to the endpoints where the forwarders are installed.

And the clients are in the same subnet; no VPN is used.

isoutamo
SplunkTrust

You should try curl/telnet from the UF side to the DS. All traffic between them is initiated by the DC, not the DS!

curl -vkI https://<Your DS fqdn>:8089

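The command above sends a HEAD request (-I) and prints the response headers with verbose debug output (-v), including the TLS handshake details; -k skips certificate verification.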
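If Python 3 happens to be available where you test from, the following sketch is a rough equivalent of the TLS part of that curl check: it opens a TCP connection to the DS management port and completes a TLS handshake without verifying the server certificate (like -k). tls_handshake and its client_cert argument are hypothetical; if your DS requires client certificates, you would have to supply the UF's certificate/key file there. A TCP connection that opens while the handshake fails would point toward certificates or keys, which is where the problem in this thread eventually turned out to be.

```python
import socket
import ssl
from typing import Optional, Tuple


def tls_handshake(host: str, port: int = 8089, timeout: float = 5.0,
                  client_cert: Optional[str] = None) -> Tuple[str, str, bool]:
    """Complete a TLS handshake with host:port without verifying the
    server certificate (similar to curl -k).

    Returns (tls_version, cipher_name, server_sent_certificate).
    Connection and handshake failures propagate as OSError
    (ssl.SSLError and socket timeouts are OSError subclasses).
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # like curl -k: don't validate the chain
    if client_cert:
        # Hypothetical: a PEM file holding the UF's certificate and key,
        # only needed if the DS demands client certificates.
        ctx.load_cert_chain(client_cert)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            cert = tls.getpeercert(binary_form=True)
            return tls.version(), tls.cipher()[0], cert is not None
```

For example, tls_handshake("ds.example.com") (placeholder FQDN) should return something like a TLS version string and cipher name when the DS's management port is healthy.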

For security reasons, it's a good idea to disable the management port (8089) on the UF unless you regularly use it from scripts etc. on the UF side.

What about host-based firewalls?

r. Ismo 