Deployment Architecture

Why are UFs disappearing from Forwarder Management?

Gene
Path Finder

Dear Splunkers,

Can you please assist with the following problem:

We have more than 20 UFs installed on Windows machines. All of them have a deployment server set and were visible in Forwarder Management (FM). After some time, all of them disappeared from FM, and they now reappear there only occasionally.

I tried deleting $SPLUNK_HOME/etc/instance.cfg on several forwarders and restarting them, but that did not fix the problem.

 

Any ideas on how to fix it, and what could cause such strange behavior?

 

Regards,

Eugene

0 Karma
1 Solution

Gene
Path Finder

Thank you all for the help. The problem was with the SSL keys. I don't know what happened or how the forwarders connected the first time, but after I created new keys and published them to the forwarders, the problem disappeared.

 

BTW: there were no errors in the logs regarding SSL.

0 Karma

mlm
Explorer

Hi Gene,

I am facing a similar issue. Do you mind sharing exactly what you did to resolve it?

Thank you! 

0 Karma

Gene
Path Finder

As I mentioned, the problem was in an app we have on the UFs and indexers for SSL log encryption. Someone had put the wrong password for the *.pem file into a config file, and because of that the forwarders started to disappear from the console as inactive. The first thing you should check: review both the indexer and forwarder logs for any connection problems.
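
One way to rule out a wrong .pem password is to try loading the file outside Splunk. The following is a minimal sketch, not Splunk tooling: it uses Python's standard ssl module, and the function name, path, and password are hypothetical placeholders. It assumes the file holds both the certificate and the encrypted private key, and that you know the plaintext password you expect to be correct.

```python
import ssl


def pem_loads_with_password(pem_path, password):
    """Return True if the cert + private key in pem_path load with the given password."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        # Raises ssl.SSLError on a wrong password or malformed PEM,
        # and FileNotFoundError if the path does not exist.
        ctx.load_cert_chain(certfile=pem_path, password=password)
        return True
    except OSError:
        # ssl.SSLError and FileNotFoundError are both OSError subclasses.
        return False


# Hypothetical usage (path and password are placeholders):
# print(pem_loads_with_password("/path/to/server.pem", "candidate-password"))
```

If this returns False for the password written in the config, the key file and the configured password most likely do not match. A True result only shows that the file can be decrypted; it says nothing about whether the peer trusts the certificate.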

 

Best,

Eugene

0 Karma

teunlaan
Contributor

Please check whether your deployment server is restarting or crashing.

The deployment server won't show any UF clients right after it has restarted. They only show up again after the UF clients have phoned home.

Gene
Path Finder

Thanks for the suggestion, but the strangest thing is that all forwarders are sending data as expected, including to the indexer that is also configured as the DS. They are just not visible in the console.

0 Karma

isoutamo
SplunkTrust
Polling from the DC to the DS uses port 8089, while sending data uses 9997 (or 9998 for TLS) by default.
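
Because phone-home polling and data forwarding use different ports, a forwarder can keep sending data while the DS never sees it. A rough sketch for checking both ports from a forwarder host follows. It uses only Python's socket module; the host name is a placeholder, and the port numbers are the defaults mentioned above, which may differ in your environment.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_ports(host, ports):
    """Map each port to whether a TCP connection could be opened."""
    return {port: port_open(host, port) for port in ports}


# Hypothetical usage (replace the host with your own DS / indexer):
# print(check_ports("ds.example.com", [8089, 9997]))
```

A successful TCP connect only shows that the port is reachable. It does not validate TLS or authentication, so a forwarder can pass this check and still fail the handshake with the DS.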
0 Karma

Gene
Path Finder

Yes, correct. That is why I suspect that some firewall rules are blocking this connection, maybe some beaconing rules...

0 Karma

teunlaan
Contributor

So what do the UFs' internal logs say? Do they try to connect and fail, or don't they even try?

You say they connect once and then stop, which is strange.

Check that you aren't pushing a deploymentclient.conf.

And is the config you're pushing restarting the forwarder? If yes, try changing your serverclass so it does not restart the UF, and see if it keeps the connection (maybe something fails at the restart).

0 Karma

Gene
Path Finder

I have checked logs and still can't find what's wrong:

01-11-2022 14:20:18.073 +0200 INFO DC:HandshakeReplyHandler [13276 HttpClientPollingThread_85591AD8-9097-47F4-B73E-4F63150ACA4D] - Handshake done

01-11-2022 14:21:30.273 +0200 INFO DS_DC_Common [5620 MainThread] - Initializing the PubSub system.
01-11-2022 14:21:30.273 +0200 INFO DS_DC_Common [5620 MainThread] - Initializing core facilities of PubSub system.
01-11-2022 14:21:30.335 +0200 WARN HTTPAuthManager [5620 MainThread] - pass4SymmKey length is too short. See pass4SymmKey_minLength under the general stanza in server.conf.
01-11-2022 14:21:30.335 +0200 INFO HttpPubSubConnection [872 HttpClientPollingThread_85591AD8-9097-47F4-B73E-4F63150ACA4D] - Initial attempt to obtain connection will try after=37.475 seconds.
01-11-2022 14:21:30.335 +0200 INFO DC:DeploymentClient [5620 MainThread] - Starting phonehome thread.
01-11-2022 14:21:30.335 +0200 INFO DS_DC_Common [5620 MainThread] - Deployment Client initialized.
01-11-2022 14:21:30.335 +0200 INFO ServerRoles [5620 MainThread] - Declared role=deployment_client.
01-11-2022 14:21:30.335 +0200 INFO DS_DC_Common [5620 MainThread] - Deployment Server not available on a dedicated forwarder.
01-11-2022 14:21:30.335 +0200 INFO DC:PhonehomeThread [6308 PhonehomeThread] - Phonehome thread start, intervals: handshakeRetry=12.0 phonehome=60.0.
01-11-2022 14:21:30.335 +0200 INFO ClusteringMgr [5620 MainThread] - initing clustering with: ht=60.000 rf=3 sf=2 ct=60.000 st=60.000 rt=60.000 rct=5.000 rst=5.000 rrt=10.000 rmst=600.000 rmrt=600.000 icps=25 sfrt=600.000 pe=1 im=0 ip=0 mob=5 mor=5 mosr=5 pb=5 rep_port= pptr=10 pptrl=100 fznb=10 Empty/Default cluster pass4symmkey=false allow Empty/Default cluster pass4symmkey=true rrt=restart dft=180 abt=600 sbs=1
01-11-2022 14:21:30.335 +0200 INFO DC:DeploymentClient [6308 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected

01-11-2022 14:21:42.338 +0200 INFO DC:DeploymentClient [6308 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
01-11-2022 14:21:54.338 +0200 INFO DC:DeploymentClient [6308 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
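
To see at a glance whether a forwarder ever completed a handshake and how often it is retrying, lines like the ones above can be tallied with a small script. This is only a sketch: the function name is made up, and it assumes the timestamp layout and the two message strings ("Handshake done" and "err=not_connected") exactly as they appear in this excerpt.

```python
import re

# Leading "<date> <time> <tz offset> <LEVEL> <Component>" part of a log line,
# in the layout seen in the excerpt above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"(?P<level>[A-Z]+)\s+(?P<component>\S+)"
)


def summarize_handshakes(lines):
    """Count successful handshakes and not_connected retries in log lines."""
    done = 0
    not_connected = 0
    last_success = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank or unrecognized lines
        if "Handshake done" in line:
            done += 1
            last_success = match.group("ts")
        elif "err=not_connected" in line:
            not_connected += 1
    return {
        "handshake_done": done,
        "not_connected": not_connected,
        "last_success": last_success,
    }
```

Called with the lines of the forwarder's log file, for example `summarize_handshakes(open(path, encoding="utf-8"))` with `path` pointing at your own log, a growing not_connected count with an old last_success timestamp matches the "connects once, then never again" pattern described in this thread.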

 

0 Karma

SinghK
Builder

Why are the logs complaining about pass4SymmKey?

0 Karma

Gene
Path Finder

Actually, I don't know. I checked the server.conf files on both the DC and the DS and didn't find pass4SymmKey there.

0 Karma

Gene
Path Finder

The DS is not restarting or crashing. The thing is that clients connect only once, and after that they can't reach the DS. I assume the problem is with firewall rules, but for now the customer is checking this.

 

 

0 Karma

isoutamo
SplunkTrust
If the DC can connect to the DS once after start, it's hard to believe that the issue is in the FW. Of course, if you have an L7/NG-level FW, then it's possible...
Could it be that your DC polling interval is too long?

Gene
Path Finder

Actually, all settings are default; we didn't touch the polling time. I will try playing with that too, thanks.

0 Karma

isoutamo
SplunkTrust

Hi

Have they disappeared permanently or only from time to time? If the latter, remember that after you restart the DS or make some other configuration changes on the DS side, the DCs (UFs) need to phone home again before you can see them there again.

Do you have the MC in place, and if so, are they listed there under Forwarders (enable forwarder management first)?

r. Ismo

Gene
Path Finder

Hi, and thank you for the response.

The actual situation is as follows:

We set up forwarders => set DS => they appear in the console => they disappear from the console => some of them sometimes reappear, but not for long.

I suppose this could be related to some firewall settings, but the customer assures us that they can't find any connections that were blocked during that time. Also, the UF logs show:
INFO DC:DeploymentClient - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected

 

0 Karma

SinghK
Builder

Not necessarily. If you can get a copy of the UF logs from after they have connected once to the DS, that can shed more light on this. Once there was an app that caused this too, so there are any number of possibilities. Only the logs can tell what's happening.

isoutamo
SplunkTrust
And you have only one DS, not several behind an LB?
0 Karma

Gene
Path Finder

Yes, correct.

0 Karma

isoutamo
SplunkTrust
And is your DS's Splunk version higher than or equal to that of all other nodes (UFs + servers)?

What does "splunk btool deploymentclient list --debug" say? Are the values and their locations what you expect? And how about "splunk show deploy-poll"?

I assume that telnet/curl from the DC to DS:8089 (or whatever your mgmt port is) is working?

Are those DCs in the same data center or behind a VPN?

0 Karma