
Why am I seeing a high number of indexer connection requests to universal forwarder port 8089?

Path Finder

Hi,

I am currently seeing strange network connection requests from my indexers to a Universal Forwarder on port 8089.

I installed a Universal Forwarder on a server which was previously acting as a Data Collection Node (DCN) for the Splunk VMware App. The DCN was decommissioned and another server was brought online under the same IP, now with a UF installed.
As soon as the UF went online, I noticed a continuous stream of error messages in splunkd.log:

03-11-2015 22:54:51.612 +0100 ERROR AuthenticationManagerSplunk - Login failed. Incorrect login for user: admin
03-11-2015 22:54:51.638 +0100 ERROR AuthenticationManagerSplunk - Login failed. Incorrect login for user: admin
03-11-2015 22:54:51.645 +0100 ERROR AuthenticationManagerSplunk - Login failed. Incorrect login for user: admin

I first suspected a local issue but then saw corresponding errors in splunkd_access.log:

X.X.0.39 - - [11/Mar/2015:22:54:51.630 +0100] "POST /services/auth/login HTTP/1.0" 401 129 - - - 8ms
X.X.0.18 - - [11/Mar/2015:22:54:51.631 +0100] "POST /services/auth/login HTTP/1.0" 401 129 - - - 14ms
X.X.0.18 - - [11/Mar/2015:22:54:51.666 +0100] "POST /services/auth/login HTTP/1.0" 401 129 - - - 11ms

The systems continuously trying to connect, multiple times per second, are my indexers. This goes on even when the UF is not running. I have not found a reason why the indexers are doing this, as it does not happen with any of the hundreds of other active forwarders. I suspected something connected to the server's former DCN role, but neither on the indexers nor on the Search Head (where the VMware App is configured) did I find any reference to the decommissioned DCN.

Ideas anyone?

Thanks & best regards!


Splunk Employee

You should never have been seeing connections from your indexers to your DCNs. That indicates that configuration files named ta_vmware_collection.conf and hydra_node.conf exist in the local directory of the Splunk_TA_vmware package installed on your indexers. It also means there is an inputs.conf on your indexers in that Splunk_TA_vmware with the ta_vmware_collection_scheduler modular input turned on.

The TA vmware collection scheduler modular input should only run on one machine, typically the search head. If you are in a search head pooled environment, you would have had to move the scheduler process off the pool to avoid duplicated work, but not onto the indexers; you would normally just use one of the DCNs.

The best course of remediation is simply to remove the inputs.conf, ta_vmware_collection.conf, and hydra_node.conf files from all your indexers. Your problems will then go away as soon as the configuration is refreshed or Splunk is restarted.
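As a rough sketch of that cleanup (paths assume a default /opt/splunk install; this is not an official procedure, so verify each file before removing it), something like this could be run on each indexer:

```shell
# Remove the scheduler-related config from Splunk_TA_vmware/local on an
# indexer. File names taken from the diagnosis above; paths are assumed.
SPLUNK_HOME=${SPLUNK_HOME:-/opt/splunk}
TA_LOCAL="$SPLUNK_HOME/etc/apps/Splunk_TA_vmware/local"

for f in inputs.conf ta_vmware_collection.conf hydra_node.conf; do
    if [ -f "$TA_LOCAL/$f" ]; then
        echo "Removing $TA_LOCAL/$f"
        rm "$TA_LOCAL/$f"
    else
        echo "Not present: $TA_LOCAL/$f"
    fi
done

# Then restart to pick up the change:
# "$SPLUNK_HOME/bin/splunk" restart
```

The restart line is left commented so the script only reports and removes files when run as-is.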



Splunk Employee

Your root cause is likely the TA vmware hierarchy agent. This is a scripted input located in Splunk_TA_vmware; it is off by default and only turned on on the scheduler node. My assumption is that whoever configured the TA and then pushed it to your indexers with the hydra_node and collection configuration files is also responsible for this. There is likely an inputs.conf stanza in the TA that looks something like this:

[script://$SPLUNK_HOME/etc/apps/Splunk_TA_vmware/bin/ta_vmware_hierarchy_agent.py]
disabled = 0

You can delete the stanza and the root cause of the issue will be gone.
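Alternatively, if you would rather keep the file for reference than delete the stanza outright, flipping the flag should have the same effect (path assumed; sketch only):

```ini
# $SPLUNK_HOME/etc/apps/Splunk_TA_vmware/local/inputs.conf on the indexers
[script://$SPLUNK_HOME/etc/apps/Splunk_TA_vmware/bin/ta_vmware_hierarchy_agent.py]
disabled = 1
```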

Splunk Employee

I'm sure you checked this, but is the VMware App scheduler still on? That would be the main thing continuously talking to the DCN machine.


Splunk Employee

When you removed the DCN, did you wipe out the Splunk directory it was in and create a brand-new directory for the UF? Also, did you restart the SH after you removed the DCN from the scheduler configuration?


Path Finder

Yes, directory /opt/splunk on DCN was completely deleted. UF is installed in /opt/splunkforwarder.
And yes, SH was restarted after DCN removal.


Path Finder

Yes, it is still on and has to be: the VMware App is still in use and two other DCNs are active. It's just that the old DCN is apparently still included in the scheduling, and I can't figure out where or why.


Path Finder

Hi,

Thanks for the answers so far, but they did not get me any further. I had already tried btool, without results.
However, I could pin the problem down to the VMware app. The network connections from the indexers are initiated by a Python process running the ta_vmware_collection_scheduler:

splunk   11012 10886 15 Mar11 ?        02:36:00 python /opt/splunk/etc/apps/Splunk_TA_vmware/bin/ta_vmware_collection_scheduler.py

So I continued on the Search Head where the VMware App is installed, as this is where the scheduling happens. I searched all the VMware-related apps and TAs for the IP or hostname of the old DCN, but did not find a config file where it was still present. In the relevant DCN config file (/opt/splunk/etc/apps/Splunk_TA_vmware/local/hydra_node.conf), only the correct hosts are listed.

In the logs of the search head I could trace the deletion of the DCN:

web_access.log:127.0.0.1 - xxxxxxx [09/Mar/2015:13:24:20.918 +0100] "DELETE /en-US/custom/splunk_for_vmware/splunk_for_vmware_setup/splunk_for_vmware/delete_collection_node/https:%2F%2Fdcn-hostname:8089 HTTP/1.1" 200 - "https://searchhead:8000/en-US/custom/splunk_for_vmware/splunk_for_vmware_setup/splunk_for_vmware/show_collection_setup" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36" - 54fd90f4eb7fbd79676150 887ms

But apparently it is still used by the scheduler worker processes as a collection node.
So maybe someone with expert knowledge of the VMware app can give a further hint on where the problem could be.

UPDATE (2015-03-16):
As SA-Hydra is responsible for the data collection scheduling, we disabled it (moved it to disabled-apps) on the indexers. That stopped the connections from the indexers to the old DCN.
However, it had side effects. Now the indexers are throwing errors:

Search peer index1.example.net has the following message: Unable to initialize modular input "ta_vmware_collection_worker" defined inside the app "Splunk_TA_vmware": Introspecting scheme=ta_vmware_collection_worker: script running failed (exited with code 1).

After inspecting /opt/splunk/etc/apps/Splunk_TA_vmware/local, I noticed config files which I would also not expect on an indexer:

-rw-------. 1 splunk splunk 979 Mar  5 16:33 app.conf
-rw-------. 1 splunk splunk 357 Mar  6 09:07 hydra_node.conf
-rw-rw-r--. 1 splunk splunk 238 Mar  6 11:03 indexes.conf
-rw-------. 1 splunk splunk 150 Mar 12 15:20 inputs.conf
-rw-------. 1 splunk splunk 737 Mar  6 09:07 ta_vmware_collection.conf
-rw-------. 1 splunk splunk 177 Feb 17 09:20 ta_vmware_syslog_forwarder.conf
-rw-------. 1 splunk splunk 333 Mar  6 09:07 vcenter_forwarder.conf

I find this config identical to the one on my Search Head, where the VMware collection config is done via Splunk Web and which then controls the DCNs. So I have no idea how it got onto the indexers...
As far as distribution of VMware app components to Splunk servers goes, we applied the matrix here:
http://docs.splunk.com/Documentation/VMW/3.1.4/Configuration/Componentreference

According to this, SA-Hydra is necessary on the indexers:
†† Install SA-Utils and SA-Hydra on the Splunk indexers to stop modular input introspection from failing. This is a workaround to a known issue. These components do not affect the operation of your indexers.
But apparently, it does somehow affect the operation of the indexers.

So the next step would be to disable the hydra/collection config in Splunk_TA_vmware/local/ on the indexers. But I still wonder how the config got there in the first place, and if, why, and which VMware app components are really, REALLY necessary on the indexers.

Any ideas or definite answers there?


Splunk Employee

This seems to me like some app has been misconfigured to point at the UF, perhaps as a Deployment Server, or, more likely, some configuration exists on the indexers that is trying to contact that UF. Perhaps the VMware collector configuration was set on, or deployed to, the indexers?

On one of your indexers, I would run btool and see what you can see:

$SPLUNK_HOME/bin/splunk btool outputs list | grep <UF DNS name or IP address>

I'd go through the various configs, e.g., inputs, outputs, transforms, props.

See if you can find anything that is trying to make this connection.
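To sweep several merged config layers at once, a loop over btool could look like this (a sketch: dcn-hostname is a placeholder for the old DCN's name or IP, and --debug makes btool print which file each line comes from):

```shell
# Grep the merged inputs/outputs/props/transforms config on an indexer
# for references to the old DCN. TARGET is a placeholder value.
SPLUNK_HOME=${SPLUNK_HOME:-/opt/splunk}
TARGET="dcn-hostname"

if [ -x "$SPLUNK_HOME/bin/splunk" ]; then
    for conf in inputs outputs props transforms; do
        echo "== $conf =="
        "$SPLUNK_HOME/bin/splunk" btool "$conf" list --debug | grep -i "$TARGET"
    done
else
    echo "splunk binary not found under $SPLUNK_HOME"
fi
```

Any hit shows both the offending stanza and the file it lives in, which narrows down what was deployed where.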


Champion

Isn't port 8089 typically the management port? Maybe there is some configuration that your indexer wants to push to the forwarder.
