
Monitoring of Java Virtual Machines with JMX stopped working on four virtual machines

thirusama
Path Finder

JMX monitoring stopped working on 4 of our VMs, whereas the other servers (around 100) are still working. There was an OS upgrade on all of these machines, along with a Java upgrade. Nothing seems different between the working and non-working ones.

We are using a config file, and this is what our inputs.conf looks like:

[jmx://jmx]
config_file = config.xml
_TCP_ROUTING = myindexset
polling_frequency = 60
sourcetype = jmx
index = my_index_jmx
disabled = 0
interval = 60

Below are the errors/messages we are getting. Has anyone faced a similar issue?

Taking one non-working host as an example: we stopped receiving data from it at "9/30/17 1:25:11.542 AM".
This is from "/opt/splunkforwarder/var/log/splunk/jmx.log":

2017-09-30 01:25:49,924 - com.splunk.modinput.ModularInput -159035 [Thread-1] ERROR  - Probing socket connection to SplunkD failed.Either SplunkD has exited ,or if not,  check that your DNS configuration is resolving your system's hostname (127.0.0.1) correctly : Connection refused

2017-09-30 01:25:57,735 - com.splunk.modinput.ModularInput -166846 [main] ERROR  - Error executing modular input : Connection refused : java.lang.RuntimeException: Connection refused
        at com.splunk.HttpService.send(Unknown Source)
        at com.splunk.Service.send(Unknown Source)
        at com.splunk.HttpService.get(Unknown Source)
        at com.splunk.ResourceCollection.list(Unknown Source)
        at com.splunk.ResourceCollection.refresh(Unknown Source)
        at com.splunk.ResourceCollection.refresh(Unknown Source)
        at com.splunk.Resource.validate(Unknown Source)
        at com.splunk.ResourceCollection.validate(Unknown Source)
        at com.splunk.ResourceCollection.values(Unknown Source)
        at com.splunk.jmx.InfoManager.getAccounts(Unknown Source)
        at com.splunk.jmx.JMXModularInputV3.doRun(Unknown Source)
        at com.splunk.modinput.ModularInput.init(Unknown Source)
        at com.splunk.jmx.JMXModularInputV3.main(Unknown Source)
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:625)
        at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:160)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
        at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
        at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:933)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:153)
        ... 13 more
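Both errors point at splunkd rather than at the monitored JVM: the first line says the socket probe to splunkd on 127.0.0.1 failed, and the stack trace shows the refusal inside com.splunk.HttpService while it opens an HTTPS connection. The same check can be repeated by hand with a plain TCP connect. This is a minimal Python sketch, assuming splunkd should be listening on the default management port 8089; probe_port is a hypothetical helper, not part of Splunk_TA_jmx.

```python
import socket

def probe_port(host: str = "127.0.0.1", port: int = 8089, timeout: float = 3.0) -> str:
    """Attempt a plain TCP connect and classify the outcome (hypothetical helper)."""
    try:
        # create_connection raises ConnectionRefusedError when nothing listens on the port
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        # No answer at all, e.g. a firewall silently dropping packets
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"

if __name__ == "__main__":
    # Assumes the default splunkd management port; change it if yours differs
    print("splunkd management port:", probe_port())
```

A result of "refused" on a forwarder whose JMX port is reachable means nothing is listening on the management port, which matches the "Connection refused" in both logs.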

Below is from splunkd.log:

09-30-2017 01:25:44.479 -0700 INFO  WatchedFile - Will use tracking rule=modtime for file='/etc/alternatives/java_sdk_oracle/lib/missioncontrol/plugins/com.jrockit.mc.rjmx_5.5.1.172852/plugin.properties'.

09-30-2017 01:25:57.735 -0700 ERROR ExecProcessor - message from "python /opt/splunkforwarder/etc/apps/Splunk_TA_jmx/bin/jmx.py" Error executing modular input : Connection refused : java.lang.RuntimeException: Connection refused
09-30-2017 01:25:57.735 -0700 ERROR ExecProcessor - message from "python /opt/splunkforwarder/etc/apps/Splunk_TA_jmx/bin/jmx.py"
       at com.splunk.HttpService.send(Unknown Source)
09-30-2017 01:25:57.735 -0700 ERROR ExecProcessor - message from "python /opt/splunkforwarder/etc/apps/Splunk_TA_jmx/bin/jmx.py"
       at com.splunk.Service.send(Unknown Source)
09-30-2017 01:25:57.735 -0700 ERROR ExecProcessor - message from "python /opt/splunkforwarder/etc/apps/Splunk_TA_jmx/bin/jmx.py"
       at com.splunk.HttpService.get(Unknown Source)
09-30-2017 01:25:57.735 -0700 ERROR ExecProcessor - message from "python /opt/splunkforwarder/etc/apps/Splunk_TA_jmx/bin/jmx.py"
       at com.splunk.ResourceCollection.list(Unknown Source)

Output of java -version (the same on both working and non-working VMs):
java version "1.8.0_141"
Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)

We compared everything we could think of and tried re-pushing the app from the deployment server. No luck.

A few things we validated on a non-working VM:
1) The JMX port is open on localhost.
2) The port is configured in the JVM/JMX settings.
3) JMX metrics show up fine in JConsole.

Please let us know if anyone has faced a similar issue and was able to fix it.


thirusama
Path Finder

We found the root cause. It looks like the Splunk_TA_jmx app requires splunkd's management port (8089 by default) to be enabled. One of our server.conf files had the configuration below enabled; after commenting it out, JMX data started coming in. We also had to install the latest version of the app (3.2.0).

[httpServer]
disableDefaultPort = true
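Because the setting can live in any of the server.conf files on a forwarder, it helps to scan them for this stanza. A minimal Python sketch, assuming the files are close enough to INI syntax for the standard configparser module; default_port_disabled is a hypothetical helper:

```python
import configparser

def default_port_disabled(server_conf_text: str) -> bool:
    """True if [httpServer] sets disableDefaultPort to a true value (hypothetical helper)."""
    # Settings placed before the first stanza would make configparser raise
    # MissingSectionHeaderError, so give them a placeholder section.
    text = "[__top__]\n" + server_conf_text
    # interpolation=None: values may contain '%'; strict=False: tolerate repeated stanzas/keys
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    # Full-line '#' comments are skipped, so a commented-out setting is not seen.
    # getboolean accepts true/false, 1/0, yes/no, on/off; fallback covers a missing stanza or key.
    return parser.getboolean("httpServer", "disableDefaultPort", fallback=False)
```

This only looks at one file at a time; on the forwarder itself, `splunk btool server list --debug` shows the merged [httpServer] settings and which file each one comes from.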

We found this by running the command below on the working and non-working nodes:
sudo netstat -tulpn | grep 8089
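Comparing that output across many forwarders is easier once it is parsed. A minimal Python sketch, assuming the usual Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name); listeners_on_port is a hypothetical helper that takes the captured `sudo netstat -tulpn` text:

```python
def listeners_on_port(netstat_output: str, port: int = 8089) -> list:
    """Return the PID/Program column of TCP sockets in LISTEN state on `port` (hypothetical helper)."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with tcp/tcp6; this skips the two header lines and udp rows,
        # which have no State column
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        # Local Address is host:port and IPv6 hosts contain colons, so split on the last one
        if state == "LISTEN" and local_addr.rsplit(":", 1)[-1] == str(port):
            found.append(fields[6] if len(fields) > 6 else "-")
    return found
```

An empty list means nothing listens on 8089 on that node. Without sudo, netstat prints "-" in the PID/Program column for processes owned by other users.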


thirusama
Path Finder

Does anyone know anything about this? We are still getting these errors on some of the Universal Forwarders.


thirusama
Path Finder

Any help on this is highly appreciated.
