
Why is Splunk DB Connect 3.0 unable to start Task Server?

tlmayes
Contributor

We were just finalizing a deployment of Splunk DB Connect 2.4 when 3.0 appeared, so we stopped and decided to start the deployment fresh with 3.0. Our challenge going in was that security restrictions prevent us from installing Java/JDK as an application in our Linux environment. This was not a problem with DB Connect 2.4: I placed the Oracle JDK in the DB Connect app's local directory (/opt/splunk/etc/apps/splunk_app_db_connect/local/jdk1.8.0_121) and provided the full path in the app setup. Everything worked fine.

I proceeded with Splunk DB Connect 3.0 from a clean slate: removed the 2.4 app, installed 3.0, and placed the JDK files in the same directory as with 2.4. Clicking "Save", which kicks off the start of the Task Server, keeps returning the error "Failed to start task server". With 2.4 I did not need JAVA_HOME set or the JDK in the PATH, but I tried that anyway and still receive the same error.

The JRE path that worked with 2.4: /opt/splunk/etc/apps/splunk_app_db_connect/local/jdk1.8.0_121
I tried this path, and also the same path with jre/ appended.

I'm not seeing anything in the Splunk logs, but I don't know that I would. What am I missing?

1 Solution

jcoates_splunk
Splunk Employee

FINAL SOLUTION WAS TO RESTART SPLUNK

Hi, here's the logging section: http://docs.splunk.com/Documentation/DBX/3.0.0/DeployDBX/Troubleshooting#DB_Connect_logging

If you're not able to figure it out from logs, I'd start by looking for java processes getting launched and the network ports that are getting used.
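
As a rough illustration of the "which process owns which port" check, here is a small Python sketch that reads text in the layout printed by `netstat -antp` on Linux and reports who is listening on a given port. The function name and the sample text are made up for this example and are not output from a real host.

```python
def listeners_on_port(netstat_output, port):
    """Return (pid, program) pairs for TCP sockets listening on `port`.

    Expects text in the column layout printed by `netstat -antp` on Linux.
    """
    owners = []
    for line in netstat_output.splitlines():
        # Data rows have 7 columns:
        # proto, recv-q, send-q, local address, foreign address, state, pid/program
        fields = line.split(None, 6)
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        local_address, state, owner = fields[3], fields[5], fields[6].strip()
        if state != "LISTEN" or local_address.rsplit(":", 1)[-1] != str(port):
            continue
        pid, _, program = owner.partition("/")
        owners.append((pid, program))
    return owners


# Hypothetical sample, for illustration only (not taken from a real system).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:9998            0.0.0.0:*               LISTEN      2311/splunkd
tcp        0      0 127.0.0.1:9999          0.0.0.0:*               LISTEN      4102/java
"""

print(listeners_on_port(SAMPLE, 9998))
```

If the owner of a DB Connect port turns out to be something other than java, that is worth chasing before anything else.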


KeithH
Path Finder

FYI and perhaps this will be helpful to others.

I was getting all the same errors and finding little of use in the log files.

In my case the problem was that something had deleted these files:

C:\Program Files\Splunk\etc\apps\splunk_app_db_connect\windows_x86_64\bin\dbxquery.exe
C:\Program Files\Splunk\etc\apps\splunk_app_db_connect\windows_x86_64\bin\server.exe


Files of the same name still existed under:
...\windows_x86\bin
...\linux_x86\bin
...\linux_x86_64\bin
So we suspect some antivirus or similar must have deleted them for some reason.

As soon as I put them back DB Connect was fine.


_smp_
Builder

I discovered two issues in my config and after fixing them, I can consistently restart the task server. I posted the details in an answer.


_smp_
Builder

I was fighting this same problem until I stopped to look at the General settings page to give it some thought. There are two "servers" to configure on this page:
Task Server
Query Server

Each server needs its own unique port number to listen on. But for some reason, both "servers" were configured to use the same port number, 9999, so the two java "server" processes were fighting over the same listener port. I simply changed the port of the Query Server to 9998 and it started right up. Or so I thought... then the problem came back and I discovered a second issue.
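
A minimal Python sketch of the "same port configured twice" check described above. The function name and the dictionary of ports are hypothetical; the port values would be read off the General settings page by hand.

```python
from collections import defaultdict


def find_port_conflicts(ports_by_server):
    """Map each port that is configured more than once to the servers using it."""
    by_port = defaultdict(list)
    for server, port in ports_by_server.items():
        by_port[int(port)].append(server)
    return {
        port: sorted(servers)
        for port, servers in by_port.items()
        if len(servers) > 1
    }


# Values as described in the post: both servers were set to 9999.
configured = {"Task Server": 9999, "Query Server": 9999}
print(find_port_conflicts(configured))
```

An empty result means each server has its own port; it does not say whether some other process already holds that port.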

My config also had a conflicting [splunktcp://9998] input. I discovered this by paying close attention to the output of the 'netstat -antp | grep 9998' command: the process bound to the port was 'splunkd' instead of 'java'. When everything is running correctly, both ports should be bound to a 'java' process. The 9998 input was not needed in my case, so I just commented it out and restarted. If you need to use this port for an input, you'll need to change the port in the Task Server configuration.
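
To spot this kind of clash without waiting for netstat, one option is to scan the text of an inputs.conf for active splunktcp stanzas. The sketch below is an assumption-laden example: the function name is made up, and the pattern covers the [splunktcp://9998] form shown above plus a few variants I assume are in use ([splunktcp:<port>], an optional host before the port, and splunktcp-ssl). Commented-out stanzas are ignored.

```python
import re

# Matches an uncommented stanza header such as [splunktcp://9998].
# The extra variants handled here are assumptions, not a full inputs.conf parser.
STANZA = re.compile(
    r"^[ \t]*\[splunktcp(?:-ssl)?:(?://)?(?:[^\]:]*:)?(\d+)\]",
    re.MULTILINE,
)


def splunktcp_ports(conf_text):
    """Return the sorted ports of all active splunktcp stanzas in the text."""
    return sorted({int(match.group(1)) for match in STANZA.finditer(conf_text)})


# Hypothetical file contents, for illustration only.
example_conf = """\
[splunktcp://9997]
disabled = 0

# [splunktcp://9998]
# disabled = 0
"""

print(splunktcp_ports(example_conf))
```

Any port in the result that matches a DB Connect server port is a candidate for the conflict described above.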

These changes allowed the java process to bind with the port and the task server started right up. I'm also able to restart it consistently, which also seems to be a problem for others.


AnilPujar
Path Finder

Restarting Splunk worked for me, on the latest version (3.1.3).


Ranazar
Path Finder

I had the same problem. In my case I started with DB Connect v3. I added drivers and pushed it to my search cluster, but I didn't do the config until a while later. When I tried I got the same error, and also nothing in the logs.

In my case, I downloaded and installed the Java SDK and then specified the path in the DBX config. But it didn't work - until I restarted Splunk.

Perhaps the environment settings for the splunk process hadn't picked up ... some necessary parameter for the Java SDK. Restarting splunk started a new process with a fresh environment, so ....

I dunno. But it works now.

Harishma
Communicator

I too faced the same problem.

Uninstalled 2.4 and installed 3.0.2 and I get "Failed to start Task Server"

I had to restart Splunk and that fixed the issue.

But as per the Splunk docs, a restart of Splunk is not required for changes made in the UI to take effect, so I don't understand why a restart is needed here. The docs say:

Click Save to restart the Task Server's Java process. You do not need to restart Splunk Enterprise for changes on this page to take effect.


stephenoleary
Explorer

I've just had exactly the same issue, and a restart of Splunk fixed it. Not 100% sure why.


tlmayes
Contributor

Both good answers... I'm using the Oracle JDK. Earlier today I finally got this working, kind of, by tinkering with the ports. With v2.4 I was using the default of 9998 (I believe). On the same host, v3.0 would only work with port 1025! I randomly tried several ports up to 9998 and nothing worked (I verified the ports were not in use). Something is different about v3.0. As for the logs, none appeared until I successfully passed this stage, at least with v3.0. I never looked for the logs with v2.4 since it just worked. I will chalk this up to being stuck in a rut.

Thanks for the replies, both of them.


gjanders
SplunkTrust

The system will reject any non-Oracle JVM, so OpenJDK will not work. However, since it worked with 2.4, it most likely should work with 3.0.

There are logs in $SPLUNK_HOME/var/log/splunk/splunk_app_db_connect_*
The logs there should give you a hint as to what has gone wrong.
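
If grepping by hand gets tedious, a short Python sketch like the one below can pull the ERROR lines out of those files. The function name is made up, and the sketch assumes the log lines carry a level token such as " ERROR " between spaces, and that SPLUNK_HOME falls back to /opt/splunk when it is not set.

```python
import glob
import os


def db_connect_error_lines(splunk_home):
    """Collect (file name, line) pairs for ERROR lines in the DB Connect logs."""
    pattern = os.path.join(
        splunk_home, "var", "log", "splunk", "splunk_app_db_connect_*"
    )
    hits = []
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue
        with open(path, errors="replace") as handle:
            for line in handle:
                # Assumption: the level appears as a space-delimited token.
                if " ERROR " in line:
                    hits.append((os.path.basename(path), line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    home = os.environ.get("SPLUNK_HOME", "/opt/splunk")
    for name, line in db_connect_error_lines(home):
        print(f"{name}: {line}")
```

An empty result is itself a hint: as noted elsewhere in this thread, the logs may not appear at all until the task server gets past its first start.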

tweaktubbie
Communicator

I have the same issue with Oracle SE 1.8.0 after migrating from 2.4.0 to 3.0.1.
Regardless of which port I use (9998, 9999, 10000, or a random one like 1025, as mentioned in the reply below), I only see this in the log dir. Nothing is running on those ports.

2017-03-01 15:12:59.549 +0100 52064@{servername}[main] ERROR io.dropwizard.cli.ServerCommand - Unable to start server, shutting down
java.lang.RuntimeException: java.net.BindException: Address already in use
   at org.eclipse.jetty.setuid.SetUIDListener.lifeCycleStarting(SetUIDListener.java:213)
   at org.eclipse.jetty.util.component.AbstractLifeCycle.setStarting(AbstractLifeCycle.java:188)
   at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:67)
   at io.dropwizard.cli.ServerCommand.run(ServerCommand.java:53)
   at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:44)
   at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:85)
   at io.dropwizard.cli.Cli.run(Cli.java:75)
   at io.dropwizard.Application.run(Application.java:79)
   at com.splunk.dbx.server.bootstrap.TaskServerStart.startTaskServer(TaskServerStart.java:97)
   at com.splunk.dbx.server.bootstrap.TaskServerStart.streamEvents(TaskServerStart.java:59)
   at com.splunk.modularinput.Script.run(Script.java:66)
   at com.splunk.modularinput.Script.run(Script.java:44)
   at com.splunk.dbx.server.bootstrap.TaskServerStart.main(TaskServerStart.java:108)
Caused by: java.net.BindException: Address already in use
   at sun.nio.ch.Net.bind0(Native Method)
   at sun.nio.ch.Net.bind(Net.java:433)
   at sun.nio.ch.Net.bind(Net.java:425)
   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
   at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:317)
   at org.eclipse.jetty.setuid.SetUIDListener.lifeCycleStarting(SetUIDListener.java:200)
   ... 12 common frames omitted
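
Since the trace above ends in java.net.BindException: Address already in use, one quick cross-check is to try binding the port from a separate process. This is only a sketch with a made-up function name; it tests whether a plain TCP bind succeeds right now and says nothing about how the task server itself opens its socket.

```python
import socket


def can_bind(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Covers "address already in use" as well as permission errors,
            # e.g. a non-root user trying a port below 1024.
            return False
    return True


if __name__ == "__main__":
    for candidate in (9998, 9999):
        print(candidate, "free" if can_bind(candidate) else "not bindable")
```

If this reports a port as free while the task server still fails with the same exception, the conflict is probably somewhere other than the port you are looking at.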

_smp_
Builder

I restarted Splunk a bunch of times, but what eventually worked for me was to stop Splunk and then start it, rather than restart it. It still failed after starting, but after hitting the Save button under Configuration > Settings > General a few times, it suddenly started.


xxing
Explorer

I got exactly the same problem after upgrading from 2.4.0 to 3.0.0. I just can't get the task server running. I have changed the port and tried different JDKs/JREs, but it doesn't work.


tlmayes
Contributor

xxing, my final solution was simply to reboot the server, and everything worked. Felt like I was back on Windows 😞


xxing
Explorer

Your final solution worked for me too. I can't explain it and, to be honest, I don't want to now. FYI, we are using Splunk 6.5.2 on CentOS 7.


gjanders
SplunkTrust

Interesting, so even netstat -an | grep <port number> fails to show anything prior to the reboot?


tweaktubbie
Communicator

Same here indeed: lsof -i :9999 shows nothing, just like netstat on 9999. I don't see how a reboot can work if a splunk stop and splunk start (with no splunk process remaining before the start) doesn't. I see a PID changing about every 5 seconds as it tries to (re)start the task server without success. A fresh app deployment (after archiving the 2.4.0 app with tar/zip and deleting the directory) worked right away. It sounds like some leftover setting conflicts.


gjanders
SplunkTrust

You mention lsof -i: are you using AIX? I haven't seen that command used outside AIX before.
This does sound very strange. Sorry I cannot help any further; it just works for me on Red Hat Enterprise Linux 7.x on the standard port number...


gjanders
SplunkTrust

Does the argument appear in /opt/splunk/etc/apps/splunk_app_db_connect/jars/server.vmopts ? Such as:
-Ddw.server.applicationConnectors[0].port=9998
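
For anyone who wants to check that argument without opening the file by hand, here is a small Python sketch that pulls the port out of the vmopts text. The function name is made up, and the sketch assumes the option appears exactly in the form shown above.

```python
import re

# Matches the option exactly as written above; other spellings are not handled.
PORT_OPTION = re.compile(r"-Ddw\.server\.applicationConnectors\[0\]\.port=(\d+)")


def port_from_vmopts(vmopts_text):
    """Return the configured connector port, or None if the option is absent."""
    match = PORT_OPTION.search(vmopts_text)
    return int(match.group(1)) if match else None


print(port_from_vmopts("-Ddw.server.applicationConnectors[0].port=9998\n"))
```

Comparing this value with the port shown on the settings page is a quick way to see whether a Save actually reached the file.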

When running the debugging mode I found that both the task server & splunk had to restart for some debugging, I'm wondering if that is potentially an issue here (I do realise that splunk should not need to be restarted but it does change things in my environment).


tweaktubbie
Communicator

Yes, it gets updated in the file when saving the port number in the console. As 9998 is assigned to another connection on 2.4.0 it ran under 9999 without issues.

It still remains a big question what splunk_app_db_connect/default/dbx_task_server.yml is used for. It contains:

server:
  rootPath: /api/
  applicationConnectors:
    - type: https
      port: 9999

I have put 9999 there instead of the default 9998. No effect, same issue. Is this overridden by the vmopts, or should there be a local/dbx_task_server.yml with some settings? It is the only conf file I found that contains anything matching 999*.
