We were just finalizing a deployment of Splunk DB Connect 2.4 when 3.0 appeared, so we stopped and decided to start the deployment fresh with 3.0. Our challenge going in was that we cannot install Java/JDK as an application in our Linux environment due to security restrictions. This was not a problem with DB Connect 2.4: I placed the Oracle JDK in the DB Connect app's local directory (/opt/splunk/etc/apps/splunk_app_db_connect/local/jdk1.8.0_121) and provided the full path in the app setup. Everything worked fine.
I proceeded with Splunk DB Connect v3.0 from a clean slate: removed the v2.4 app, installed v3.0, and placed the JDK files in the same directory as with 2.4. Clicking "Save", which kicks off the start of the Task Server, keeps returning the error "Failed to start task server". With 2.4 it was not necessary to have JAVA_HOME set or the JDK in the PATH, but I tried this anyway and still receive the same error.
The JRE path that worked with 2.4 = /opt/splunk/etc/apps/splunk_app_db_connect/local/jdk1.8.0_121
I tried this path, and also the same path with jre/ appended.
I'm not seeing anything in the Splunk logs, but I don't know that I would. What am I missing?
Hi, here's the logging section: http://docs.splunk.com/Documentation/DBX/3.0.0/DeployDBX/Troubleshooting#DB_Connect_logging
If you're not able to figure it out from logs, I'd start by looking for java processes getting launched and the network ports that are getting used.
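If netstat isn't handy, here is a minimal Python sketch of the same kind of port check. It only tries to bind a TCP listener to each port and reports whether that succeeds. The helper name port_is_free is made up for illustration, and 9998/9999 are just the ports people mention in this thread, so substitute whatever your settings page shows.

```python
import socket

def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP listener can currently bind to host:port.

    Hypothetical helper, not part of Splunk or DB Connect.
    Ports below 1024 usually need root, so they may report "in use"
    for an unprivileged user even when nothing is listening.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Covers "Address already in use" as well as permission errors.
            return False
        return True

if __name__ == "__main__":
    # Assumed ports; adjust to match your own Task Server / Query Server config.
    for port in (9998, 9999):
        print(port, "free" if port_is_free(port) else "in use")
```

Run it while the task server is stopped. If a port still shows as in use, something other than DB Connect is probably holding it.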
FYI and perhaps this will be helpful to others.
I was getting all the same errors and finding little of use in the log files.
In my case the problem was that something had deleted these files:
C:\Program Files\Splunk\etc\apps\splunk_app_db_connect\windows_x86_64\bin\dbxquery.exe
C:\Program Files\Splunk\etc\apps\splunk_app_db_connect\windows_x86_64\bin\server.exe
Files of the same name still existed under:
...\windows_x86\bin
...\linux_x86\bin
...\linux_x86_64\bin
So we suspect some antivirus or similar must have deleted them for some reason.
As soon as I put them back DB Connect was fine.
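For anyone who wants to rule this out quickly, a rough Python sketch that lists which of the expected binaries are missing from a platform directory. The function name missing_binaries is hypothetical; the file names and directory layout are only the ones from my install above, so treat them as assumptions for other versions.

```python
from pathlib import Path

def missing_binaries(app_dir, platform_dir, names):
    """Return the names that do not exist as files under app_dir/platform_dir/bin.

    Hypothetical helper for illustration; not shipped with DB Connect.
    """
    bin_dir = Path(app_dir) / platform_dir / "bin"
    return [name for name in names if not (bin_dir / name).is_file()]

if __name__ == "__main__":
    # Paths and file names taken from the post above; adjust for your host.
    app = r"C:\Program Files\Splunk\etc\apps\splunk_app_db_connect"
    print(missing_binaries(app, "windows_x86_64", ["dbxquery.exe", "server.exe"]))
```

An empty list means both files are present. Anything listed is worth restoring from the app package.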
I discovered two issues in my config and after fixing them, I can consistently restart the task server. I posted the details in an answer.
I was fighting this same problem until I stopped to look at the General settings page to give it some thought. There are two "servers" to configure on this page:
Task Server
Query Server
Each server needs its own unique port number to listen on. But for some reason, both "servers" were configured to use the same port number, 9999, so the two java "server" processes were fighting over the same listener port. I simply changed the port of the Query Server to 9998 and it started right up. Or so I thought... then the problem came back and I discovered a second issue.
My config also had a conflicting [splunktcp://9998] input. I discovered this by paying close attention to the output of the 'netstat -antp | grep 9998' command: the process bound to the port was 'splunkd' instead of 'java'. When it is running correctly, both ports should be bound to a 'java' process. The 9998 input was not needed in my case, so I just commented it out and restarted. If you need to use this port on an input, you'll need to change the port in the DB Connect server configuration instead.
These changes allowed the java process to bind with the port and the task server started right up. I'm also able to restart it consistently, which also seems to be a problem for others.
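To make the two checks above concrete, here is a small Python sketch that flags any port claimed by more than one component. The component names and the find_port_conflicts helper are made up for illustration; you would fill in the dictionary by hand from your General settings page and your inputs.conf.

```python
from collections import defaultdict

def find_port_conflicts(assignments):
    """Return {port: [names]} for every port assigned to more than one name.

    Hypothetical helper; 'assignments' maps a label you choose to a port number.
    """
    by_port = defaultdict(list)
    for name, port in assignments.items():
        by_port[port].append(name)
    return {port: names for port, names in by_port.items() if len(names) > 1}

if __name__ == "__main__":
    # Example values mirroring my first broken state (both servers on 9999).
    config = {
        "task_server": 9999,
        "query_server": 9999,
        "splunktcp_input": 9998,
    }
    print(find_port_conflicts(config))
```

With the example values it reports the task server and query server sharing 9999. After moving the query server to 9998 it would then report the clash with the splunktcp input, which is the second issue I hit.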
Restarting Splunk worked for me, on the latest version, 3.1.3.
I had the same problem. In my case I started with DB Connect v3. I added drivers and pushed it to my search cluster, but I didn't do the config until a while later. When I tried I got the same error, and also nothing in the logs.
In my case, I downloaded and installed the Java SDK and then specified the path in the DBX config. But it didn't work - until I restarted Splunk.
Perhaps the environment settings for the splunk process hadn't picked up ... some necessary parameter for the Java SDK. Restarting splunk started a new process with a fresh environment, so ....
I dunno. But it works now.
I too faced the same problem.
Uninstalled 2.4 and installed 3.0.2 and I get "Failed to start Task Server"
I had to restart Splunk and that fixed the issue.
But per the Splunk docs, a restart of Splunk is not required for changes to be reflected in the UI, so I don't understand why a restart is required here. The docs say:
Click Save to restart the Task Server's Java process. You do not need to restart Splunk Enterprise for changes on this page to take effect.
I've just had exactly the same issue, and a restart of splunk fixed it, not 100% sure why.
Both good answers... I'm using the Oracle JDK. Earlier today I finally got this working, kind of. I started tinkering with the ports. With v2.4 I was using the default of 9998 (I believe). On the same host, v3.0 would only work with port 1025! I randomly tried several ports up to 9998 and nothing worked (I verified the ports were not in use). Something is different about v3.0. As for the logs, no logs appeared until I successfully passed this stage, at least with v3.0. I never looked for the logs with v2.4 since... it just worked. I will chalk this up to being stuck in a rut and needed.
Thanks for the replies, both of them.
The system will reject any non-Oracle JVM; if it's OpenJDK it will not work. However, considering it worked with 2.4, it should most likely work with 3.0.
There are logs in $SPLUNK_HOME/var/log/splunk/splunk_app_db_connect_*
The logs there should give you a hint as to what has gone wrong.
I have the same issue with Oracle SE 1.8.0 after migrating from 2.4.0 to 3.0.1.
Regardless of which port I use (9998, 9999, 10000, or a random one like 1025, as mentioned in another reply), I only see this in the log directory. Nothing is running on those ports.
2017-03-01 15:12:59.549 +0100 52064@{servername}[main] ERROR io.dropwizard.cli.ServerCommand - Unable to start server, shutting down
java.lang.RuntimeException: java.net.BindException: Address already in use
at org.eclipse.jetty.setuid.SetUIDListener.lifeCycleStarting(SetUIDListener.java:213)
at org.eclipse.jetty.util.component.AbstractLifeCycle.setStarting(AbstractLifeCycle.java:188)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:67)
at io.dropwizard.cli.ServerCommand.run(ServerCommand.java:53)
at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:44)
at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:85)
at io.dropwizard.cli.Cli.run(Cli.java:75)
at io.dropwizard.Application.run(Application.java:79)
at com.splunk.dbx.server.bootstrap.TaskServerStart.startTaskServer(TaskServerStart.java:97)
at com.splunk.dbx.server.bootstrap.TaskServerStart.streamEvents(TaskServerStart.java:59)
at com.splunk.modularinput.Script.run(Script.java:66)
at com.splunk.modularinput.Script.run(Script.java:44)
at com.splunk.dbx.server.bootstrap.TaskServerStart.main(TaskServerStart.java:108)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:317)
at org.eclipse.jetty.setuid.SetUIDListener.lifeCycleStarting(SetUIDListener.java:200)
... 12 common frames omitted
I restarted Splunk a bunch of times, but what eventually worked for me, somehow, was to /stop/ Splunk and then /start/ Splunk. It still failed right after starting, but after hitting the Save button under Configuration > Settings > General a few times, it suddenly started.
I got exactly the same problem after upgrading from 2.4.0 to 3.0.0. I just can't get the task server running. I have changed the port, tried different jdk/jre, it doesn't work.
XXING, my final solution was to simply reboot the server, and everything worked..... Felt like I was back on Windows 😞
Your final solution worked for me too. I can't explain it and, to be honest, I don't want to now. FYI, we are using Splunk 6.5.2 on CentOS 7.
Interesting, so even 'netstat -an | grep <port number>' fails to show anything prior to the reboot?
Same here indeed: lsof -i :9999 shows nothing, just like netstat on 9999. How can a reboot work when a splunk stop and splunk start (with no splunk process remaining before the start) doesn't? I see a PID changing about every 5 seconds as it tries to (re)start the task server without success. A fresh app deployment (after archiving the 2.4.0 app with tar/zip and deleting the directory) worked right away. It sounds like some leftover setting conflicts.
Your mention of lsof -i, are you using AIX? I haven't seen that command used outside AIX before.
This does sound very strange, sorry I cannot help any further, it just works for me on Redhat Enterprise Linux 7.x on the standard port number...
Does the argument appear in /opt/splunk/etc/apps/splunk_app_db_connect/jars/server.vmopts ? Such as:
-Ddw.server.applicationConnectors[0].port=9998
When running the debugging mode I found that both the task server & splunk had to restart for some debugging, I'm wondering if that is potentially an issue here (I do realise that splunk should not need to be restarted but it does change things in my environment).
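If you want to check that vmopts argument without eyeballing the file, here is a small Python sketch that pulls the connector port(s) out of the text. The connector_ports helper is made up for illustration, and it assumes the option looks exactly like the example above; I have not checked whether other DB Connect versions write it differently.

```python
import re

# Matches options such as: -Ddw.server.applicationConnectors[0].port=9998
PORT_OPTION = re.compile(r"-Ddw\.server\.applicationConnectors\[(\d+)\]\.port=(\d+)")

def connector_ports(vmopts_text):
    """Return {connector_index: port} for each matching option in the text.

    Hypothetical helper; it only parses text and does not touch Splunk.
    """
    return {int(idx): int(port) for idx, port in PORT_OPTION.findall(vmopts_text)}

if __name__ == "__main__":
    # Path taken from the post above; adjust if Splunk is installed elsewhere.
    path = "/opt/splunk/etc/apps/splunk_app_db_connect/jars/server.vmopts"
    try:
        with open(path) as f:
            print(connector_ports(f.read()))
    except FileNotFoundError:
        print("vmopts file not found at", path)
```

An empty result would suggest the port argument is not being written to the file at all.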
Yes, it gets updated in the file when saving the port number in the console. As 9998 is assigned to another connection on 2.4.0 it ran under 9999 without issues.
It still remains a big question what the use is of splunk_app_db_connect/default/dbx_task_server.yml - there's
server:
  rootPath: /api/
  applicationConnectors:
    - type: https
      port: 9999
I put 9999 there instead of the default 9998: no effect, same issue. Is this overridden by the vmopts, or should there be a local/dbx_task_server.yml with some settings? It is the only conf file I found that contains anything matching 999*...