"ERROR: The mgmt port [8089] is already bound" prevents restarting Splunk

Contributor

On a 4.1.2 Windows forwarder, we have a .path scripted input pointing to IBM WebSphere's wsadmin command-line shell. wsadmin launches another process: java.exe, running a custom Jython script that we pass to wsadmin. That java process (the "grandchild" from Splunk's perspective) pipes the script's output back to wsadmin (the "child" from Splunk's perspective), which in turn passes that text back to Splunk. So far so good.

Here's the problem: when we stop Splunk, the "child" process is correctly killed by Splunk, but the "grandchild" process lives on. I assume wsadmin is not launching the java process in a way that allows Splunk to kill the whole process tree. (Unfortunately, we can't change how wsadmin launches processes, since it's IBM's code.)

Making matters worse, the orphan java process hanging around prevents restarting Splunk! We get this error when trying to restart Splunk:

ERROR: The mgmt port [8089] is already bound. Splunk needs to use this port. Would you like to change ports? [y/n]:

If we manually kill the orphan java.exe process, the error above doesn't happen. We can hack around the issue by having the grandchild process terminate itself when splunkd exits. But that requires a separate thread and/or frequent polling to watch for splunkd exiting, and it leaves a race condition: if Splunk is restarted before the grandchild notices that splunkd is gone, the restart still fails.

Any idea why I'm getting the error above, given that the process Splunk was previously talking to has been killed?


Re: "ERROR: The mgmt port [8089] is already bound" prevents restarting Splunk

Splunk Employee

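So, splunkd.exe launches a PowerShell script, which launches a JVM, which launches a Jython process. All 3 "launchees" inherit splunkd.exe's file handles, which include the handle to the socket bound to 127.0.0.1:8089. That's how OSes work: as long as any surviving process still holds that handle, the port stays bound, even after splunkd.exe itself has exited.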
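The inheritance effect described above can be reproduced in isolation. The following is only a rough sketch, and it makes a loud assumption: it uses the POSIX-only `pass_fds` argument of Python's `subprocess.Popen`, so it illustrates the general principle on Linux or macOS rather than the exact Windows handle mechanics in this thread. The function names `can_bind` and `demo_inherited_listener` are made up for illustration and have nothing to do with Splunk.

```python
import socket
import subprocess
import sys


def can_bind(port):
    """Return True if 127.0.0.1:port can be bound right now."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind(("127.0.0.1", port))
        return True
    except OSError:
        return False
    finally:
        probe.close()


def demo_inherited_listener():
    """Show that a child holding an inherited listening socket keeps the port bound.

    POSIX-only sketch (pass_fds is not supported on Windows).
    Returns (held_while_child_alive, free_after_child_killed).
    """
    # The "parent" opens a listening socket on an OS-chosen free port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    # The child does nothing with the socket; it merely inherits the descriptor.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        pass_fds=[listener.fileno()],
    )

    # The parent lets go of its copy, like a daemon that has shut down.
    listener.close()
    held = not can_bind(port)

    # Once the last holder of the descriptor is gone, the port is released.
    child.kill()
    child.wait()
    freed = can_bind(port)
    return held, freed


if __name__ == "__main__":
    print(demo_inherited_listener())
```

The child never touches the socket, which mirrors the orphan java.exe: simply holding an inherited handle is enough to block a later bind on the same port.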

Since you're in hackland already, consider pskill from Microsoft's Sysinternals (to kill the orphan process). It doesn't come prepackaged with Windows Server 20XX, but it's officially supported by Microsoft.

Also, instead of the child actively checking whether the parent is alive, the child could check that the modification time of a temp file is sufficiently recent (the parent periodically updates the file).
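A minimal sketch of that heartbeat-file idea follows. Everything named here is hypothetical: `touch_heartbeat`, `heartbeat_is_stale`, the file path, and the 30-second threshold are illustrative choices, not anything Splunk provides. It also assumes that something on the parent side can be made to call `touch_heartbeat` on a schedule, which the thread does not spell out. The code sticks to basic standard-library calls, but it has not been tested under Jython.

```python
import os
import time


def touch_heartbeat(path):
    """Parent side: create the file if needed and set its mtime to now."""
    with open(path, "a"):
        pass
    os.utime(path, None)


def heartbeat_is_stale(path, max_age_seconds, now=None):
    """Child side: True if the heartbeat file is missing or too old."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file is treated as "parent is gone".
        return True
    return (now - mtime) > max_age_seconds
```

The long-running script would call `heartbeat_is_stale(path, 30)` between units of work and exit when it returns True. The threshold should be comfortably larger than the interval at which the file is touched, and the race described in the question still exists for restarts that happen within that window.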
