
Scripted input without a shell?

Super Champion

Is it possible to create a scripted input that is launched directly from splunkd and not from a shell? I've tried a shell script, a Python script, and a .path file, and they all seem to be wrapped by a "/bin/sh -c <MY_COMMAND>" wrapper shell.

I have a long-running scripted input, and the process is not being shut down when splunkd restarts, which I think is due to the extra sh process not passing the kill signal down to my process.


Update / Additional info:

I know that splunkd stops the shell whenever splunkd is shut down or the scripted input is disabled. (Note: To save time during testing, I've been enabling and disabling my scripted input stanza in inputs.conf, then requesting the following refresh link: http://mysplunkserver:8000/en-US/debug/refresh?entity=admin%2Fscript, which has been working like a charm.) Whenever the input is disabled (or splunkd is shut down), the shell wrapper process goes away, but the child process (my scripted input program) continues to run. Instead of being a grandchild of splunkd, my process is now directly under process 1 (init).

Here are two examples showing the processes running on my system (output generated by pstree -A -p).

Example 1: This shows my scripted input when it's enabled. My scripted input process is pid 4177, with several threads.

init(1)-+
        |-splunkd(2642)-+-splunkd(2643)-+-sh(4176)---java(4177)-+-{java}(4185)
        |               |               |                       |-{java}(4186)
        |               |               |                       |-{java}(4187)
        |               |               |                       |-{java}(4188)
        |               |               |                       |-{java}(4189)
        |               |               |                       |-{java}(4191)
        |               |               |                       |-{java}(4194)
        |               |               |                       |-{java}(4195)
        |               |               |                       |-{java}(4201)
        |               |               |                       |-{java}(4202)
        |               |               |                       |-{java}(4203)
        |               |               |                       |-{java}(4204)
        |               |               |                       |-{java}(4205)
        |               |               |                       |-{java}(4207)
        |               |               |                       `-{java}(4211)

Example 2: I then disabled my input (disabled=1 in inputs.conf) and refreshed the "admin/script" entities. My process tree now looks as follows. (Note that the "java" process is now owned by init, and the wrapper shell (4176) is gone.)

init(1)-+
        |-java(4177)-+-{java}(4185)
        |            |-{java}(4186)
        |            |-{java}(4187)
        |            |-{java}(4188)
        |            |-{java}(4189)
        |            |-{java}(4191)
        |            |-{java}(4194)
        |            |-{java}(4195)
        |            |-{java}(4201)
        |            |-{java}(4202)
        |            |-{java}(4203)
        |            |-{java}(4204)
        |            |-{java}(4205)
        |            |-{java}(4207)
        |            `-{java}(4211)
        |-splunkd(2642)-+-splunkd(2643)

The problem seems to be that the wrapper shell (/bin/sh) is simply not passing on the kill request.

Note: As shown above, I'm using my own wrapper script to set up the environment, and it launches the java executable using exec to avoid an additional shell layer. I've experimented with traps and such (when I wasn't using exec, of course), but ultimately if the parent process (the /bin/sh wrapper shell) doesn't pass down the signal, there's nothing to trap. My only other option is implementing some kind of polling mechanism to see if my parent process is dead. So once again, I'm back to: How do I turn off that annoying wrapper shell and keep things simple?


I'm running Splunk 4.1.8 on Ubuntu 8.04 (32-bit), and /bin/sh currently points to dash (the Ubuntu default).

Influencer

Here's a Python script I use in various scenarios where I invoke a subprocess that should be killed when the script is disabled or Splunk is stopped:

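import signal
import sys
from subprocess import Popen

# Launch the long-running child; the arguments are elided in the original post.
process = Popen(...)

def cleanup(signum, frame):
    # Forward the shutdown to the child when this script receives SIGTERM.
    try:
        process.terminate()
    except OSError:
        # The child could not be signalled, so give up with an error status.
        sys.exit(1)

signal.signal(signal.SIGTERM, cleanup)
(out, err) = process.communicate()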

Splunk sends a SIGTERM to the script and the callback attached to the signal terminates the subprocess.

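# Linux workaround: watch the parent PID and clean up once the parent is gone.
import os

if os.uname()[0] == 'Linux':
    import time
    from threading import Thread

    class PPIDWatcher(Thread):
        def __init__(self):
            super(PPIDWatcher, self).__init__()
            # A daemon thread does not keep the script alive after the main thread ends.
            self.daemon = True

        def run(self):
            ppid = os.getppid()
            while True:
                time.sleep(1)
                try:
                    os.kill(ppid, 0)  # signal 0 only checks that the parent still exists
                except OSError:
                    cleanup(None, None)  # the handler ignores its arguments
                    return

    PPIDWatcher().start()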

Influencer

You're right. I didn't see that I've built a rather dirty workaround for Linux. I'll add it to the post, but I think it's something you already had in mind.


Super Champion

Nope. Splunk is doing that itself, and I can't make it NOT wrap the process in a shell. Hence I posted this question.


Influencer

And it works on at least OSX as well...


Influencer

It's running on Ubuntu. Why are you wrapping a python script with a shell script?


Super Champion

I've tried this approach, but the problem I had was that the "splunkd" process was killing the shell script that was wrapping the python script; and the signal never got passed down from that shell to the python process. It could be a version/OS thing. What OS are you using, and which version of Splunk?


Splunk Employee

The processes not being killed is in fact due to Splunk's own behavior, and will not change regardless. While you can call any executable directly from a scripted input, they'll all behave that way. They are launched with nohup on Unix, and Windows inherently doesn't kill processes. It's pretty much a bug I think, since I see absolutely no use for the current behavior, but maybe not.

As a possibly complicated workaround, I would recommend that any long-running custom script exit occasionally, say every 10 or 30 minutes or at some time convenient for the script, and then be scheduled to restart one second or zero seconds after exit. Alternatively, it could check every so often what its stdout is connected to and exit if there's nothing there.
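A minimal sketch of the "check what stdout is connected to" idea, assuming the script's stdout is a pipe whose read end is held by splunkd (or its wrapper shell) and that the platform reports a closed read end the way Linux does. The helper name stdout_reader_gone is made up for illustration and is not part of any Splunk API.

```python
import select

def stdout_reader_gone(fd=1):
    """Return True if the pipe behind fd has lost its reader (Linux behaviour assumed)."""
    poller = select.poll()
    # An empty event mask still reports POLLERR and POLLHUP, which cannot be masked out.
    poller.register(fd, 0)
    for _fd, events in poller.poll(0):
        if events & (select.POLLERR | select.POLLHUP):
            return True
    return False
```

A long-running input could call stdout_reader_gone() at the top of its main loop and exit cleanly when it returns True. Unlike a test write, this check does not push any bytes into the stream that Splunk indexes.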

You should probably file an ER with Splunk for a parameter for each scripted input stanza to have Splunkd kill child processes/processes launched by the script on exit. Or just file it as a bug.


SplunkTrust

Killing a child process (and all of its children) is not necessarily easy in all cases on Unix. If Splunk started the scripted input in its own process group, it would be easier to kill all processes within the group. Similar to gkanapathy's advice, your scripted input could occasionally (at the top of a loop?) check its parent pid and exit cleanly if the parent is init (pid 1).
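Both ideas can be sketched in a few lines of Python 3 (the start_new_session argument is not available in the Python 2 that older Splunk releases bundle). The function names are made up for illustration. The parent check compares against the parent pid recorded at startup rather than hard-coding pid 1, on the assumption that some systems reparent orphans to a process other than init.

```python
import os
import signal
import subprocess

def orphaned(original_ppid):
    """Return True once the process that started us is no longer our parent."""
    return os.getppid() != original_ppid

def start_in_own_group(cmd):
    """Start cmd as the leader of a new session, and therefore of a new process group."""
    return subprocess.Popen(cmd, start_new_session=True)

def stop_group(proc, sig=signal.SIGTERM):
    """Signal every process in the child's group, then reap the child."""
    try:
        os.killpg(os.getpgid(proc.pid), sig)
    except ProcessLookupError:
        pass  # the group is already gone
    return proc.wait()
```

A scripted input would record os.getppid() once at startup and call orphaned() with that value in its loop. A wrapper that launches helpers with start_in_own_group() can then take down the whole tree with a single stop_group() call from its SIGTERM handler.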


Super Champion

Out of curiosity, does your /bin/sh point to dash on your Ubuntu box?


Splunk Employee

Agree that the current behavior makes no sense. We are seeing the same issue in the VMware App. Our scripted input's child process doesn't get shut down when Splunkd shuts down and the /bin/sh exits. We have found that it only happens on Ubuntu - i.e. it is fine on CentOS - so we're going to file it as a bug.


Super Champion

I know you're the Splunk guru here, but I don't think you're completely right on this one. Splunk does appear to kill the wrapping shell process when the scripted input is disabled. I've added some additional info to the original question.
