Getting Data In

Why does my scripted input data get intermingled?

Mick
Splunk Employee

I have two scripted inputs running on the same interval:

[script://$SPLUNK_HOME/etc/apps/myNewApp/script1.sh]
interval = 60
sourcetype = script
source = stats
index = os

[script://$SPLUNK_HOME/etc/apps/myNewApp/script2.sh]
interval = 60
sourcetype = script
source = stats
index = os

Sometimes the output from both scripts appears in the same event. Is it because they both run at the same time?

Lowell
Super Champion

Mick, can this same situation also happen for regular file inputs?

For example, say there's a directory full of log files (written to concurrently by multiple processes), and the following inputs.conf entry is used:

[monitor:///var/log/dump/dumpinfo*.log]
host = host.example.org
source = /var/log/dump/dumpinfo.log
sourcetype = dumpinfo

Since all files in this directory would be given the exact same host|source|sourcetype combination, does that mean the content of these files could be intermixed when it gets into Splunk? (I believe I've seen this happen before, but I haven't tried it again since the 3.4.x days.)


BTW, sorry for asking a follow-up question on your question. I was going to just add a comment, but figured it would be better to explain the question fully, and there's just not enough room in a single comment...

Lowell
Super Champion

Thanks for the update. Perhaps I'll test this on a test Splunk instance when I get a chance.

Mick
Splunk Employee

No problem Lowell,

I don't think so. I'm pretty sure the new tailing processor is smart enough to keep events from separate files apart. If you do see it happening, it's a bug we'll want to fix.

Mick
Splunk Employee

No, it's not targeted for any work in 4.1. It's really a known 'issue' rather than a bug. I'll raise it with the dev team to see if there's anything we can do in the post-4.1.x stream.

Mick
Splunk Employee

No, it has nothing to do with the interval. This is a known issue: Splunk sees that both input streams have the same host, source and sourcetype values, so it gets confused and mixes up the streams.

An easy workaround is to change the 'source' value for one of the scripts, so that they can be easily distinguished.
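
As a rough sketch of that workaround, the two stanzas from the question could look something like the following. The source names stats_script1 and stats_script2 are placeholders made up for illustration; any two distinct values should serve the same purpose.

```
[script://$SPLUNK_HOME/etc/apps/myNewApp/script1.sh]
interval = 60
sourcetype = script
# hypothetical source name, chosen only to differ from script2
source = stats_script1
index = os

[script://$SPLUNK_HOME/etc/apps/myNewApp/script2.sh]
interval = 60
sourcetype = script
# hypothetical source name, chosen only to differ from script1
source = stats_script2
index = os
```

Any search that previously filtered on source=stats would then need to match both values, for example with source=stats_script*.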

jrodman
Splunk Employee

Well, it seems unlikely that you would want the output of multiple scripts to be merged into a single event. From a flexibility perspective it seems non-awful to have the aggregator in charge of splitting out the events, but it's still confusing.

They should become part of the same stream of events, but be discrete.

gkanapathy
Splunk Employee

Is this really a bug, considering that scripted inputs could be long-running scripts? In that case, I would want the lines from such scripts (e.g., maybe they do the same thing, but collect data from different users or databases) to be interleaved as they are produced.
