Getting Data In

Batch input doesn't honor LINE_BREAKER settings

ziegfried
Influencer

I've used the var/spool/splunk directory to have Splunk index the output of some scripts. The files are moved there once the script completes. I've set the destination index, source and sourcetype using the approach described here: http://www.splunk.com/base/Documentation/latest/Data/Assignmetadatatoeventsdynamically

***SPLUNK*** index=myindex sourcetype=mysourcetpe source=foo

The events are getting into the correct index with the correct metadata, but the line-breaking settings of the sourcetype seem to be ignored by Splunk. Settings for the sourcetype:

[mysourcetpe]
CHARSET=UTF-8
SHOULD_LINEMERGE = false
LINE_BREAKER = (\v+--end-of-event--\v*)

Instead of using the configured LINE_BREAKER, all lines are being indexed as separate events. So it seems Splunk is using something like ([\r\n]+) instead.
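To make the symptom concrete, here is a rough Python sketch of the difference between breaking on ([\r\n]+) and breaking on the configured LINE_BREAKER. It is only an emulation, not how Splunk implements line breaking: Python's re module treats \v as a single vertical tab rather than "any vertical whitespace", so the character class is spelled out by hand, and the sample text and the split_events helper are made up for illustration.

```python
import re

# Emulation only: in PCRE, \v means "any vertical whitespace", but Python's re
# treats \v as a single vertical tab, so the class is written out explicitly.
VSPACE = r"[\n\x0b\x0c\r\x85\u2028\u2029]"

# Roughly what the question suspects is being applied.
DEFAULT_BREAKER = re.compile(r"[\r\n]+")
# Equivalent of LINE_BREAKER = (\v+--end-of-event--\v*)
CUSTOM_BREAKER = re.compile(VSPACE + r"+--end-of-event--" + VSPACE + r"*")


def split_events(text, breaker):
    """Split text at each breaker match and drop empty pieces (hypothetical helper)."""
    return [piece for piece in breaker.split(text) if piece]


# Made-up script output: two events, each terminated by the delimiter line.
sample = (
    "first line of event A\n"
    "second line of event A\n"
    "--end-of-event--\n"
    "only line of event B\n"
    "--end-of-event--\n"
)

default_events = split_events(sample, DEFAULT_BREAKER)
custom_events = split_events(sample, CUSTOM_BREAKER)

print(len(default_events), "events with the newline breaker")
print(len(custom_events), "events with the custom breaker")
```

With the newline breaker every line, including the delimiter lines, becomes its own event, which matches what shows up in the index. With the custom breaker the delimiter is consumed and the multi-line event stays together.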

When using a [script://...] input, the line breaking works as expected. What I've tried so far is to use a separate batch input and set the queue to parsingQueue. Unfortunately, this doesn't change the behavior.

[batch://$SPLUNK_HOME\var\spool\foo]
crcSalt = <SOURCE>
move_policy=sinkhole
queue = parsingQueue

Any ideas on how to get Splunk to do the line breaking correctly?


Lowell
Super Champion

I've run into a number of issues like this before, and this is my theory: Splunk uses the dynamic input header to set the source/sourcetype/host/index of the events being processed (as you'd expect), but it doesn't do the normal props.conf processing based on the newly assigned source/sourcetype/host; it simply processes the data based on whatever props rules the input would have used if no dynamic input header existed.

I'm not sure if I've explained this well. Here's another example. You can use a transform to rewrite the "source", and Splunk will index your event with whatever new source you've assigned in the transform. When you assign a new "source", Splunk does NOT go back through all of your props.conf files to see if the new source value matches any stanzas; it simply uses the original props.conf rules. I think the dynamic input header has a similar limitation.


One thing I would try is to give your spool files a unique file name pattern (say, they all contain "MY_SOURCETYPE" in the middle of the filename). Then set up a source pattern matching rule in props.conf to match that pattern in your spool directory, and use that to assign the sourcetype ("mysourcetpe"). If you are trying to do this with multiple sourcetypes, then this gets more tricky.

For example:

props.conf

[source::...MY_SOURCETYPE.*]
sourcetype = mysourcetpe
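On the script side, that could look something like the following Python sketch. Everything in it is an assumption for illustration: the write_spool_file helper, the "job" prefix, the ".log" suffix and the staging directory are made up, and only the MY_SOURCETYPE token and the --end-of-event-- delimiter come from this thread. It writes the events to a staging file whose name contains the token, then moves the finished file into the spool directory in one step, so Splunk never picks up a half-written file. The ***SPLUNK*** header line is left out here; add it as the first line if you still need it for index or source.

```python
import os
import tempfile

EVENT_DELIMITER = "--end-of-event--"  # delimiter the LINE_BREAKER in the question expects
NAME_MARKER = "MY_SOURCETYPE"         # filename token matched by [source::...MY_SOURCETYPE.*]


def write_spool_file(events, staging_dir, spool_dir, prefix="job"):
    """Write events to a staging file, then move it into the spool directory.

    Hypothetical helper: the prefix, suffix and directory layout are assumptions.
    """
    # The name looks like "<prefix>.MY_SOURCETYPE.<random>.log".
    fd, staging_path = tempfile.mkstemp(
        prefix=f"{prefix}.{NAME_MARKER}.", suffix=".log", dir=staging_dir
    )
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for event in events:
            # Each event is followed by the delimiter on a line of its own.
            handle.write(event.rstrip("\n") + "\n" + EVENT_DELIMITER + "\n")

    final_path = os.path.join(spool_dir, os.path.basename(staging_path))
    # os.replace is a single rename when both directories are on the same
    # filesystem; it fails across filesystems, so keep staging next to the spool.
    os.replace(staging_path, final_path)
    return final_path
```

For example, write_spool_file(["line 1\nline 2", "single line"], "/path/to/staging", "/path/to/spool") would leave one file in the spool directory containing two delimited events; both paths are placeholders.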

BTW, if you are trying to pass generic (multiline) messages into Splunk, then you may find the following helpful. I've attempted to do something similar, and this is the best solution I've come up with. (The example talks about sending in events over TCP, but I use the same sourcetype for file-based input as well, and it works well there too.)

http://splunk-base.splunk.com/answers/7494/issues-creating-a-gateway-to-create-splunk-events-from-an...


Marinus
Communicator

I think it's because it's not reading the header.
Set HEADER_MODE in props.conf:

[source::/tmp/splunk/var/spool/splunk]
HEADER_MODE = always

Lowell
Super Champion

I was actually talking about 4.1, before that setting existed. (My understanding was that HEADER_MODE=always was the default in 4.1 and earlier.)


ziegfried
Influencer

Thx, that sounds reasonable 🙂
