
Dealing with sporadic line breaks in my data when it comes in via TCP input

sideview
SplunkTrust

When I'm sending data in over TCP, once in a blue moon Splunk will split one of the events into two parts, so the first portion of the text ends up in one event and the rest in another.

Obviously this causes a lot of problems. As a guess, I thought Splunk might be able to tell the difference between EOF and ordinary line breaks, so I tried setting various explicit LINE_BREAKER values in the props.conf stanza.

I tried the following three values (separately, of course), but none of them worked.

LINE_BREAKER=[\n]+
LINE_BREAKER=[\r\n]+
LINE_BREAKER=(\x00)<\d+>

In fact, they all make matters worse: they cause my events to be indexed as multiline events, with 57 lines per event, even though I have SHOULD_LINEMERGE=False in the stanza.

I got the third LINE_BREAKER value from http://answers.splunk.com/questions/603/juniper-netscreen-tcp-syslog-messages-not-breaking-properly, where it seemed to have solved the problem.

So I'm definitely doing at least one thing wrong. 😃

Is it just that the sending process is responsible for aligning its TCP packets with line breaks? Or is my network just way flakier than a normal network should be?

Is it possible to make this problem go away with some key that tells the tcp input to be a little patient and wait a few seconds somehow?

(BTW, this is 64-bit Splunk running on Windows 7.)

gkanapathy
Splunk Employee

What is making the TCP connection? Splunk will break an event when a TCP stream closes. Packet boundaries within a single connection/stream do not cause breaks, but it seems likely to me that your TCP connection is not persistent and is in fact being closed, with a new one opened in its place.

Network flakiness is unlikely to be the problem (unless, say, something is forcibly terminating connections), because TCP can handle even fairly severe packet loss at the IP level. More likely, the client is creating or managing the TCP socket/stream/connection in some unusual way.
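To make the distinction concrete, here is a toy model of the behaviour described above. It is an assumption for illustration, not Splunk's code, and `events_from_connections` is a hypothetical name. Each inner list holds the chunks read from one TCP connection: chunk boundaries inside a connection are ignored, while an event ends at a newline or when its connection closes.

```python
from typing import List


def events_from_connections(connections: List[List[bytes]]) -> List[bytes]:
    """Toy model (assumption, not Splunk's implementation).

    Each inner list is the sequence of chunks/packets read from ONE
    TCP connection. Chunks within a connection are reassembled first,
    so packet boundaries never split an event; a newline or the
    connection closing does.
    """
    events: List[bytes] = []
    for chunks in connections:
        # Reassemble the stream for this connection.
        stream = b"".join(chunks)
        # Split on newlines; a trailing partial line (no newline before
        # the connection closed) still comes out as its own event.
        for line in stream.split(b"\n"):
            if line:
                events.append(line)
    return events
```

With the same bytes, `[[b"user=alice ac", b"tion=login\n"]]` (one connection, two packets) yields one event, while `[[b"user=alice ac"], [b"tion=login\n"]]` (two connections) yields two fragments, which matches the symptom in the question.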

bkumarm
Contributor

I faced a similar problem with TCP input, and the same data worked fine when sent as a file.
The solution I found was to use the following in props.conf:

SHOULD_LINEMERGE=false

Now all the events are broken properly.

sideview
SplunkTrust

Right now I'm mocking this up with a script that pipes some data to netcat over my home network every few minutes. It will be fine for many cycles, and then terrible for a while, with about a quarter of the events getting broken. Thanks, Gerald.
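If each cycle of such a script starts a fresh netcat process, each cycle is presumably also a fresh TCP connection, which is the close-and-reopen pattern Gerald describes. A minimal sketch of an alternative test sender, assuming a raw TCP input listening on some host/port (`send_events` and any host/port you pass are hypothetical), keeps one connection open and writes complete, newline-terminated events:

```python
import socket
import time
from typing import Iterable


def send_events(host: str, port: int, events: Iterable[str],
                interval: float = 0.0) -> None:
    """Send newline-terminated events over ONE persistent TCP connection.

    Hypothetical helper for testing a raw TCP input; host and port are
    whatever your input listens on.
    """
    # One connection for the whole run; it is closed only once, after
    # the last complete event has been written.
    with socket.create_connection((host, port)) as sock:
        for event in events:
            # sendall() keeps sending until every byte of the event plus
            # its trailing newline has been handed to the OS (or raises).
            sock.sendall(event.encode("utf-8") + b"\n")
            if interval:
                # Pause between events without dropping the connection.
                time.sleep(interval)
```

For example, `send_events("localhost", 9999, ["a=1", "b=2"], interval=1.0)` with a hypothetical port. The events themselves may still be split across TCP segments on the wire, but per the answer above that should not matter as long as the connection stays open.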

Genti
Splunk Employee

Hey Nick,

"Is it possible to make this problem go away with some key that tells the tcp input to be a little patient and wait a few seconds somehow?"

See if this helps:
http://www.splunk.com/base/Documentation/4.1.5/Admin/Inputsconf
Specifically:

time_before_close = <integer>
* Modtime delta required before Splunk can close a file on EOF.
* Tells the system not to close files that have been updated in past <integer> seconds.
* Defaults to 3.

Cheers!

Genti
Splunk Employee

Yep, that is most probably the case.

sideview
SplunkTrust

It doesn't seem to have any effect. It seems likely that the key is only valid within monitor:// inputs.

ziegfried
Influencer

LINE_BREAKER should contain a regex capturing group. You could try one of these variants:

LINE_BREAKER=([\v\x00]+)
LINE_BREAKER=(\v+)
LINE_BREAKER=(\x00+)

(\v matches vertical whitespace.)
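Why the capturing group matters can be sketched in a few lines. The props.conf documentation describes LINE_BREAKER roughly like this: wherever the regex matches, the start of the first capturing group ends the previous event, the end of that group starts the next event, and the group's contents are discarded. The toy emulation below follows that description; it is not Splunk's implementation, and `break_events` is a hypothetical name. Python's `\v` matches only a vertical tab (unlike PCRE's vertical-whitespace class), so the example uses `([\r\n]+)`, Splunk's default LINE_BREAKER, instead.

```python
import re
from typing import List


def break_events(stream: str, line_breaker: str) -> List[str]:
    """Toy emulation of LINE_BREAKER semantics (not Splunk's code).

    Wherever the pattern matches, the start of capture group 1 ends the
    previous event, the end of group 1 starts the next one, and the text
    captured by group 1 itself is discarded.
    """
    pattern = re.compile(line_breaker)
    if pattern.groups < 1:
        # This sketch simply refuses patterns without a capturing group.
        raise ValueError("LINE_BREAKER needs a capturing group")
    events: List[str] = []
    start = 0
    for m in pattern.finditer(stream):
        if m.start(1) == -1:  # group 1 did not take part in this match
            continue
        events.append(stream[start:m.start(1)])
        start = m.end(1)
    events.append(stream[start:])  # whatever follows the last break
    # Drop empty fragments, e.g. the one after a trailing newline.
    return [e for e in events if e]
```

For instance, `break_events("a=1\nb=2\r\nc=3\n", r"([\r\n]+)")` gives `['a=1', 'b=2', 'c=3']`, while `r"(\x00)<\d+>"` on the same data never matches and leaves everything as one multiline event. That is one plausible reading of the 57-line aggregation, if the incoming data contains no null bytes. On `"<13>first\x00<14>second"` the same pattern splits before the `<14>` prefix and keeps it, because only the `\x00` is inside the group.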

gkanapathy
Splunk Employee

The timestamp position doesn't matter. LINE_BREAKER processing occurs first, then timestamp extraction (then line merging, but you don't have SHOULD_LINEMERGE enabled, so that step is skipped).

sideview
SplunkTrust

Not long at all. The only unusual thing is that the timestamp comes at the end, but since the extra breaks can occur anywhere in the event text, I don't think it's related.

ziegfried
Influencer

Too bad. How long are those events? Can you post examples?

sideview
SplunkTrust

I'm afraid they don't work. I tried each one, with a clean and restart in between. The first two still show the sporadic breaking behavior, and the last one causes the 57-line multiline aggregation.
