How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?

Builder

Hello,
I'm trying to accept TCP input from a device that wraps each transmission in an STX/ETX pair (ASCII 002/003), with no line breaks ('\n'). The text inside is XML, which KV_MODE=xml handles rather nicely: I tried importing a file with line feeds instead of STX/ETX, and both that and my TIMESTAMP_FIELDS= setting worked as expected.
However, I can't figure out how to break the stream into events on STX/ETX instead of newlines. I tried LINE_BREAKER = [\x02\x03]+ with SHOULD_LINEMERGE=false, to no avail. Admittedly, I tested my sourcetype by uploading a one-line file with STX/ETX thrown in rather than over TCP. It still loads as one huge event, and the timestamp cannot be parsed out.

Am I escaping those hex codes improperly (I can switch to octal if necessary)? Is there a known problem with parsing LINE_BREAKER? Should I use `SHOULD_LINEMERGE=true` with `BREAK_ONLY_BEFORE=\x02` and `MUST_BREAK_AFTER=\x03` instead? In the latter case, I'll have to strip those control characters some other way before Splunk can parse the XML inside.

If everything else fails, I can try switching to a scripted input, but that seems an unnecessary hurdle.

SplunkTrust

You should be able to remove the STX/ETX using SEDCMD, and then you should be able to use BREAK_ONLY_BEFORE/MUST_BREAK_AFTER. Would you be able to provide some sample events that you might receive?

Builder

Something along the lines of:

STX<objectdata><general oid="4"><timestamp>2015-09-18T02:00:13</timestamp></general></objectdata>ETX

where STX and ETX are 0x02 and 0x03 respectively. There may be many such XML structures, all surrounded by STX/ETX.

Builder

As you can see, the data inside are pure XML. TIMESTAMP_FIELDS will include objectdata.general.timestamp for sure.

Here is my full sourcetype definition (props.conf stanza) as of now:

[tcpInputTest]
SHOULD_LINEMERGE = false
category = Custom
pulldown_type = true
DATETIME_CONFIG = NONE
KV_MODE = xml
disabled = false
TIMESTAMP_FIELDS = objectdata.general.timestamp, tracedata.timestamp, heartbeatdata.timestamp
LINE_BREAKER = \x03?\x02
TRUNCATE = 0

SplunkTrust

Give this a try for your LINE_BREAKER attribute:

LINE_BREAKER = (\x02)(?=\<objectdata\>)

Builder

Oh, not every record starts with the objectdata tag; some start with other tags. But using just \x02 and stripping \x03 with SEDCMD is what I'm going to try next.

SplunkTrust

Ok... Try this as well...

LINE_BREAKER = (\x02)(?=\<\S+\>)

Builder

Yes, this worked! The XML is parsed even without stripping the trailing ETX. I had to remove the trailing 'greater than' sign because some of the records have xmlns:xsi and other attributes. I'm wondering why it didn't work for me with LINE_BREAKER = \x02. What exactly did that lookahead add?

Timestamp extraction is my next problem - the events are broken into fields just fine, and, for example, I do find objectdata.general.timestamp field in the resulting event - but timestamp is not extracted properly. I realize that timestamp extraction is done at index time while most fields are extracted at search time, so I'm not sure how to solve that. The problem is that there are a few timestamps in the XML data, and the first one in the most important record type - objectdata - is not what I want. I'll have to seriously play with timestamp prefix, it seems.

SplunkTrust

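Well, I'm guessing you didn't have your STX enclosed in parentheses (a capturing group), which would have caused it not to work. LINE_BREAKER needs a capturing group: the event breaks where the first group matches, and the text matched by that group is discarded.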
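
To illustrate why the parentheses matter, here is a minimal Python sketch of the capturing-group rule that Splunk documents for LINE_BREAKER. It is an emulation, not Splunk's implementation: the helper name split_events, the sample stream, and the choice to raise an error when the group is missing are all assumptions made for illustration.

```python
import re

STX, ETX = "\x02", "\x03"

def split_events(stream: str, line_breaker: str) -> list[str]:
    """Emulate the LINE_BREAKER rule (hypothetical helper): every match of
    the first capturing group is an event boundary, and the text matched
    by that group is dropped."""
    pattern = re.compile(line_breaker)
    if pattern.groups < 1:
        # Assumption: treat a pattern without a group as unusable.
        raise ValueError("LINE_BREAKER needs at least one capturing group")
    events, start = [], 0
    for m in pattern.finditer(stream):
        if m.start(1) < 0:
            continue  # group 1 did not take part in this match
        events.append(stream[start:m.start(1)])  # text before the group
        start = m.end(1)                         # resume after the group
    events.append(stream[start:])
    return [e for e in events if e]  # drop empty leading/trailing pieces

# Made-up stream: two records, each wrapped in STX/ETX, no newlines.
sample = (
    STX + '<objectdata><general oid="4"/></objectdata>' + ETX
    + STX + '<tracedata xmlns:xsi="x"><timestamp/></tracedata>' + ETX
)

# Breaker from the thread, with the trailing '>' dropped so that tags
# carrying attributes still match the lookahead.
events = split_events(sample, r"(\x02)(?=<\S+)")
```

In this emulation each event keeps its trailing ETX, which matches the observation in the thread that the XML was parsed even without stripping it. The lookahead only restricts where a break may occur; the parentheses around \x02 are what define the boundary.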

For timestamp recognition, I would suggest going traditional and providing attributes like TIME_PREFIX and TIME_FORMAT.
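
As a rough sketch of what a prefix plus a format does, the Python below finds a prefix regex in the raw event and then parses the text right after it with a strptime format. The helper extract_timestamp, the prefix regex, and the 19-character window are assumptions for illustration; the corresponding props.conf values would be something like TIME_PREFIX = <general[^>]*><timestamp> and TIME_FORMAT = %Y-%m-%dT%H:%M:%S, which should be checked against the real data, since the thread does not say which of the several timestamps is the desired one.

```python
import re
from datetime import datetime

def extract_timestamp(event, time_prefix, time_format, lookahead=19):
    """Hypothetical helper: locate time_prefix, then parse the next
    `lookahead` characters with time_format. Returns None on failure."""
    m = re.search(time_prefix, event)
    if m is None:
        return None
    candidate = event[m.end():m.end() + lookahead]
    try:
        return datetime.strptime(candidate, time_format)
    except ValueError:
        return None

# Sample record from the thread, with the trailing ETX left in place.
event = (
    '<objectdata><general oid="4">'
    '<timestamp>2015-09-18T02:00:13</timestamp>'
    '</general></objectdata>\x03'
)

ts = extract_timestamp(
    event,
    r"<general[^>]*><timestamp>",  # assumed prefix: skip to this timestamp
    "%Y-%m-%dT%H:%M:%S",
)
```

Anchoring the prefix on the enclosing element is one way to skip earlier timestamps in the same record; the right anchor depends on which element holds the timestamp you actually want.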

Splunk Employee

You need to tell Splunk that it is using a different line breaker. On your indexer, create a props.conf stanza something like this:

[source::my/source/file.log]
LINE_BREAKER = ([\x02\x03]+)

You may want to replace the source stanza with one for your sourcetype.

See http://docs.splunk.com/Documentation/Splunk/latest/Data/Indexmulti-lineevents for more details.
