<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233298#M45512</link>
    <description>&lt;P&gt;Yes, this worked! The XML is parsed even without stripping the trailing ETX. I had to remove the trailing 'greater than' sign because some of the records have xmlns:xsi and other attributes. I'm wondering why it didn't work for me with &lt;CODE&gt;LINE_BREAKER = \x02&lt;/CODE&gt;. What exactly did that lookahead add?&lt;/P&gt;

&lt;P&gt;Timestamp extraction is my next problem - the events are broken into fields just fine, and, for example, I do find objectdata.general.timestamp field in the resulting event - but timestamp is not extracted properly. I realize that timestamp extraction is done at index time while most fields are extracted at search time, so I'm not sure how to solve that. The problem is that there are a few timestamps in the XML data, and the first one in the most important record type - objectdata - is &lt;STRONG&gt;not&lt;/STRONG&gt; what I want. I'll have to seriously play with timestamp prefix, it seems.&lt;/P&gt;</description>
    <pubDate>Wed, 23 Sep 2015 13:20:14 GMT</pubDate>
    <dc:creator>arkadyz1</dc:creator>
    <dc:date>2015-09-23T13:20:14Z</dc:date>
    <item>
      <title>How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233291#M45505</link>
      <description>&lt;P&gt;Hello,&lt;BR /&gt;
I'm trying to accept TCP input from a device which wraps each transmission in an STX/ETX pair (ASCII 002/003), with no line breaks ('\n'). The text inside is XML, which is handled by &lt;CODE&gt;KV_MODE=xml&lt;/CODE&gt; rather nicely - I tried importing a file with line feeds instead of STX/ETX, and both that and my &lt;CODE&gt;TIMESTAMP_FIELDS=&lt;/CODE&gt; setting worked as expected.&lt;BR /&gt;
However, I can't figure out how to break it into events when STX/ETX are used instead of newlines. I tried &lt;CODE&gt;LINE_BREAKER = [\x02\x03]+&lt;/CODE&gt; and &lt;CODE&gt;SHOULD_LINEMERGE=false&lt;/CODE&gt;, to no avail. Admittedly, I only tested my sourcetype by uploading a one-line file with STX/ETX characters embedded in it. It still loads as one huge event and cannot parse out the timestamp.&lt;/P&gt;
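&lt;P&gt;A minimal Python sketch for building that kind of one-line test file. It assumes UTF-8 text and a made-up XML body; only the STX/ETX framing mirrors the device. The helper names &lt;CODE&gt;element&lt;/CODE&gt;, &lt;CODE&gt;frame&lt;/CODE&gt; and &lt;CODE&gt;build_payload&lt;/CODE&gt; are hypothetical. The angle brackets are produced with &lt;CODE&gt;chr()&lt;/CODE&gt; so the listing itself contains no markup characters.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;
```python
STX, ETX = b"\x02", b"\x03"
LT, GT = chr(0x3C), chr(0x3E)  # angle brackets, built with chr() to keep the listing markup-free


def element(tag: str, text: str) -> str:
    """Return a single XML element with the given text content."""
    return f"{LT}{tag}{GT}{text}{LT}/{tag}{GT}"


def frame(xml: str) -> bytes:
    """Wrap one XML document in STX ... ETX, with no trailing newline."""
    return STX + xml.encode("utf-8") + ETX


def build_payload(timestamps: list[str]) -> bytes:
    """Concatenate one framed objectdata record per timestamp, back to back."""
    return b"".join(
        frame(element("objectdata", element("general", element("timestamp", ts))))
        for ts in timestamps
    )
```
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;For example, &lt;CODE&gt;pathlib.Path("stx_etx_test.log").write_bytes(build_payload(["2015-09-18T02:00:13", "2015-09-18T02:00:14"]))&lt;/CODE&gt; writes two framed records with no newline between them. The file name is only an example.&lt;/P&gt;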

&lt;P&gt;Am I escaping those hex codes improperly (I can switch to octal if necessary)? Is there a known problem parsing LINE_BREAKER? Should I use &lt;CODE&gt;SHOULD_LINEMERGE=true&lt;/CODE&gt; with &lt;CODE&gt;BREAK_ONLY_BEFORE=\x02&lt;/CODE&gt; and &lt;CODE&gt;MUST_BREAK_AFTER=\x03&lt;/CODE&gt; instead? In that case, I'll have to strip those control characters some other way before Splunk can parse the XML inside.&lt;/P&gt;

&lt;P&gt;If everything else fails, I can try switching to a scripted input, but that seems an unnecessary hurdle.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 07:19:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233291#M45505</guid>
      <dc:creator>arkadyz1</dc:creator>
      <dc:date>2020-09-29T07:19:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233292#M45506</link>
      <description>&lt;P&gt;You should be able to remove the STX/ETX using SEDCMD, and then use BREAK_ONLY_BEFORE/MUST_BREAK_AFTER for event breaking. Could you provide some sample events that you might receive?&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 07:21:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233292#M45506</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2020-09-29T07:21:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233293#M45507</link>
      <description>&lt;P&gt;Something along the lines of:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;STX&amp;lt;objectdata&amp;gt;&amp;lt;general oid="4"&amp;gt;&amp;lt;timestamp&amp;gt;2015-09-18T02:00:13&amp;lt;/timestamp&amp;gt;&amp;lt;/general&amp;gt;&amp;lt;/objectdata&amp;gt;ETX
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;where STX and ETX are 0x02 and 0x03 respectively. There may be many such XML structures, all surrounded by STX/ETX.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Sep 2015 22:05:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233293#M45507</guid>
      <dc:creator>arkadyz1</dc:creator>
      <dc:date>2015-09-22T22:05:15Z</dc:date>
    </item>
    <item>
      <title>Re: How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233294#M45508</link>
      <description>&lt;P&gt;As you can see, the data inside are pure XML. TIMESTAMP_FIELDS will include objectdata.general.timestamp for sure.&lt;/P&gt;
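&lt;P&gt;As a rough illustration of where a name like objectdata.general.timestamp comes from, the Python sketch below parses the sample record (STX/ETX already removed) and joins nested element names with dots. It approximates, rather than reproduces, what &lt;CODE&gt;KV_MODE = xml&lt;/CODE&gt; does. Only element text is handled; attributes such as &lt;CODE&gt;oid&lt;/CODE&gt; and repeated sibling elements are ignored. &lt;CODE&gt;dotted_fields&lt;/CODE&gt; is a hypothetical helper.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;
```python
import xml.etree.ElementTree as ET

LT, GT = chr(0x3C), chr(0x3E)  # angle brackets via chr() to keep the listing markup-free


def dotted_fields(xml_text: str) -> dict[str, str]:
    """Map dot-joined element paths to their text content.

    Approximation only: attributes and repeated sibling tags are not handled
    (a repeated path simply keeps the last value seen).
    """
    fields: dict[str, str] = {}

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}.{elem.tag}" if prefix else elem.tag
        text = (elem.text or "").strip()
        if text:
            fields[path] = text
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return fields


# The sample record from this thread, minus its STX/ETX framing.
sample = (
    f'{LT}objectdata{GT}{LT}general oid="4"{GT}'
    f"{LT}timestamp{GT}2015-09-18T02:00:13{LT}/timestamp{GT}"
    f"{LT}/general{GT}{LT}/objectdata{GT}"
)
print(dotted_fields(sample))  # prints {'objectdata.general.timestamp': '2015-09-18T02:00:13'}
```
&lt;/CODE&gt;&lt;/PRE&gt;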

&lt;P&gt;Here is my full sourcetype definition (props.conf) as of now:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[tcpInputTest]
SHOULD_LINEMERGE = false
category = Custom
pulldown_type = true
DATETIME_CONFIG = NONE
KV_MODE = xml
disabled = false
TIMESTAMP_FIELDS = objectdata.general.timestamp, tracedata.timestamp, heartbeatdata.timestamp
LINE_BREAKER = \x03?\x02
TRUNCATE = 0
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 22 Sep 2015 22:10:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233294#M45508</guid>
      <dc:creator>arkadyz1</dc:creator>
      <dc:date>2015-09-22T22:10:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233295#M45509</link>
      <description>&lt;P&gt;Give this a try for your LINE_BREAKER attribute:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;LINE_BREAKER = (\x02)(?=\&amp;lt;objectdata\&amp;gt;)
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 22 Sep 2015 22:14:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233295#M45509</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2015-09-22T22:14:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233296#M45510</link>
      <description>&lt;P&gt;Oh, not every record starts with the objectdata tag - some start with other tags. But using just \x02 and stripping \x03 with SEDCMD is what I'm going to try next.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Sep 2015 22:24:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233296#M45510</guid>
      <dc:creator>arkadyz1</dc:creator>
      <dc:date>2015-09-22T22:24:11Z</dc:date>
    </item>
    <item>
      <title>Re: How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233297#M45511</link>
      <description>&lt;P&gt;Ok... Try this as well...&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;LINE_BREAKER = (\x02)(?=\&amp;lt;\S+\&amp;gt;)
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 23 Sep 2015 03:38:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233297#M45511</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2015-09-23T03:38:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233298#M45512</link>
      <description>&lt;P&gt;Yes, this worked! The XML is parsed even without stripping the trailing ETX. I had to remove the trailing 'greater than' sign because some of the records have xmlns:xsi and other attributes. I'm wondering why it didn't work for me with &lt;CODE&gt;LINE_BREAKER = \x02&lt;/CODE&gt;. What exactly did that lookahead add?&lt;/P&gt;

&lt;P&gt;Timestamp extraction is my next problem - the events are broken into fields just fine, and, for example, I do find objectdata.general.timestamp field in the resulting event - but timestamp is not extracted properly. I realize that timestamp extraction is done at index time while most fields are extracted at search time, so I'm not sure how to solve that. The problem is that there are a few timestamps in the XML data, and the first one in the most important record type - objectdata - is &lt;STRONG&gt;not&lt;/STRONG&gt; what I want. I'll have to seriously play with timestamp prefix, it seems.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Sep 2015 13:20:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233298#M45512</guid>
      <dc:creator>arkadyz1</dc:creator>
      <dc:date>2015-09-23T13:20:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233299#M45513</link>
      <description>&lt;P&gt;Well, I'm guessing you didn't have your STX enclosed in parentheses (a capturing group), which would have caused it not to work.&lt;/P&gt;
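&lt;P&gt;A rough Python sketch of why the parenthesized version behaves as it does. It assumes Python's &lt;CODE&gt;re&lt;/CODE&gt; is close enough to Splunk's PCRE for a pattern this simple. Only the STX sits inside the capturing group, so only the STX is discarded and each event keeps its trailing ETX. That fits the report in this thread that the XML still parsed without stripping the ETX. The lookahead only requires a tag to follow the STX, so every record type breaks, not just objectdata. The record bodies are made up, and the less-than sign is written as &lt;CODE&gt;\x3c&lt;/CODE&gt; to keep the listing free of markup characters.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;
```python
import re

STX, ETX = "\x02", "\x03"
LT, GT = chr(0x3C), chr(0x3E)  # angle brackets via chr() to keep the listing markup-free

# The final breaker from this thread; \x3c stands for the escaped less-than sign.
BREAKER = r"(\x02)(?=\x3c\S+)"


def record(tag: str) -> str:
    """One STX/ETX-framed record with a made-up body."""
    return f"{STX}{LT}{tag}{GT}x{LT}/{tag}{GT}{ETX}"


stream = record("objectdata") + record("tracedata") + record("heartbeatdata")

# The whole match equals capture group 1 (the lookahead consumes nothing), so
# re.split's even-indexed pieces are the events; odd ones are the dropped STX.
events = [piece for piece in re.split(BREAKER, stream)[::2] if piece]
for event in events:
    print(repr(event))
```
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Running it prints three events, one per record type, and each one still ends in \x03.&lt;/P&gt;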

&lt;P&gt;For timestamp recognition, I would suggest going the traditional route and setting attributes like TIME_PREFIX and TIME_FORMAT.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 07:19:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233299#M45513</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2020-09-29T07:19:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233300#M45514</link>
      <description>&lt;P&gt;You need to tell Splunk that it is using a different line breaker. On your indexer, create a props.conf stanza something like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[source::my/source/file.log]
LINE_BREAKER = [\x02\x03]+
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;You may want to replace the source:: stanza with your sourcetype name.&lt;/P&gt;

&lt;P&gt;See &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Data/Indexmulti-lineevents"&gt;http://docs.splunk.com/Documentation/Splunk/latest/Data/Indexmulti-lineevents&lt;/A&gt; for more details.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Sep 2015 14:07:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233300#M45514</guid>
      <dc:creator>bmunson_splunk</dc:creator>
      <dc:date>2015-09-23T14:07:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233301#M45515</link>
      <description>&lt;P&gt;But that's exactly what I did initially. What I didn't do, however, was enclose that regex in parentheses, as pointed out by somesoni2. Is that documented anywhere?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Sep 2015 14:20:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233301#M45515</guid>
      <dc:creator>arkadyz1</dc:creator>
      <dc:date>2015-09-23T14:20:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233302#M45516</link>
      <description>&lt;P&gt;Oh, never mind - it is documented, just buried deep enough that it's easy to miss.&lt;/P&gt;

&lt;P&gt;The docs say this:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;The regex must contain a capturing group -- a pair of parentheses which defines an identified subcomponent of the match&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
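&lt;P&gt;That requirement is easy to check mechanically before deploying a pattern. Here is a minimal sketch that uses Python's &lt;CODE&gt;re&lt;/CODE&gt; module as a stand-in for PCRE, which is fine for simple patterns like these but not in general. &lt;CODE&gt;has_capturing_group&lt;/CODE&gt; is a hypothetical helper. The candidates are the patterns tried in this thread, plus one non-capturing counterexample.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;
```python
import re

# Patterns from this thread, plus one non-capturing counterexample.
# \x3c stands for the escaped less-than sign in the lookahead version.
CANDIDATES = [
    r"[\x02\x03]+",
    r"\x03?\x02",
    r"\x02",
    r"(?:\x02)",
    r"([\x02\x03]+)",
    r"(\x02)(?=\x3c\S+)",
]


def has_capturing_group(regex: str) -> bool:
    """True if the compiled regex defines at least one capturing group."""
    return re.compile(regex).groups > 0


for candidate in CANDIDATES:
    verdict = "ok" if has_capturing_group(candidate) else "no capturing group"
    print(f"LINE_BREAKER = {candidate:22} {verdict}")
```
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Neither a non-capturing group nor a lookahead counts. Only an ordinary pair of parentheses does.&lt;/P&gt;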

&lt;P&gt;That capturing-group requirement is something I missed initially. They also explain that the text matched by the capturing group is discarded from the resulting events - something that I wanted all along :).&lt;/P&gt;</description>
      <pubDate>Wed, 23 Sep 2015 14:22:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233302#M45516</guid>
      <dc:creator>arkadyz1</dc:creator>
      <dc:date>2015-09-23T14:22:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233303#M45517</link>
      <description>&lt;P&gt;So, to summarize: I was missing a very simple thing - a capturing group (parentheses) around my regex. Here is how it should read:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;LINE_BREAKER = ([\x02\x03]+)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;See those enclosing parentheses? They were &lt;STRONG&gt;the&lt;/STRONG&gt; reason. The documentation is clear on that (just look for LINE_BREAKER in the Admin Manual and read its description carefully).&lt;/P&gt;
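&lt;P&gt;For anyone who wants to see the effect before editing props.conf, here is a rough Python emulation of the documented behavior with &lt;CODE&gt;SHOULD_LINEMERGE = false&lt;/CODE&gt;. At each match, the previous event ends where the first capturing group starts, the next event begins where it ends, and the captured text is discarded. This is a sketch under stated assumptions, not Splunk's implementation. Python's &lt;CODE&gt;re&lt;/CODE&gt; stands in for PCRE, and empty fragments are assumed to be dropped. The hypothetical &lt;CODE&gt;split_events&lt;/CODE&gt; simply rejects patterns without a group instead of guessing what Splunk would do with them.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;
```python
import re


def split_events(stream: str, line_breaker: str) -> list[str]:
    """Emulate LINE_BREAKER splitting (SHOULD_LINEMERGE = false)."""
    pattern = re.compile(line_breaker)
    if pattern.groups == 0:
        # The props.conf docs require a capturing group; refuse early.
        raise ValueError(f"LINE_BREAKER {line_breaker!r} has no capturing group")
    events, start = [], 0
    for match in pattern.finditer(stream):
        events.append(stream[start:match.start(1)])  # previous event ends here
        start = match.end(1)                         # next event starts here
    events.append(stream[start:])
    return [e for e in events if e]  # assumption: empty fragments are discarded


stream = "\x02first\x03\x02second\x03\x02third\x03"
print(split_events(stream, r"([\x02\x03]+)"))  # prints ['first', 'second', 'third']

try:
    split_events(stream, r"[\x02\x03]+")
except ValueError as err:
    print(err)
```
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Because &lt;CODE&gt;([\x02\x03]+)&lt;/CODE&gt; consumes an ETX and the following STX in a single match, neither delimiter survives in the events.&lt;/P&gt;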

&lt;P&gt;Special thanks to somesoni2 for pointing it out.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Sep 2015 14:34:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233303#M45517</guid>
      <dc:creator>arkadyz1</dc:creator>
      <dc:date>2015-09-23T14:34:23Z</dc:date>
    </item>
    <item>
      <title>Re: How to split a TCP input on STX/ETX (0x02/0x03), no line breaks, into separate events?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233304#M45518</link>
      <description>&lt;P&gt;For a log file that separated lines with the hex 0A character, I was able to use LINE_BREAKER = (\x0A). I viewed the log file in a hex editor to find the line separator.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Apr 2016 16:55:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-split-a-TCP-input-on-STX-ETX-0x02-0x03-no-line-breaks/m-p/233304#M45518</guid>
      <dc:creator>dfrankekcg</dc:creator>
      <dc:date>2016-04-22T16:55:11Z</dc:date>
    </item>
  </channel>
</rss>

