<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Understanding LINE_BREAKER regexes in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Understanding-LINE-BREAKER-regexes/m-p/44263#M8263</link>
    <description>&lt;P&gt;Did you set &lt;CODE&gt;SHOULD_LINEMERGE = false&lt;/CODE&gt;? This should work:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;LINE_BREAKER = ([\r\n]+)y\s+z
SHOULD_LINEMERGE = false
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Sat, 12 Jan 2019 00:48:41 GMT</pubDate>
    <dc:creator>woodcock</dc:creator>
    <dc:date>2019-01-12T00:48:41Z</dc:date>
    <item>
      <title>Understanding LINE_BREAKER regexes</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Understanding-LINE-BREAKER-regexes/m-p/44260#M8260</link>
      <description>&lt;P&gt;I'm trying to wrap my head around LINE_BREAKER regexes, especially with respect to whitespace handling and wildcard matching.&lt;/P&gt;

&lt;P&gt;Given a file containing:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;y z
xx1
xx2
y z
xx3
xx4
y y
xx5
xx6
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;And applying either of:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;LINE_BREAKER = ([\r\n]+)(?:y\s+z)

LINE_BREAKER = ([\r\n]+)(?:y.*?z)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Splunk makes a new event at "y y", even though I don't want it to. In other words, I expect:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;y z
xx1
xx2

y z
xx3
xx4
y y
xx5
xx6
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;But Splunk actually produces:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;y z
xx1
xx2

y z
xx3
xx4

y y
xx5
xx6
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Presumably it's matching the "y\s+" / "y.*?" and deciding to break on that line. What am I missing? How can I get it to recognize the "z" in the regex?&lt;/P&gt;</description>
      <pubDate>Sun, 17 Jul 2011 16:54:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Understanding-LINE-BREAKER-regexes/m-p/44260#M8260</guid>
      <dc:creator>stevesq</dc:creator>
      <dc:date>2011-07-17T16:54:01Z</dc:date>
    </item>
    <item>
      <title>Re: Understanding LINE_BREAKER regexes</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Understanding-LINE-BREAKER-regexes/m-p/44261#M8261</link>
      <description>&lt;P&gt;This appears to be one of those elusive cases where LINE_BREAKER fails where BREAK_ONLY_BEFORE succeeds...&lt;/P&gt;

&lt;P&gt;I was able to reproduce the problem you report with your test data and LINE_BREAKER. However, using:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;BREAK_ONLY_BEFORE = y\s+z&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;...I get the expected results. Give BREAK_ONLY_BEFORE a try and let us know if it works. Remember to remove the &lt;CODE&gt;([\r\n]+)&lt;/CODE&gt; capture group as BREAK_ONLY_BEFORE doesn't need it.&lt;/P&gt;
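
&lt;P&gt;A minimal sketch of how that might look in props.conf. The stanza name &lt;CODE&gt;my_sourcetype&lt;/CODE&gt; is a hypothetical placeholder, and the sketch assumes &lt;CODE&gt;SHOULD_LINEMERGE&lt;/CODE&gt; stays at its default of true, which BREAK_ONLY_BEFORE relies on:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# Hypothetical stanza name; replace with your own sourcetype
[my_sourcetype]
# Line merging must be enabled for BREAK_ONLY_BEFORE to apply (true is the default)
SHOULD_LINEMERGE = true
# Start a new event only at lines matching "y", whitespace, "z"
BREAK_ONLY_BEFORE = y\s+z
&lt;/CODE&gt;&lt;/PRE&gt;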

&lt;P&gt;From &lt;A href="http://www.splunk.com/base/Documentation/latest/Admin/Propsconf" target="_blank"&gt;props.conf.spec&lt;/A&gt;:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;BREAK_ONLY_BEFORE = &amp;lt;regular expression&amp;gt;
 * When set, Splunk creates a new event only if it encounters a new line that matches the
   regular expression.
 * Defaults to empty.
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I have opened a bug (SPL-41430) to have our developers take a look at this issue.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;UPDATE:&lt;/STRONG&gt; As Masa stated, if you are using &lt;CODE&gt;LINE_BREAKER&lt;/CODE&gt;, you &lt;STRONG&gt;must&lt;/STRONG&gt; use &lt;CODE&gt;SHOULD_LINEMERGE = false&lt;/CODE&gt;. The test file is properly line-broken with the following configuration:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;LINE_BREAKER = ([\r\n]+)y\s+z
SHOULD_LINEMERGE = false
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 28 Sep 2020 09:44:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Understanding-LINE-BREAKER-regexes/m-p/44261#M8261</guid>
      <dc:creator>hexx</dc:creator>
      <dc:date>2020-09-28T09:44:59Z</dc:date>
    </item>
    <item>
      <title>Re: Understanding LINE_BREAKER regexes</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Understanding-LINE-BREAKER-regexes/m-p/44262#M8262</link>
      <description>&lt;P&gt;In general, I would use the same solution that hexx suggested.&lt;BR /&gt;
(I could not add a comment, so I'm using another answer field.)&lt;/P&gt;

&lt;P&gt;Additional info:&lt;/P&gt;

&lt;P&gt;Splunk processes a stream of data as follows:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;&lt;P&gt;Break the stream into individual lines.&lt;BR /&gt;
LINE_BREAKER is used here.&lt;BR /&gt;
(At this point, Splunk does not know whether an event is a single line or not.)&lt;/P&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;P&gt;Check whether multiple lines need to be merged into one event.&lt;BR /&gt;
SHOULD_LINEMERGE, BREAK_ONLY_BEFORE, etc. work here.&lt;BR /&gt;
(At this point, Splunk recognizes each event as either multi-line or single-line.)&lt;/P&gt;&lt;/LI&gt;
&lt;/OL&gt;
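
&lt;P&gt;The two stages above can be sketched as an annotated props.conf stanza. This is only an illustration: the stanza name &lt;CODE&gt;my_sourcetype&lt;/CODE&gt; is hypothetical, and the LINE_BREAKER and SHOULD_LINEMERGE values shown are the documented defaults:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# Hypothetical stanza name
[my_sourcetype]
# Stage 1: split the stream into lines; text matched by the first capture group is discarded
LINE_BREAKER = ([\r\n]+)
# Stage 2: runs only when line merging is enabled
SHOULD_LINEMERGE = true
# Stage 2: merged lines start a new event only at a line matching this regex
BREAK_ONLY_BEFORE = y\s+z
&lt;/CODE&gt;&lt;/PRE&gt;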

&lt;P&gt;I think it's possible that the issue occurred at line-merge time in your case.&lt;BR /&gt;
 Also, a lookahead "(?=)" would be more appropriate than a non-capturing group "(?:)" in this case.&lt;/P&gt;

&lt;P&gt;So, there is an alternative solution:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;LINE_BREAKER = ([\r\n]+)(?=y\s+z)
SHOULD_LINEMERGE = false
LEARN_MODEL = false
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I did a quick test with this, and it worked for me. &lt;/P&gt;

&lt;P&gt;If this does not work, the learned app may contain an auto-generated props.conf configuration for this data.&lt;BR /&gt;
In that case, delete the relevant part of $SPLUNK_HOME/etc/apps/learned/local/props.conf.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 09:45:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Understanding-LINE-BREAKER-regexes/m-p/44262#M8262</guid>
      <dc:creator>Masa</dc:creator>
      <dc:date>2020-09-28T09:45:04Z</dc:date>
    </item>
    <item>
      <title>Re: Understanding LINE_BREAKER regexes</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Understanding-LINE-BREAKER-regexes/m-p/44263#M8263</link>
      <description>&lt;P&gt;Did you set &lt;CODE&gt;SHOULD_LINEMERGE = false&lt;/CODE&gt;? This should work:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;LINE_BREAKER = ([\r\n]+)y\s+z
SHOULD_LINEMERGE = false
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 12 Jan 2019 00:48:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Understanding-LINE-BREAKER-regexes/m-p/44263#M8263</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-01-12T00:48:41Z</dc:date>
    </item>
  </channel>
</rss>

