<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: BREAK_ONLY_BEFORE_DATE=TRUE seems to not be working in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501745#M146632</link>
    <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;Using LINE_BREAKER= and SHOULD_LINEMERGE=false will always be WAAAAAAAY faster than using SHOULD_LINEMERGE=true. Obviously the better the RegEx in your LINE_BREAKER, the more efficient event processing will be so always spend extra time optimizing your LINE_BREAKER.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/1406"&gt;@woodcock&lt;/a&gt; says.&lt;/P&gt;</description>
    <pubDate>Wed, 30 Sep 2020 03:59:16 GMT</pubDate>
    <dc:creator>to4kawa</dc:creator>
    <dc:date>2020-09-30T03:59:16Z</dc:date>
    <item>
      <title>BREAK_ONLY_BEFORE_DATE=TRUE seems to not be working</title>
      <link>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501742#M146629</link>
      <description>&lt;P&gt;A custom web application produces logs in the Tomcat format, like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;2020-01-31 18:19:02,091 DEBUG [com.vendor.make.services.ServiceName] (pool-7-thread-44) - &amp;lt;Short Form: time elapsed 120, pause interval 360, workflows to start 0&amp;gt;
Potentially a super long message from one to 400 lines, &amp;lt; 50K characters, often JSON
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The events &lt;STRONG&gt;always&lt;/STRONG&gt; begin with a newline and a timestamp, &lt;STRONG&gt;always&lt;/STRONG&gt; in the same format (above).&lt;/P&gt;

&lt;P&gt;... yet Splunk breaks up long events (I've seen events of 3K characters and more broken up), and so far the affected events all appear to be logged JSON:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;1/31/20
6:21:02.419 PM  
    "preroll_start-eVar32" : "live"
  },
  "feed:relateds" : [ ]
}, {
  "id" : asset_id,
Show all 9 lines
host = hostname source = /custom_app/tomcat/logs/custom_app.log sourcetype = tomcat:custom_app
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;It also seems to do so consistently, and always in the same place regardless of the length of the event, right between these two lines:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;"preroll_start-eVar29" : "feed_app|sec_us|||asset_id|video|200131_feedl_headlines_3pm_video",
"preroll_start-eVar32" : "live"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Any idea what trips it, or what can be done to force Splunk to keep these events together?&lt;/P&gt;

&lt;P&gt;Thanks!&lt;/P&gt;

&lt;P&gt;Incidentals:&lt;/P&gt;

&lt;P&gt;Setting &lt;CODE&gt;TRUNCATE = 0&lt;/CODE&gt; in &lt;CODE&gt;props.conf&lt;/CODE&gt; on the CM (followed by &lt;CODE&gt;splunk apply cluster-bundle&lt;/CODE&gt;) seems to make no difference.&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;/opt/splunk/etc/master-apps/_cluster/local/props.conf&lt;/CODE&gt; on the CM:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[tomcat:custom_app]
TRUNCATE = 0
EXTRACT-.... = ....
SEDCMD-scrub_passwords = s/STRING1_PASS=([^\s]+)/STRING1_PASS=#####/g
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 01 Feb 2020 02:59:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501742#M146629</guid>
      <dc:creator>mitag</dc:creator>
      <dc:date>2020-02-01T02:59:44Z</dc:date>
    </item>
    <item>
      <title>Re: BREAK_ONLY_BEFORE_DATE=TRUE seems to not be working</title>
      <link>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501743#M146630</link>
      <description>&lt;P&gt;Since you said every event begins on a new line, you could simply set SHOULD_LINEMERGE = false and keep the default LINE_BREAKER, &lt;CODE&gt;([\r\n]+)&lt;/CODE&gt;, which starts a new event at every newline. If your events also contain newlines that do not start a new event, you need to define your own LINE_BREAKER for that.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 04:01:09 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501743#M146630</guid>
      <dc:creator>jbrocks</dc:creator>
      <dc:date>2020-09-30T04:01:09Z</dc:date>
    </item>
    <item>
      <title>Re: BREAK_ONLY_BEFORE_DATE=TRUE seems to not be working</title>
      <link>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501744#M146631</link>
      <description>&lt;P&gt;event begins with a newline != there are no newlines within an event.&lt;/P&gt;

&lt;P&gt;There &lt;STRONG&gt;are&lt;/STRONG&gt; newlines in the event - lots of them - most JSONs have them. So following your suggestion would make it much worse...&lt;/P&gt;</description>
      <pubDate>Sun, 02 Feb 2020 02:32:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501744#M146631</guid>
      <dc:creator>mitag</dc:creator>
      <dc:date>2020-02-02T02:32:08Z</dc:date>
    </item>
    <item>
      <title>Re: BREAK_ONLY_BEFORE_DATE=TRUE seems to not be working</title>
      <link>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501745#M146632</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;Using LINE_BREAKER= and SHOULD_LINEMERGE=false will always be WAAAAAAAY faster than using SHOULD_LINEMERGE=true. Obviously the better the RegEx in your LINE_BREAKER, the more efficient event processing will be so always spend extra time optimizing your LINE_BREAKER.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/1406"&gt;@woodcock&lt;/a&gt; says.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 03:59:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501745#M146632</guid>
      <dc:creator>to4kawa</dc:creator>
      <dc:date>2020-09-30T03:59:16Z</dc:date>
    </item>
    <item>
      <title>Re: BREAK_ONLY_BEFORE_DATE=TRUE seems to not be working</title>
      <link>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501746#M146633</link>
      <description>&lt;P&gt;Hi mitag,&lt;/P&gt;

&lt;P&gt;Try the following settings in props.conf, under your sourcetype stanza.&lt;/P&gt;

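&lt;PRE&gt;&lt;CODE&gt;[tomcat:custom_app]
# Start a new event only before a line beginning with yyyy-mm-dd hh:mm:ss,SSS
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}
NO_BINARY_CHECK = true
# Merge the lines in between into the preceding event
SHOULD_LINEMERGE = true
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3Q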
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The regex in the stanza above matches your event format.&lt;BR /&gt;
See the attached screenshot for reference.&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="alt text"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/8319i38DF11D7654F8157/image-size/large?v=v2&amp;amp;px=999" role="button" title="alt text" alt="alt text" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 02 Feb 2020 09:20:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501746#M146633</guid>
      <dc:creator>abhijeet01</dc:creator>
      <dc:date>2020-02-02T09:20:33Z</dc:date>
    </item>
    <item>
      <title>Re: BREAK_ONLY_BEFORE_DATE=TRUE seems to not be working</title>
      <link>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501747#M146634</link>
      <description>&lt;P&gt;It would probably work - but:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;It won't scale or port if the app is moved to non-US time notation standards: the regex will break then, and I don't want to write regexes for every possible scenario (not my job).&lt;/LI&gt;
&lt;LI&gt;It doesn't answer part of my question: why does Splunk do what it does? What makes it break the event between those specific lines, regardless of the number of lines or characters in the event?&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;Without an answer to that last one (Expected behavior? A bug? Only my Splunk version, or all of them? Is it documented anywhere?), we're left with unpredictable behavior that can potentially break things.&lt;/P&gt;

&lt;P&gt;Bottom line is this perhaps:&lt;/P&gt;

&lt;P&gt;What is the sure-fire way to force Splunk to &lt;STRONG&gt;only&lt;/STRONG&gt; break a long event on a Splunk-compliant timestamp? (If the answer is simply &lt;CODE&gt;TRUNCATE = 0&lt;/CODE&gt; plus the default &lt;CODE&gt;SHOULD_LINEMERGE = true&lt;/CODE&gt;, then it's &lt;STRONG&gt;not&lt;/STRONG&gt; working in this case and I need help figuring out why.)&lt;/P&gt;

&lt;P&gt;(Related: how can I search across all data for improperly broken events, i.e. events Splunk split into several, whether intended or not? For example, is there a flag, field, or tag assigned to such events?)&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2020 19:47:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501747#M146634</guid>
      <dc:creator>mitag</dc:creator>
      <dc:date>2020-02-03T19:47:10Z</dc:date>
    </item>
    <item>
      <title>Re: BREAK_ONLY_BEFORE_DATE=TRUE seems to not be working</title>
      <link>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501748#M146635</link>
      <description>&lt;P&gt;Cool, I didn't realize &lt;CODE&gt;LINE_BREAKER=&lt;/CODE&gt; takes precedence over &lt;CODE&gt;SHOULD_LINEMERGE=false&lt;/CODE&gt;. (But doesn't that mean that when &lt;CODE&gt;LINE_BREAKER&lt;/CODE&gt; is defined, Splunk won't even look at the &lt;CODE&gt;SHOULD_LINEMERGE&lt;/CODE&gt; setting? I.e. it would only break events on &lt;CODE&gt;LINE_BREAKER&lt;/CODE&gt; regex matches, regardless of whether there are newlines? Unless I'm missing something...)&lt;/P&gt;
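&lt;P&gt;For reference, here is a minimal props.conf sketch of the LINE_BREAKER approach for the timestamp format in this thread. It is an untested assumption, not a verified fix: the regex is hand-written for the &lt;CODE&gt;yyyy-mm-dd hh:mm:ss,SSS&lt;/CODE&gt; prefix shown in the question, and the stanza would have to be deployed wherever parsing happens (here, presumably the indexers via the cluster bundle). As I read the props.conf spec, LINE_BREAKER always splits the raw stream first; with &lt;CODE&gt;SHOULD_LINEMERGE = false&lt;/CODE&gt; those chunks become the final events, so the line-merging settings (including &lt;CODE&gt;MAX_EVENTS&lt;/CODE&gt;) no longer come into play.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[tomcat:custom_app]
# Sketch: break only where a run of newlines is followed by a timestamp like 2020-01-31 18:19:02,091.
# The first capture group (the newlines) is discarded; the lookahead keeps the timestamp in the next event.
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})
# The timestamp starts each event and is 23 characters long.
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3Q
MAX_TIMESTAMP_LOOKAHEAD = 23
# Events can approach 50K characters, above the default TRUNCATE of 10000 bytes.
TRUNCATE = 0
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The trade-off is the one raised below: the timestamp pattern is spelled out by hand instead of being left to Splunk's own timestamp recognition.&lt;/P&gt;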

&lt;P&gt;That said, my goal is to ask Splunk to &lt;STRONG&gt;only&lt;/STRONG&gt; break on what &lt;STRONG&gt;Splunk&lt;/STRONG&gt; thinks are valid timestamps, not to write up all possible timestamp REGEXes myself - for portability and maintenance reasons.&lt;/P&gt;

&lt;P&gt;So I still need to figure this out.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2020 22:00:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501748#M146635</guid>
      <dc:creator>mitag</dc:creator>
      <dc:date>2020-02-03T22:00:58Z</dc:date>
    </item>
    <item>
      <title>Re: BREAK_ONLY_BEFORE_DATE=TRUE seems to not be working</title>
      <link>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501749#M146636</link>
      <description>&lt;P&gt;Short answer:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;MAX_EVENTS=10000
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;... in the appropriate sourcetype stanza in props.conf.&lt;/P&gt;

&lt;P&gt;Long answer:&lt;/P&gt;

&lt;P&gt;"Line breaking issues" section in "&lt;A href="https://docs.splunk.com/Documentation/Splunk/8.0.1/Data/Resolvedataqualityissues"&gt;Resolve data quality issues&lt;/A&gt;" Splunk KB article pointed in the right direction:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;MAX_EVENTS defines the maximum number of lines in an event.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;I.e. if the number of lines in an event for a given sourcetype can exceed 256 (the default), set &lt;CODE&gt;MAX_EVENTS&lt;/CODE&gt; to the largest value you expect. In my case the super-long JSONs could run to thousands of lines, so I set it to 10000.&lt;/P&gt;
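&lt;P&gt;Applied to the stanza from the question, the change might look like the sketch below (a hedged example, not a tested config). &lt;CODE&gt;SHOULD_LINEMERGE&lt;/CODE&gt; and &lt;CODE&gt;BREAK_ONLY_BEFORE_DATE&lt;/CODE&gt; are written out only for clarity, since &lt;CODE&gt;true&lt;/CODE&gt; is already their default; the existing &lt;CODE&gt;EXTRACT-&lt;/CODE&gt; and &lt;CODE&gt;SEDCMD-&lt;/CODE&gt; lines are omitted here and would stay unchanged.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[tomcat:custom_app]
# Merge lines into one event, starting a new event only at a line where Splunk finds a date (defaults, shown explicitly).
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = true
# Allow up to 10000 lines per merged event instead of the default 256.
MAX_EVENTS = 10000
# Disable the per-event byte limit (default 10000 bytes), as in the original stanza.
TRUNCATE = 0
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This keeps the decision about what counts as a timestamp with Splunk, which was the portability goal stated earlier in the thread; as before, it would be pushed from the CM with &lt;CODE&gt;splunk apply cluster-bundle&lt;/CODE&gt;.&lt;/P&gt;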

&lt;P&gt;Clues:&lt;/P&gt;

&lt;P&gt;"Show all 257 lines" in an event, like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;2020-02-08 23:25:02,439 TRACE [com.vendor.services.ingest.ExternalIngestService] (pool-7-thread-28) - &amp;lt;JSON Feed:
{
  "lastBuildDate" : "2020-02-09 05:13:34.034+0000",
  "****:analytics" : [ {
    "****:analytics-pageName" : "alertsindex|auto-play",
Show all 257 lines
sourcetype = tomcat:custom_app
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This means the event is around 256 lines long - and given the default limit of 256 lines per event, this should raise suspicions.&lt;/P&gt;

&lt;P&gt;Another clue, also described in "&lt;A href="https://docs.splunk.com/Documentation/Splunk/8.0.1/Data/Resolvedataqualityissues"&gt;Resolve data quality issues&lt;/A&gt;", is a "MAX_EVENTS (256) was exceeded without a single event break" warning in splunkd.log, e.g.:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;12-07-2016 09:32:32.876 -0500 WARN  AggregatorMiningProcessor - Changing breaking behavior for event stream because MAX_EVENTS (256) was exceeded without a single event break. Will set BREAK_ONLY_BEFORE_DATE to False, and unset any MUST_NOT_BREAK_BEFORE or MUST_NOT_BREAK_AFTER rules. Typically this will amount to treating this data as single-line only.
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;P.S. To me, &lt;CODE&gt;MAX_EVENTS&lt;/CODE&gt; is confusing: &lt;CODE&gt;MAX_LINES&lt;/CODE&gt; would have been much easier to digest.&lt;/P&gt;</description>
      <pubDate>Sun, 09 Feb 2020 08:19:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/BREAK-ONLY-BEFORE-DATE-TRUE-seems-to-not-be-working/m-p/501749#M146636</guid>
      <dc:creator>mitag</dc:creator>
      <dc:date>2020-02-09T08:19:47Z</dc:date>
    </item>
  </channel>
</rss>

