<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Why are events getting split by Splunk parser? in Splunk Enterprise</title>
    <link>https://community.splunk.com/t5/Splunk-Enterprise/Why-are-events-getting-split-by-Splunk-parser/m-p/606665#M13423</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;We are trying to process and ingest AWS S3 events into Splunk, but we noticed that a few events are getting split. After checking the configuration, we think this must be caused by Splunk's internal parsing algorithm.&lt;/P&gt;
&lt;P&gt;Please let us know whether there are any issues in my configuration, or whether this could be related to the Splunk parser.&lt;/P&gt;
&lt;P&gt;Below are the entries in props.conf and transforms.conf:&lt;/P&gt;
&lt;P&gt;props--&amp;gt;&lt;/P&gt;
&lt;P&gt;[proxy]&lt;BR /&gt;REPORT-proxylogs-fields = proxylogs_fields,extract_url_domain&lt;BR /&gt;LINE_BREAKER = ([\r\n]+)&lt;BR /&gt;# EVENT_BREAKER = ([\r\n]+)&lt;BR /&gt;# EVENT_BREAKER_ENABLE = true&lt;BR /&gt;SHOULD_LINEMERGE = false&lt;BR /&gt;CHARSET = AUTO&lt;BR /&gt;disabled = false&lt;BR /&gt;TRUNCATE = 1000000&lt;BR /&gt;MAX_EVENTS = 1000000&lt;BR /&gt;EVAL-product = "Umbrella"&lt;BR /&gt;EVAL-vendor = "xyz"&lt;BR /&gt;EVAL-vendor_product = "abc"&lt;BR /&gt;MAX_TIMESTAMP_LOOKAHEAD = 22&lt;BR /&gt;NO_BINARY_CHECK = true&lt;BR /&gt;TIME_PREFIX = ^&lt;BR /&gt;TIME_FORMAT = %Y-%m-%d %H:%M:%S&lt;BR /&gt;TZ = UTC&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Transforms.conf --&amp;gt;&lt;/P&gt;
&lt;P&gt;[proxylogs_fields]&lt;BR /&gt;DELIMS = ","&lt;BR /&gt;FIELDS = Timestamp,policy_identities,src,src_translated_ip,dest,content_type,action,url,http_referrer,http_user_agent,status,requestSize,responseSize,responseBodySize,sha256,category,av_detection,pua,amp_disposition,amp_malwarename,amp_score,policy_identity_type,blocked_category,identities,identity_type,request_method,dlp_status,certificate_errors,filename,rulesetID,ruleID,destinationListID,s3_filename&lt;/P&gt;
&lt;P&gt;Example of an event:&lt;/P&gt;
&lt;P&gt;"2022-06-27 08:57:14","wer.com","1.1.1.1","1.1.1.1","10.10.10.10","image/gif","ALLOWED","https://www.moug.net/img/btn_learning.gif","https://www.mikhgg.net/tech/woopr/0025.html","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.124 Safari/537.36 Edg/102.0.1245.44","200","","3571","3328","1a146b09676811234dddccd6dc0ee3cf11aa1803e774df17aa9a49a7370a40ec","Allow List,Fashion","","","","","","AD Users","","wer.com","AD Users,Network Tunnels","GET","ALLOWED","","btn_learning.gif","13347559","346105","15065619",2022-06-27-09-50-ade8.csv.gz&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Events as seen in Splunk:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="jabezds_0-1658500519516.png" style="width: 400px;"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/20632i40ABD95D91BA8171/image-size/medium?v=v2&amp;amp;px=400" role="button" title="jabezds_0-1658500519516.png" alt="jabezds_0-1658500519516.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 03 Aug 2022 14:30:00 GMT</pubDate>
    <dc:creator>jabezds</dc:creator>
    <dc:date>2022-08-03T14:30:00Z</dc:date>
    <item>
      <title>Why are events getting split by Splunk parser?</title>
      <link>https://community.splunk.com/t5/Splunk-Enterprise/Why-are-events-getting-split-by-Splunk-parser/m-p/606665#M13423</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;We are trying to process and ingest AWS S3 events into Splunk, but we noticed that a few events are getting split. After checking the configuration, we think this must be caused by Splunk's internal parsing algorithm.&lt;/P&gt;
&lt;P&gt;Please let us know whether there are any issues in my configuration, or whether this could be related to the Splunk parser.&lt;/P&gt;
&lt;P&gt;Below are the entries in props.conf and transforms.conf:&lt;/P&gt;
&lt;P&gt;props--&amp;gt;&lt;/P&gt;
&lt;P&gt;[proxy]&lt;BR /&gt;REPORT-proxylogs-fields = proxylogs_fields,extract_url_domain&lt;BR /&gt;LINE_BREAKER = ([\r\n]+)&lt;BR /&gt;# EVENT_BREAKER = ([\r\n]+)&lt;BR /&gt;# EVENT_BREAKER_ENABLE = true&lt;BR /&gt;SHOULD_LINEMERGE = false&lt;BR /&gt;CHARSET = AUTO&lt;BR /&gt;disabled = false&lt;BR /&gt;TRUNCATE = 1000000&lt;BR /&gt;MAX_EVENTS = 1000000&lt;BR /&gt;EVAL-product = "Umbrella"&lt;BR /&gt;EVAL-vendor = "xyz"&lt;BR /&gt;EVAL-vendor_product = "abc"&lt;BR /&gt;MAX_TIMESTAMP_LOOKAHEAD = 22&lt;BR /&gt;NO_BINARY_CHECK = true&lt;BR /&gt;TIME_PREFIX = ^&lt;BR /&gt;TIME_FORMAT = %Y-%m-%d %H:%M:%S&lt;BR /&gt;TZ = UTC&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Transforms.conf --&amp;gt;&lt;/P&gt;
&lt;P&gt;[proxylogs_fields]&lt;BR /&gt;DELIMS = ","&lt;BR /&gt;FIELDS = Timestamp,policy_identities,src,src_translated_ip,dest,content_type,action,url,http_referrer,http_user_agent,status,requestSize,responseSize,responseBodySize,sha256,category,av_detection,pua,amp_disposition,amp_malwarename,amp_score,policy_identity_type,blocked_category,identities,identity_type,request_method,dlp_status,certificate_errors,filename,rulesetID,ruleID,destinationListID,s3_filename&lt;/P&gt;
&lt;P&gt;Example of an event:&lt;/P&gt;
&lt;P&gt;"2022-06-27 08:57:14","wer.com","1.1.1.1","1.1.1.1","10.10.10.10","image/gif","ALLOWED","https://www.moug.net/img/btn_learning.gif","https://www.mikhgg.net/tech/woopr/0025.html","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.124 Safari/537.36 Edg/102.0.1245.44","200","","3571","3328","1a146b09676811234dddccd6dc0ee3cf11aa1803e774df17aa9a49a7370a40ec","Allow List,Fashion","","","","","","AD Users","","wer.com","AD Users,Network Tunnels","GET","ALLOWED","","btn_learning.gif","13347559","346105","15065619",2022-06-27-09-50-ade8.csv.gz&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Events as seen in Splunk:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="jabezds_0-1658500519516.png" style="width: 400px;"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/20632i40ABD95D91BA8171/image-size/medium?v=v2&amp;amp;px=400" role="button" title="jabezds_0-1658500519516.png" alt="jabezds_0-1658500519516.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2022 14:30:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Enterprise/Why-are-events-getting-split-by-Splunk-parser/m-p/606665#M13423</guid>
      <dc:creator>jabezds</dc:creator>
      <dc:date>2022-08-03T14:30:00Z</dc:date>
    </item>
    <item>
      <title>Re: Events are getting split by splunk parser</title>
      <link>https://community.splunk.com/t5/Splunk-Enterprise/Why-are-events-getting-split-by-Splunk-parser/m-p/606668#M13424</link>
      <description>&lt;P&gt;It looks like line breaks need to be isolated to newlines that precede a date.&amp;nbsp; Try &lt;FONT face="courier new,courier"&gt;LINE_BREAKER = ([\r\n]+)\d{4}-\d\d&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jul 2022 14:56:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Enterprise/Why-are-events-getting-split-by-Splunk-parser/m-p/606668#M13424</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2022-07-22T14:56:00Z</dc:date>
    </item>
    <item>
      <title>Re: Why are Events getting split by splunk parser?</title>
      <link>https://community.splunk.com/t5/Splunk-Enterprise/Why-are-events-getting-split-by-Splunk-parser/m-p/608054#M13520</link>
      <description>&lt;P&gt;Thanks, &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/213957"&gt;@richgalloway&lt;/a&gt;. We tried this expression, though I had to tweak it to &lt;SPAN&gt;([\r\n]+)\"\d{4}-\d\d to accept the double quote before the year. I'm still facing the same issue: the event splits exactly as before.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Are there any other parameters I'm missing in props.conf?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2022 10:16:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Enterprise/Why-are-events-getting-split-by-Splunk-parser/m-p/608054#M13520</guid>
      <dc:creator>jabezds</dc:creator>
      <dc:date>2022-08-03T10:16:58Z</dc:date>
    </item>
    <item>
      <title>Re: Why are events getting split by Splunk parser?</title>
      <link>https://community.splunk.com/t5/Splunk-Enterprise/Why-are-events-getting-split-by-Splunk-parser/m-p/608346#M13542</link>
      <description>&lt;P&gt;Where are these props and transforms loading from? I don't see that sourcetype in Splunk Enterprise, any of your apps, or the&amp;nbsp;&lt;A href="https://splunkbase.splunk.com/app/1876/" target="_self"&gt;Splunk Add-on for Amazon Web Services (AWS)&lt;/A&gt;. I was looking to play with it and I don't see them.&lt;/P&gt;&lt;P&gt;The sourcetype here doesn't match the sourcetype in the screenshot. Here it's "proxy", but in the screenshot there appears to be more to the name. I'm guessing you changed the name in the text to hide the part that is blocked out in the screenshot. Nonetheless, it's worth highlighting to make sure the names match the data being viewed.&lt;/P&gt;&lt;P&gt;This is interesting because the config appears to split only on newlines, yet the event is splitting in the middle of that text. Is there a hidden newline in that text? The TRUNCATE value seems high enough not to be at play.&lt;/P&gt;&lt;P&gt;What does &lt;SPAN&gt;&lt;FONT face="courier new,courier"&gt;extract_url_domain&lt;/FONT&gt; do? What's the config for that?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Have you used &lt;A href="https://docs.splunk.com/Documentation/Splunk/latest/Troubleshooting/Usebtooltotroubleshootconfigurations" target="_self"&gt;btool&lt;/A&gt; or the &lt;A href="https://docs.splunk.com/Documentation/Splunk/latest/Data/Managesourcetypes" target="_self"&gt;UI&lt;/A&gt; to confirm the settings are loading properly? It's easy to overlook another config file that is overriding your desired settings.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Aug 2022 20:27:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Enterprise/Why-are-events-getting-split-by-Splunk-parser/m-p/608346#M13542</guid>
      <dc:creator>sloshburch</dc:creator>
      <dc:date>2022-08-04T20:27:45Z</dc:date>
    </item>
    <item>
      <title>Re: Why are Events getting split by splunk parser?</title>
      <link>https://community.splunk.com/t5/Splunk-Enterprise/Why-are-events-getting-split-by-Splunk-parser/m-p/608347#M13543</link>
      <description>&lt;P&gt;The config all looks correct. Just to be safe, make sure you restart Splunk after making changes to the config. It's possible that your changes were added to the conf file but Splunk didn't load them because it wasn't prompted to (which can happen with a restart, /debug/refresh, or "extract reload=t").&lt;/P&gt;&lt;P&gt;Also, it might help to see whether others have the same issue or whether this happens on a clean install of Splunk.&lt;/P&gt;&lt;P&gt;Finally, if you have customer support, this type of basic sourcetype functionality is something they may be able to help with.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Aug 2022 20:35:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Enterprise/Why-are-events-getting-split-by-Splunk-parser/m-p/608347#M13543</guid>
      <dc:creator>sloshburch</dc:creator>
      <dc:date>2022-08-04T20:35:53Z</dc:date>
    </item>
    <item>
      <title>Re: Why are events getting split by Splunk parser?</title>
      <link>https://community.splunk.com/t5/Splunk-Enterprise/Why-are-events-getting-split-by-Splunk-parser/m-p/608366#M13545</link>
      <description>&lt;P&gt;I noticed one issue with your TIME_FORMAT: the timestamps in your events are wrapped in double quotes.&lt;/P&gt;&lt;P&gt;You have:&lt;/P&gt;&lt;P&gt;TIME_FORMAT = %Y-%m-%d %H:%M:%S&lt;/P&gt;&lt;P&gt;Should be:&lt;/P&gt;&lt;P&gt;TIME_FORMAT = "%Y-%m-%d %H:%M:%S"&lt;/P&gt;&lt;P&gt;For the parsing issue, I have seen problems like this when a different app defined a sourcetype with the same name. I would run btool to double-check that this is not the issue.&lt;/P&gt;&lt;P&gt;Another thing you could try is breaking on the .csv.gz at the end of each event, assuming that value is present in every event. LINE_BREAKER needs a capturing group around the text to discard between events, so capture the newline that follows it:&lt;/P&gt;&lt;P&gt;LINE_BREAKER = \.csv\.gz([\r\n]+)&lt;/P&gt;</description>
      <pubDate>Fri, 05 Aug 2022 02:14:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Enterprise/Why-are-events-getting-split-by-Splunk-parser/m-p/608366#M13545</guid>
      <dc:creator>matt8679</dc:creator>
      <dc:date>2022-08-05T02:14:08Z</dc:date>
    </item>
  </channel>
</rss>

