<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to configure props.conf to filter out events with CTRL-M (^M)? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123516#M25495</link>
    <description>&lt;P&gt;Here is the stanza defined in props.conf on the indexer:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[app_log]
TZ = 'America/New_York'
NO_BINARY_CHECK = 1
pulldown_type = 1
BREAK_ONLY_BEFORE = \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}
TIME_FORMAT = %Y-%m-%d %H:%M:%S
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Thu, 20 Nov 2014 19:56:43 GMT</pubDate>
    <dc:creator>shangshin</dc:creator>
    <dc:date>2014-11-20T19:56:43Z</dc:date>
    <item>
      <title>How to configure props.conf to filter out events with CTRL-M (^M)?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123507#M25486</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;
It seems that a log file containing the CTRL-M character causes duplicate parsing in the Splunk indexer, so I would like to filter out all events that contain this character. Please advise on how to set this up in props.conf.&lt;/P&gt;

&lt;P&gt;Thanks in advance!&lt;/P&gt;

&lt;P&gt;e.g.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;?xml version="1.0" encoding="ISO-8859-1"?&amp;gt;^M
&amp;lt;DATA&amp;gt;^M
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 18 Nov 2014 18:03:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123507#M25486</guid>
      <dc:creator>shangshin</dc:creator>
      <dc:date>2014-11-18T18:03:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure props.conf to filter out events with CTRL-M (^M)?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123508#M25487</link>
      <description>&lt;P&gt;As an aside, I usually see the ^M character show up in files written by Windows systems and transferred to Unix. I get rid of those characters using a dos2unix command (e.g. dos2unix ctrlmfile newfile). Is it possible to do that before you index the file?&lt;/P&gt;

&lt;P&gt;If not, if you are planning to discard the events with ^M characters totally, then you'll need to employ a props.conf/transforms.conf config change to route these items to nullQueue. See the following answer which may help guide you.&lt;/P&gt;
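
&lt;P&gt;As a rough sketch of that kind of routing (the linked answer below has the details): the sourcetype name app_log and the transform name drop_ctrl_m are assumptions for illustration only, and the regex can only match if a carriage return is still present in the event text when the transform runs.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# props.conf (sketch; the sourcetype name is an assumption)
[app_log]
TRANSFORMS-drop_ctrl_m = drop_ctrl_m

# transforms.conf (the stanza name drop_ctrl_m is hypothetical)
[drop_ctrl_m]
# Only matches if a carriage return is still present in the event text.
REGEX = \r
# Send matching events to the nullQueue so they are discarded before indexing.
DEST_KEY = queue
FORMAT = nullQueue
&lt;/CODE&gt;&lt;/PRE&gt;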

&lt;P&gt;&lt;A href="http://answers.splunk.com/answers/108326/regex-and-nullqueue-problem.html"&gt;http://answers.splunk.com/answers/108326/regex-and-nullqueue-problem.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 18 Nov 2014 18:21:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123508#M25487</guid>
      <dc:creator>jeremiahc4</dc:creator>
      <dc:date>2014-11-18T18:21:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure props.conf to filter out events with CTRL-M (^M)?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123509#M25488</link>
      <description>&lt;P&gt;The Carriage Return character (^M) does not cause duplicate indexing with Splunk, so you likely have some other problem.&lt;/P&gt;

&lt;P&gt;Splunk splits lines on any sequence of carriage returns or linefeeds (a linefeed is also called a newline). These characters are sometimes written ^M and ^J, or CR and LF (or NL). Splunk doesn't care which you have in the file; it will just linebreak on any quantity of sequential characters of either type.&lt;/P&gt;

&lt;P&gt;Therefore, unless you have a custom LINE_BREAKER setting, these characters are gone by the time we get to event merging and so on.&lt;/P&gt;
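
&lt;P&gt;For reference, a minimal props.conf sketch of the default line breaking described above. The [app_log] stanza name is only an assumed example sourcetype, and the value shown is the documented default, so it normally does not need to be set at all.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# props.conf (sketch; the stanza name is an assumption)
[app_log]
# Default value: break on any run of carriage returns and/or linefeeds.
# The text matched by the first capture group is discarded,
# so ^M does not reach the event.
LINE_BREAKER = ([\r\n]+)
&lt;/CODE&gt;&lt;/PRE&gt;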

&lt;P&gt;Meanwhile, the strategy used by the tailing processor which reads logfiles doesn't care about the particular bytes you have in your files.  It just reads chunks of bytes and hashes the start and end of the file.  Most likely, since your file is xml, the end of the file is being rewritten by replacing the close tag with event text.&lt;/P&gt;

&lt;P&gt;In other words your file is probably going from:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;...
&amp;lt;event10&amp;gt;event text&amp;lt;/event10&amp;gt;
&amp;lt;/data&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;to &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;...
&amp;lt;event10&amp;gt;event text&amp;lt;/event10&amp;gt;
&amp;lt;event11&amp;gt;event text&amp;lt;/event11&amp;gt;
&amp;lt;/data&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Since the bytes for the data closetag get replaced with the bytes for event eleven, the hash for the file end changes and the file is considered to contain new content.&lt;/P&gt;

&lt;P&gt;Workarounds involve monitoring the file after it is complete, or modifying the application to not write out a close tag until the logfile is complete and will no longer be written to.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Nov 2014 18:39:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123509#M25488</guid>
      <dc:creator>jrodman</dc:creator>
      <dc:date>2014-11-18T18:39:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure props.conf to filter out events with CTRL-M (^M)?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123510#M25489</link>
      <description>&lt;P&gt;Indeed, tossing the "header" lines is a reasonable thing to do.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Nov 2014 18:40:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123510#M25489</guid>
      <dc:creator>jrodman</dc:creator>
      <dc:date>2014-11-18T18:40:59Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure props.conf to filter out events with CTRL-M (^M)?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123511#M25490</link>
      <description>&lt;P&gt;Heh, I wasn't even paying attention to what the example data was. Yeah, you probably don't want to remove that line totally. A SEDCMD regex could be used to zap the ^M characters without burning the whole line from the event. But, your other answer below looks like a good route to explore first.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Nov 2014 18:48:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123511#M25490</guid>
      <dc:creator>jeremiahc4</dc:creator>
      <dc:date>2014-11-18T18:48:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure props.conf to filter out events with CTRL-M (^M)?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123512#M25491</link>
      <description>&lt;P&gt;The main thrust is that the ^M characters are probably gone before you can try to  zap them.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Nov 2014 18:52:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123512#M25491</guid>
      <dc:creator>jrodman</dc:creator>
      <dc:date>2014-11-18T18:52:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure props.conf to filter out events with CTRL-M (^M)?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123513#M25492</link>
      <description>&lt;P&gt;Thank you very much for the detailed explanation.&lt;/P&gt;

&lt;P&gt;I agree with your point: the carriage return character (^M) does not cause duplicate indexing.&lt;/P&gt;

&lt;P&gt;The duplicate event indexing seems to be related to something else.&lt;/P&gt;

&lt;P&gt;I found the following in the forwarder's splunkd.log:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;11-13-2014 10:51:01.063 -0500 INFO  WatchedFile - Checksum for seekptr didn't match, will re-read entire file='/local/0/lns/home/prod/log/jobs.log'.
11-13-2014 10:51:01.063 -0500 INFO  WatchedFile - Will begin reading at offset=0 for file='/local/0/lns/home/prod/log/jobs.log'.
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;So I tried modifying inputs.conf as below, but the duplicate events still persist. Do you have any insight?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor:///local/0/lns/home/prod/log/jobs.log]
host = nylnslxprd01
sourcetype = app_log
index = rsch_app
crcSalt=
followTail = 1
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 18 Nov 2014 21:58:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123513#M25492</guid>
      <dc:creator>shangshin</dc:creator>
      <dc:date>2014-11-18T21:58:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure props.conf to filter out events with CTRL-M (^M)?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123514#M25493</link>
      <description>&lt;P&gt;It sure looks like the problem I expected is happening.  "seekptr didn't match" means one of two things:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt; you have multiple files with the same exact initial 256 bytes.&lt;/LI&gt;
&lt;LI&gt; the file bytes are being changed after they are written.&lt;/LI&gt;
&lt;/OL&gt;
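
&lt;P&gt;For case 1, a minimal inputs.conf sketch, where the monitored path is a placeholder and the length value is an arbitrary example. Both settings only change how the start of the file is fingerprinted, so they would not be expected to help with case 2.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# inputs.conf (sketch; the path is a placeholder)
[monitor:///path/to/your.log]
# Mix the full source path into the initial checksum so that
# files with identical headers are tracked separately.
crcSalt = &amp;lt;SOURCE&amp;gt;
# Alternatively, hash more than the default 256 bytes of the file start.
initCrcLength = 1024
&lt;/CODE&gt;&lt;/PRE&gt;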

&lt;P&gt;Given that it's an XML file, 2 is by far the most probable as described above.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Nov 2014 23:38:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123514#M25493</guid>
      <dc:creator>jrodman</dc:creator>
      <dc:date>2014-11-18T23:38:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure props.conf to filter out events with CTRL-M (^M)?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123515#M25494</link>
      <description>&lt;P&gt;Thanks again for the information. I think it's number 2 -- the file bytes are being changed after they are written.&lt;/P&gt;

&lt;P&gt;The log file is updated frequently, at roughly 20 events per second.&lt;BR /&gt;
Is that the root cause? &lt;/P&gt;

&lt;P&gt;If the answer is yes, any remedy?&lt;/P&gt;

&lt;P&gt;2014-11-20 11:44:59,029 &lt;BR /&gt;
2014-11-20 11:44:59,065 &lt;BR /&gt;
2014-11-20 11:44:59,070 &lt;BR /&gt;
2014-11-20 11:44:59,071 &lt;BR /&gt;
2014-11-20 11:44:59,377 &lt;BR /&gt;
2014-11-20 11:44:59,396 &lt;BR /&gt;
2014-11-20 11:44:59,396 &lt;BR /&gt;
2014-11-20 11:44:59,543 &lt;BR /&gt;
2014-11-20 11:44:59,573 &lt;BR /&gt;
2014-11-20 11:44:59,578 &lt;BR /&gt;
2014-11-20 11:44:59,578 &lt;BR /&gt;
2014-11-20 11:44:59,886 &lt;/P&gt;</description>
      <pubDate>Thu, 20 Nov 2014 19:54:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123515#M25494</guid>
      <dc:creator>shangshin</dc:creator>
      <dc:date>2014-11-20T19:54:23Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure props.conf to filter out events with CTRL-M (^M)?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123516#M25495</link>
      <description>&lt;P&gt;Here is the stanza defined in props.conf on the indexer:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[app_log]
TZ = 'America/New_York'
NO_BINARY_CHECK = 1
pulldown_type = 1
BREAK_ONLY_BEFORE = \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}
TIME_FORMAT = %Y-%m-%d %H:%M:%S
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 20 Nov 2014 19:56:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123516#M25495</guid>
      <dc:creator>shangshin</dc:creator>
      <dc:date>2014-11-20T19:56:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to configure props.conf to filter out events with CTRL-M (^M)?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123517#M25496</link>
      <description>&lt;P&gt;The log file is rotated, which is why the checksum is not the same.&lt;BR /&gt;
Is there any reason why the crcSalt attribute is not working?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor:///local/0/lns/home/prod/log/jobs.log]
sourcetype = app_log
index = rsch_app
crcSalt=&amp;lt;SOURCE&amp;gt;
followTail = 1
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 20 Nov 2014 20:19:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-configure-props-conf-to-filter-out-events-with-CTRL-M-M/m-p/123517#M25496</guid>
      <dc:creator>shangshin</dc:creator>
      <dc:date>2014-11-20T20:19:32Z</dc:date>
    </item>
  </channel>
</rss>

