<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Avoid duplicate data and ignore # fields in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Avoid-duplicate-data-and-ignore-fields/m-p/78924#M16170</link>
    <description>&lt;P&gt;In props.conf, we use the following line in the sourcetype stanza for IIS. It deletes the block of four consecutive # header lines from the event text.&lt;BR /&gt;&lt;BR /&gt;
&lt;CODE&gt;SEDCMD-THROWAWAY-COMMENTS=s/^#.+[\r\n]+#.+[\r\n]+#.+[\r\n]+#.*[\r\n]//g&lt;/CODE&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 08 Jun 2016 13:49:49 GMT</pubDate>
    <dc:creator>wsnyder2</dc:creator>
    <dc:date>2016-06-08T13:49:49Z</dc:date>
    <item>
      <title>Avoid duplicate data and ignore # fields</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Avoid-duplicate-data-and-ignore-fields/m-p/78921#M16167</link>
      <description>&lt;P&gt;I have customer systems that log data to IIS on file transfers. IIS has a timeout of 20 minutes. When it times out it immediately restarts but throws in a new set of headers. Also the date/time stamp on the log changes and Splunk assumes that it is a new file.&lt;/P&gt;

&lt;P&gt;How can I avoid the duplication of data when Splunk attempts to re-index the log or how do I get Splunk to only consume the new data? And how do I ignore the headers scattered throughout the log file?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2013 20:00:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Avoid-duplicate-data-and-ignore-fields/m-p/78921#M16167</guid>
      <dc:creator>kmattern</dc:creator>
      <dc:date>2013-10-01T20:00:08Z</dc:date>
    </item>
    <item>
      <title>Re: Avoid duplicate data and ignore # fields</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Avoid-duplicate-data-and-ignore-fields/m-p/78922#M16168</link>
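      <description>&lt;P&gt;There are two problems here. First, the extra header lines: you can remove them with additions to inputs.conf, props.conf, and transforms.conf. The transform below sends every line that starts with # to the nullQueue, so it is never indexed.&lt;/P&gt;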

&lt;P&gt;Note: I’m using a new sourcetype, so I need a stanza in inputs.conf. If you want to keep the sourcetype already set in your inputs.conf, use that sourcetype as the stanza name in props.conf (that is, substitute it for my winIIS).&lt;/P&gt;

&lt;P&gt;inputs.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor://c:\inetpub\logs\Logfiles\W3SVC1\*.log]
sourcetype = winIIS
queue = parsingQueue
index = default
disabled = false
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;props.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[winIIS]
SHOULD_LINEMERGE = false
CHECK_FOR_HEADER = false
REPORT-fields = windows_iis_header
TRANSFORMS-headers = remove_headers
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;transforms.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[remove_headers]
REGEX = ^#.*
DEST_KEY = queue
FORMAT = nullQueue

[windows_iis_header]
# The stanza name must match REPORT-fields in props.conf.
# Complete the FIELDS list to match your log header configuration.
FIELDS = "date","time","s_ip",…..
DELIMS = " "
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Here is another example of the same:&lt;BR /&gt;
&lt;A href="http://answers.splunk.com/answers/24986/iis-log-fields-not-parsing"&gt;http://answers.splunk.com/answers/24986/iis-log-fields-not-parsing&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;As for the duplication problem, I’ve not seen that. A change to the file's modification time is normal and should not cause a re-read of the file. Splunk checksums the beginning of the file (a CRC of the first 256 bytes by default), so if that does not change, the file should not be re-read. I’m guessing a setting in your inputs.conf is causing it. Can you post your inputs.conf?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2013 21:50:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Avoid-duplicate-data-and-ignore-fields/m-p/78922#M16168</guid>
      <dc:creator>lukejadamec</dc:creator>
      <dc:date>2013-10-01T21:50:18Z</dc:date>
    </item>
    <item>
      <title>Re: Avoid duplicate data and ignore # fields</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Avoid-duplicate-data-and-ignore-fields/m-p/78923#M16169</link>
      <description>&lt;P&gt;In Splunk 6, set &lt;CODE&gt;INDEXED_EXTRACTIONS=W3C&lt;/CODE&gt; for the sourcetype in props.conf. We will honor the header found at the top of the file and ignore any line beginning with a # after that. Plus, we do the field extraction automatically from the header, so you don't have to maintain the header-removal and field-list transforms yourself.&lt;/P&gt;

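&lt;P&gt;A minimal props.conf sketch of that setting, assuming the sourcetype is named winIIS as in the earlier reply (that name is an assumption; substitute the sourcetype from your own inputs.conf). As far as I know, structured-data extraction happens where the file is read, so on a universal forwarder this stanza would belong in the forwarder's props.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# props.conf (sketch; winIIS is an assumed sourcetype name)
[winIIS]
# Read field names from the W3C "#Fields:" header and extract them at index time
INDEXED_EXTRACTIONS = W3C
&lt;/CODE&gt;&lt;/PRE&gt;
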
&lt;P&gt;&lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Data/Extractfieldsfromfileheadersatindextime"&gt;http://docs.splunk.com/Documentation/Splunk/latest/Data/Extractfieldsfromfileheadersatindextime&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Feb 2014 16:31:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Avoid-duplicate-data-and-ignore-fields/m-p/78923#M16169</guid>
      <dc:creator>ogdin</dc:creator>
      <dc:date>2014-02-13T16:31:31Z</dc:date>
    </item>
    <item>
      <title>Re: Avoid duplicate data and ignore # fields</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Avoid-duplicate-data-and-ignore-fields/m-p/78924#M16170</link>
      <description>&lt;P&gt;In props.conf, we use the following line in the sourcetype stanza for IIS. It deletes the block of four consecutive # header lines from the event text.&lt;BR /&gt;&lt;BR /&gt;
&lt;CODE&gt;SEDCMD-THROWAWAY-COMMENTS=s/^#.+[\r\n]+#.+[\r\n]+#.+[\r\n]+#.*[\r\n]//g&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2016 13:49:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Avoid-duplicate-data-and-ignore-fields/m-p/78924#M16170</guid>
      <dc:creator>wsnyder2</dc:creator>
      <dc:date>2016-06-08T13:49:49Z</dc:date>
    </item>
  </channel>
</rss>

