<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why does Splunk (re-)index this rolled file? How to troubleshoot? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300595#M56833</link>
    <description>&lt;P&gt;It's a busy log file. It often rolls before Splunk has finished reading the last X entries. Including the rolled files in the monitor entry is best practice -- if not officially from Splunk, definitely in my experience. Usually it works fine, I'm just at a loss to explain why it's failing in this case.&lt;/P&gt;</description>
    <pubDate>Mon, 03 Apr 2017 15:12:17 GMT</pubDate>
    <dc:creator>twinspop</dc:creator>
    <dc:date>2017-04-03T15:12:17Z</dc:date>
    <item>
      <title>Why does Splunk (re-)index this rolled file? How to troubleshoot?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300591#M56829</link>
      <description>&lt;P&gt;Inputs stanza from btool:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor:///apps/Logs/*/www/Reporting/CRTLog.log*]
_rcvbuf = 1572864
disabled = 0
host = apphost1
index = reporting_main
sourcetype = reporting_crtlog
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Their log rotation keeps 10 rolled copies, suffixed .1 through .10. For example, when the original rolls, it is renamed CRTLog.log.1 and a new CRTLog.log file is created. Standard stuff.&lt;/P&gt;

&lt;P&gt;I have confirmed, without a doubt, the rolled files maintain consistent content. I wrote a script to grab checksums of the first 1KB of each file every few seconds. They always check out -- .1's checksum matches what the original showed before rolling.&lt;/P&gt;
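
&lt;P&gt;A watcher like the one described can be sketched as follows. This is a minimal sketch, not the script used here. It assumes bash with GNU coreutils (&lt;CODE&gt;head -c&lt;/CODE&gt;, &lt;CODE&gt;md5sum&lt;/CODE&gt;), and the function names are hypothetical. It hashes a fixed byte window at the head of each file. The 256-byte default mirrors the default of the &lt;CODE&gt;initCrcLength&lt;/CODE&gt; setting in inputs.conf, which controls how much of a file's head Splunk reads to recognize a file it has already seen. Pass 1024 to reproduce the 1 KB window described above.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Splunk: fingerprint the head of each file.

# snapshot_heads DIR PATTERN [BYTES]
# Prints "name: md5" for the first BYTES bytes (default 256) of every regular
# file in DIR matching the glob PATTERN, in glob (sorted) order.
snapshot_heads() {
  local dir=$1 pattern=$2 bytes=${3:-256} f sum
  for f in "$dir"/$pattern; do
    [ -f "$f" ] || continue            # skip the unexpanded pattern if nothing matched
    sum=$(head -c "$bytes" "$f" | md5sum | cut -d' ' -f1)
    printf '%s: %s\n' "${f##*/}" "$sum"
  done
}

# watch_heads DIR PATTERN [BYTES] [INTERVAL]
# Polls every INTERVAL seconds (default 5) and prints a UTC timestamp plus the
# full snapshot whenever any file's head checksum changes, e.g. after a roll.
watch_heads() {
  local prev="" cur
  while true; do
    cur=$(snapshot_heads "$1" "$2" "${3:-256}")
    if [ "$cur" != "$prev" ]; then
      date -u '+%Y-%m-%dT%H:%M:%SZ'
      printf '%s\n' "$cur"
      prev=$cur
    fi
    sleep "${4:-5}"
  done
}
```
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;For example, &lt;CODE&gt;watch_heads /apps/Logs/apphost1/www/Reporting 'CRTLog.log*' 1024 2&lt;/CODE&gt; would print a new snapshot within about two seconds of each roll.&lt;/P&gt;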

&lt;P&gt;However, Splunk sometimes (not every time) treats the first rolled file as a new file:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; WatchedFile - Will begin reading at offset=0 for file='/apps/Logs/apphost1/www/Reporting/CRTLog.log.1'
&lt;/CODE&gt;&lt;/PRE&gt;
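
&lt;P&gt;To measure how often this happens, you can count these messages per file. This is a hedged sketch. It assumes the default splunkd log location &lt;CODE&gt;$SPLUNK_HOME/var/log/splunk/splunkd.log&lt;/CODE&gt; and the message text shown above. &lt;CODE&gt;count_rereads&lt;/CODE&gt; is a hypothetical helper, not a Splunk command:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: count "Will begin reading at offset=0" events per file
# in a splunkd.log, most frequent first ("COUNT PATH" per line).
count_rereads() {
  grep -o "Will begin reading at offset=0 for file='[^']*'" "$1" |
    sed "s/.*file='\(.*\)'\$/\1/" |   # keep only the quoted file path
    sort | uniq -c | sort -rn
}
```
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Usage: &lt;CODE&gt;count_rereads "$SPLUNK_HOME/var/log/splunk/splunkd.log"&lt;/CODE&gt;. A read from offset 0 is presumably expected for the freshly created CRTLog.log after each roll. Repeated hits on CRTLog.log.1 are the suspicious ones.&lt;/P&gt;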

&lt;P&gt;Roughly 30% of the time it re-reads the rolled file, and only ever .1, never any of the others.&lt;/P&gt;

&lt;P&gt;Any tips to further troubleshoot this?&lt;/P&gt;

&lt;P&gt;(Ticket's open, but after 3 days I kinda need an answer.)&lt;/P&gt;

&lt;P&gt;EDIT: Sample checksum comparison:&lt;/P&gt;

&lt;P&gt;I use &lt;CODE&gt;for f in CRTLog.log*; do printf '%s: ' "$f"; head -n 50 "$f" | md5sum; done&lt;/CODE&gt; to grab a list:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;CRTLog.log: 0fb375c11ad382eec3cc482fb1332c81  -
CRTLog.log.1: 40f3878392f5ca816bfc4948b263d0e2  -
CRTLog.log.10: ffc1a6dec71a64f69a2f4c42b53d68cb  -
CRTLog.log.2: a3b7d786d8aa7260cc5e46635e764c8f  -
&amp;lt;snip&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Then wait for a roll to fire and grab the new list:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;CRTLog.log: ad978fdb89b04169e95ba96c15887042  -
CRTLog.log.1: 0fb375c11ad382eec3cc482fb1332c81  -
CRTLog.log.10: 82d1b645c89e4e34b4e0a89712d30f3e  -
CRTLog.log.2: 40f3878392f5ca816bfc4948b263d0e2  -
CRTLog.log.3: a3b7d786d8aa7260cc5e46635e764c8f  -
&amp;lt;snip&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;
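
&lt;P&gt;You can also check the before/after comparison mechanically. After a clean roll, .1 should carry the old CRTLog.log hash, and each .k should carry the old .(k-1) hash. This is a minimal sketch. It assumes the two listings were saved to files in the format above. &lt;CODE&gt;verify_roll&lt;/CODE&gt; is a hypothetical helper, and the logic needs only POSIX awk:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: verify_roll BEFORE AFTER
# Checks that every numbered file in the AFTER listing has the hash its
# predecessor had in the BEFORE listing. Exits 1 on any mismatch.
verify_roll() {
  awk '
    # First file (pre-roll listing): remember name -> hash.
    FNR == NR { name = $1; sub(/:$/, "", name); before[name] = $2; next }
    # Second file (post-roll listing): check each numbered file.
    {
      name = $1; sub(/:$/, "", name)
      if (!match(name, /\.[0-9]+$/)) next         # skip the live file and snip markers
      base = substr(name, 1, RSTART - 1)
      n = substr(name, RSTART + 1) + 0
      prev = (n == 1) ? base : base "." (n - 1)  # .1 came from the live file, .k from .(k-1)
      if (!(prev in before)) next                # predecessor not captured (e.g. snipped)
      if (before[prev] == $2) { print "OK " name " (was " prev ")" }
      else { print "MISMATCH " name " (expected content of " prev ")"; bad = 1 }
    }
    END { exit bad }
  ' "$1" "$2"
}
```
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;It exits non-zero if any rolled file's head does not match its predecessor. It skips entries whose predecessor is missing from the first listing, such as snipped lines.&lt;/P&gt;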

&lt;P&gt;So the first 50 lines (about 16 KB of data) match before and after the roll to .1, yet Splunk re-read the file in this case.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Apr 2017 14:17:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300591#M56829</guid>
      <dc:creator>twinspop</dc:creator>
      <dc:date>2017-04-03T14:17:07Z</dc:date>
    </item>
    <item>
      <title>Re: Why does Splunk (re-)index this rolled file? How to troubleshoot?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300592#M56830</link>
      <description>&lt;P&gt;I don't know why, but you should just blacklist the &lt;CODE&gt;*log.1&lt;/CODE&gt; file and be done with it.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Apr 2017 14:23:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300592#M56830</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-04-03T14:23:56Z</dc:date>
    </item>
    <item>
      <title>Re: Why does Splunk (re-)index this rolled file? How to troubleshoot?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300593#M56831</link>
      <description>&lt;P&gt;I have a sneaking feeling I would just see .2 show up as a dup instead. The next step would then be to drop the * and monitor only the original file... but then we miss log entries. (Busy log file.)&lt;/P&gt;</description>
      <pubDate>Mon, 03 Apr 2017 14:35:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300593#M56831</guid>
      <dc:creator>twinspop</dc:creator>
      <dc:date>2017-04-03T14:35:18Z</dc:date>
    </item>
    <item>
      <title>Re: Why does Splunk (re-)index this rolled file? How to troubleshoot?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300594#M56832</link>
      <description>&lt;P&gt;Why even put an asterisk after .log in the monitor line? As long as the entries were indexed while the file was still named CRTLog.log, there's no need to look at the rolled copies ever again:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor:///apps/Logs/*/www/Reporting/CRTLog.log]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;If this is pushed out to a new host via the deployment server, I can see why you would want the old files indexed, but that is the only case I can see for adding the * on the end of the line.&lt;/P&gt;

&lt;P&gt;One more argument against the asterisk: watching one file takes less CPU and memory than watching 11.&lt;/P&gt;

&lt;P&gt;Just tryin' to keep it simple. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Apr 2017 15:07:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300594#M56832</guid>
      <dc:creator>cpetterborg</dc:creator>
      <dc:date>2017-04-03T15:07:12Z</dc:date>
    </item>
    <item>
      <title>Re: Why does Splunk (re-)index this rolled file? How to troubleshoot?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300595#M56833</link>
      <description>&lt;P&gt;It's a busy log file. It often rolls before Splunk has finished reading the last X entries. Including the rolled files in the monitor entry is best practice -- if not officially from Splunk, definitely in my experience. Usually it works fine, I'm just at a loss to explain why it's failing in this case.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Apr 2017 15:12:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300595#M56833</guid>
      <dc:creator>twinspop</dc:creator>
      <dc:date>2017-04-03T15:12:17Z</dc:date>
    </item>
    <item>
      <title>Re: Why does Splunk (re-)index this rolled file? How to troubleshoot?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300596#M56834</link>
      <description>&lt;P&gt;It looks like a pretty standard inputs.conf stanza. What about the &lt;CODE&gt;CRTLog.log*&lt;/CODE&gt; in your monitor line, &lt;CODE&gt;[monitor:///apps/Logs/*/www/Reporting/CRTLog.log*]&lt;/CODE&gt;? Have you tried it without the trailing *, i.e. just &lt;CODE&gt;[monitor:///apps/Logs/*/www/Reporting/CRTLog.log]&lt;/CODE&gt;? Otherwise, I like woodcock's blacklist idea, or maybe having the rolled-file naming changed.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Apr 2017 16:26:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300596#M56834</guid>
      <dc:creator>rewritex</dc:creator>
      <dc:date>2017-04-03T16:26:56Z</dc:date>
    </item>
    <item>
      <title>Re: Why does Splunk (re-)index this rolled file? How to troubleshoot?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300597#M56835</link>
      <description>&lt;P&gt;Identifying files regardless of their name, so that rolled files are handled correctly, is a core feature of Splunk. In this case we need it: without the asterisk we very noticeably miss log entries. Right now our choice is between missing log entries and getting duplicate entries. Not optimal! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Apr 2017 16:45:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300597#M56835</guid>
      <dc:creator>twinspop</dc:creator>
      <dc:date>2017-04-03T16:45:40Z</dc:date>
    </item>
    <item>
      <title>Re: Why does Splunk (re-)index this rolled file? How to troubleshoot?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300598#M56836</link>
      <description>&lt;P&gt;I didn't realize my question had already been asked... sorry about that.&lt;/P&gt;

&lt;P&gt;I recently had a similar problem getting data in. I had to remove my * and monitor the whole directory instead. My &lt;CODE&gt;[monitor:///Logs/isam/reports/access.log*]&lt;/CODE&gt; became &lt;CODE&gt;[monitor:///Logs/isam/reports/access.log/]&lt;/CODE&gt;, and that worked for me. It had to monitor the whole directory rather than use a wildcard on the log name. I also kept running into a problem with the whitelist parameter, so I dropped it. Running &lt;CODE&gt;$SPLUNK_HOME/bin/splunk list monitor&lt;/CODE&gt; on my UF showed me which files and directories were actually being monitored. That highlighted a regex issue in another stanza, where I had escaped a character incorrectly. Good luck.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Apr 2017 18:13:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300598#M56836</guid>
      <dc:creator>rewritex</dc:creator>
      <dc:date>2017-04-03T18:13:23Z</dc:date>
    </item>
    <item>
      <title>Re: Why does Splunk (re-)index this rolled file? How to troubleshoot?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300600#M56838</link>
      <description>&lt;P&gt;EDIT: Spoke too soon. I just got lucky with a string of good rolls; the 13th one failed. Same scenario. Sigh.&lt;/P&gt;

&lt;P&gt;This looks like the fix (EDIT: nope). &lt;/P&gt;

&lt;P&gt;Bad:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor:///apps/Logs/*/www/Reporting/CRTLog.log*]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Good:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor:///apps/Logs/*/www/Reporting/]
whitelist = CRTLog
&lt;/CODE&gt;&lt;/PRE&gt;
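
&lt;P&gt;Note that &lt;CODE&gt;whitelist&lt;/CODE&gt; is a regular expression matched against each file's full path. Directory monitors also recurse into subdirectories by default. So the unanchored &lt;CODE&gt;CRTLog&lt;/CODE&gt; would also select names like CRTLog.log.bak. Below is a minimal sketch for previewing what a pattern selects. It assumes bash, &lt;CODE&gt;find&lt;/CODE&gt; and &lt;CODE&gt;grep -E&lt;/CODE&gt;. &lt;CODE&gt;which_match&lt;/CODE&gt; is a hypothetical helper. Splunk itself uses PCRE, which behaves the same as POSIX ERE for simple patterns like these:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: which_match DIR REGEX
# Lists files under DIR (recursively) whose full path matches REGEX (POSIX ERE).
which_match() {
  find "$1" -type f | grep -E -- "$2" | sort
}
```
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;For example, &lt;CODE&gt;which_match /apps/Logs/apphost1/www/Reporting 'CRTLog\.log(\.[0-9]+)?$'&lt;/CODE&gt; should list only the live file and its numbered rolls.&lt;/P&gt;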

&lt;P&gt;That seems like a bug to me. I'm not sure what triggers it, because I use the "Bad" style above in literally a thousand different scenarios, and this is the first time it has bitten me.&lt;/P&gt;

&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 03 Apr 2017 18:31:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300600#M56838</guid>
      <dc:creator>twinspop</dc:creator>
      <dc:date>2017-04-03T18:31:52Z</dc:date>
    </item>
    <item>
      <title>Re: Why does Splunk (re-)index this rolled file? How to troubleshoot?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300601#M56839</link>
      <description>&lt;P&gt;Hello Jon... Any luck with an answer or resolution on this issue?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Aug 2017 22:11:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300601#M56839</guid>
      <dc:creator>rewritex</dc:creator>
      <dc:date>2017-08-01T22:11:42Z</dc:date>
    </item>
    <item>
      <title>Re: Why does Splunk (re-)index this rolled file? How to troubleshoot?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300602#M56840</link>
      <description>&lt;P&gt;No. Splunk Support was not helpful and wasted hours of my time. I eventually told the user to index only the current file and accept that some log entries will be lost at roll time. It's a horrible solution, but I can't get anyone at Splunk to care.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Aug 2017 22:26:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300602#M56840</guid>
      <dc:creator>twinspop</dc:creator>
      <dc:date>2017-08-01T22:26:32Z</dc:date>
    </item>
    <item>
      <title>Re: Why does Splunk (re-)index this rolled file? How to troubleshoot?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300603#M56841</link>
      <description>&lt;P&gt;This issue is resolved in the following releases:&lt;BR /&gt;
7.1 (SPL-149198)&lt;BR /&gt;
7.0.4 (SPL-153453)&lt;BR /&gt;
6.6.7 (SPL-146190)&lt;/P&gt;</description>
      <pubDate>Sat, 12 May 2018 02:03:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-does-Splunk-re-index-this-rolled-file-How-to-troubleshoot/m-p/300603#M56841</guid>
      <dc:creator>hrawat</dc:creator>
      <dc:date>2018-05-12T02:03:15Z</dc:date>
    </item>
  </channel>
</rss>

