<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Rolled logs compressed immediately in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104209#M21936</link>
    <description>&lt;P&gt;The scenario is this:&lt;BR /&gt;
1. Splunk is tailing a file.&lt;BR /&gt;
2. Splunk closes the file for whatever reason, whether it hasn't been modified for some time, or Splunk is down, or there are so many files that Splunk is having trouble keeping all the active files open.&lt;BR /&gt;
3. Entries are written to the file while Splunk has the file closed.&lt;BR /&gt;
4. Before Splunk reopens the file, the file is rolled and compressed immediately.&lt;/P&gt;

&lt;P&gt;The desire is for Splunk to handle this case, which unfortunately isn't that uncommon.&lt;/P&gt;</description>
    <pubDate>Fri, 20 May 2011 00:35:04 GMT</pubDate>
    <dc:creator>vbumgarner</dc:creator>
    <dc:date>2011-05-20T00:35:04Z</dc:date>
    <item>
      <title>Rolled logs compressed immediately</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104199#M21926</link>
      <description>&lt;P&gt;In most cases, each log is rolled to a file in the same directory, or a nearby one, either keeping the same name or renamed to include the date or an index, for instance some.log.2011-05-05 or some.log.1. In the vast majority of cases, Splunk handles this without issue because it identifies logs by checksums of their contents rather than by their names.&lt;/P&gt;
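
&lt;P&gt;To illustrate the checksum idea, here is a minimal sketch. It is not Splunk's implementation: zlib.crc32, the 256-byte head length, and the helper names head_crc and demo are assumptions made for the example. It shows that a fingerprint taken from the start of the content survives a rename, and survives gzip too if the reader decompresses before fingerprinting.&lt;/P&gt;

&lt;PRE&gt;
```python
import gzip
import os
import tempfile
import zlib

HEAD_BYTES = 256  # assumed head length for this sketch, not a verified Splunk value


def head_crc(path):
    """CRC32 of the first HEAD_BYTES of a file, decompressing .gz files first."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        return zlib.crc32(handle.read(HEAD_BYTES))


def demo():
    """Fingerprint a log before the roll, after the roll, and after gzip."""
    with tempfile.TemporaryDirectory() as tmp:
        live = os.path.join(tmp, "some.log")
        rolled = os.path.join(tmp, "some.log.1")
        packed = rolled + ".gz"
        data = b"".join(b"2011-05-05 event %d\n" % i for i in range(50))
        with open(live, "wb") as handle:
            handle.write(data)
        before = head_crc(live)
        # Roll the log: only the name changes, the content does not.
        os.rename(live, rolled)
        after_roll = head_crc(rolled)
        # Compress the rolled log.
        with open(rolled, "rb") as src, gzip.open(packed, "wb") as dst:
            dst.write(src.read())
        after_gzip = head_crc(packed)
        return before, after_roll, after_gzip
```
&lt;/PRE&gt;

&lt;P&gt;All three values returned by demo() are equal, which is why the file name alone does not matter to a content-based check.&lt;/P&gt;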

&lt;P&gt;On a few occasions I have seen software that compresses the rolled log immediately. This is a problem for two reasons:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;If Splunk does not have the file open at the time it is compressed, then anything written to the log after Splunk last closed the file will not be indexed.&lt;/LI&gt;
&lt;LI&gt;If Splunk is configured to index the compressed logs, the data will be indexed twice. Why? Though Splunk will index compressed logs without issue, it has no way to know that it has already seen this log in uncompressed form.&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;There are three scenarios I can imagine where Splunk will not have the log open:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;Splunk has closed the file because it hasn't been written to recently. This is controlled by time_before_close, and by default is 3 seconds.&lt;/LI&gt;
&lt;LI&gt;Splunk has run out of file descriptors, and is waiting for one of the logs it currently has open to be closed.&lt;/LI&gt;
&lt;LI&gt;Splunk is not running.&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;A few options I see to deal with the issue:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;Change the software or configuration to wait a day before compressing a rolled log. This is the usual approach and is the best solution.&lt;/LI&gt;
&lt;LI&gt;Increase the time_before_close to a fairly large number. This is only a remotely good idea if the number of active logs on the system is quite small. This also does not help if Splunk happens to be down during the roll.&lt;/LI&gt;
&lt;LI&gt;Wait for the log to be rolled and only index the compressed log. This is not ideal, as the index will be out of date most of the time, and the indexers will do most of their work all at once.&lt;/LI&gt;
&lt;LI&gt;Write a scripted input that handles the tailing and handles the compressed files as needed. This is an absolute last resort, because it introduces a complicated piece of code that must be maintained by the client.&lt;/LI&gt;
&lt;/OL&gt;
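
&lt;P&gt;For a sense of what the scripted input in the last option would involve, here is a rough sketch of its core step only. The function name read_new_bytes, the idea of a saved byte offset, and the assumption that the compressed copy is named by appending .gz to the rolled name are all hypothetical. A real script would also have to persist the offset between runs and work out the rolled file name.&lt;/P&gt;

&lt;PRE&gt;
```python
import gzip
import os
import tempfile


def read_new_bytes(log_path, offset):
    """Return bytes written after `offset`, from the log or its .gz copy."""
    gz_path = log_path + ".gz"  # assumed naming of the compressed copy
    if os.path.exists(log_path):
        opener, path = open, log_path
    elif os.path.exists(gz_path):
        opener, path = gzip.open, gz_path
    else:
        return b""
    with opener(path, "rb") as handle:
        # For gzip files a forward seek works by decompressing up to offset.
        handle.seek(offset)
        return handle.read()


def demo():
    """The log is rolled and compressed while the reader has it closed."""
    with tempfile.TemporaryDirectory() as tmp:
        rolled = os.path.join(tmp, "some.log.1")
        seen = b"line 1\nline 2\n"
        missed = b"line 3\n"
        with gzip.open(rolled + ".gz", "wb") as handle:
            handle.write(seen + missed)
        # Only the compressed copy exists; resume from what was already read.
        return read_new_bytes(rolled, len(seen))
```
&lt;/PRE&gt;

&lt;P&gt;Here demo() returns only the line that was written while the file was closed. Even this small piece shows how much state the client would end up maintaining.&lt;/P&gt;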

&lt;P&gt;How have others solved this problem?&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 09:34:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104199#M21926</guid>
      <dc:creator>vbumgarner</dc:creator>
      <dc:date>2020-09-28T09:34:39Z</dc:date>
    </item>
    <item>
      <title>Re: Rolled logs compressed immediately</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104200#M21927</link>
      <description>&lt;P&gt;Best practice is:&lt;/P&gt;

&lt;P&gt;1 - Use the latest version of Splunk (4.2.x).   Splunk now decompresses the file and then performs the checksum.  Thanks to transamrit for this!&lt;/P&gt;

&lt;P&gt;2 - As mentioned, delay the compression for a day as most software allows this.   You will get a day to fix whatever your problem was (system or Splunk).  You will need to blacklist the .gz or only index the .log file.&lt;/P&gt;</description>
      <pubDate>Wed, 18 May 2011 18:58:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104200#M21927</guid>
      <dc:creator>Simeon</dc:creator>
      <dc:date>2011-05-18T18:58:22Z</dc:date>
    </item>
    <item>
      <title>Re: Rolled logs compressed immediately</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104201#M21928</link>
      <description>&lt;P&gt;Excellent! Is this supported in the universal forwarder, as well?&lt;/P&gt;</description>
      <pubDate>Wed, 18 May 2011 19:07:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104201#M21928</guid>
      <dc:creator>vbumgarner</dc:creator>
      <dc:date>2011-05-18T19:07:05Z</dc:date>
    </item>
    <item>
      <title>Re: Rolled logs compressed immediately</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104202#M21929</link>
      <description>&lt;P&gt;It uses the same tailing processor, so yes.&lt;/P&gt;</description>
      <pubDate>Wed, 18 May 2011 20:20:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104202#M21929</guid>
      <dc:creator>Simeon</dc:creator>
      <dc:date>2011-05-18T20:20:10Z</dc:date>
    </item>
    <item>
      <title>Re: Rolled logs compressed immediately</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104203#M21930</link>
      <description>&lt;P&gt;It doesn't seem to work for this particular use case. Does it only check the top of the file?&lt;/P&gt;

&lt;P&gt;05-18-2011 15:27:34.271 -0500 INFO  ArchiveProcessor - Archive with path="/tmp/foo/XCA_XCPD-2011-04-11T00_08_081e.gz" was already indexed as a non-archive, skipping.&lt;/P&gt;

&lt;P&gt;My test was to move an indexed log aside, add something to the end, gzip it, and put it back.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 09:34:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104203#M21930</guid>
      <dc:creator>vbumgarner</dc:creator>
      <dc:date>2020-09-28T09:34:52Z</dc:date>
    </item>
    <item>
      <title>Re: Rolled logs compressed immediately</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104204#M21931</link>
      <description>&lt;P&gt;CRC is against the beginning and end:  &lt;A href="http://www.splunk.com/base/Documentation/4.2.1/Data/Howlogfilerotationishandled"&gt;http://www.splunk.com/base/Documentation/4.2.1/Data/Howlogfilerotationishandled&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Are you checking the same file archived and NOT archived? 4.2.x should have a sense of state for a file that has been archived and matches the CRC of another file.&lt;/P&gt;</description>
      <pubDate>Wed, 18 May 2011 20:35:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104204#M21931</guid>
      <dc:creator>Simeon</dc:creator>
      <dc:date>2011-05-18T20:35:08Z</dc:date>
    </item>
    <item>
      <title>Re: Rolled logs compressed immediately</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104205#M21932</link>
      <description>&lt;P&gt;The test was...&lt;BR /&gt;
1. Index a file.&lt;BR /&gt;
2. Move the file aside.&lt;BR /&gt;
3. Add some entries to the end of that file.&lt;BR /&gt;
4. gzip that file.&lt;BR /&gt;
5. Put it back.&lt;/P&gt;</description>
      <pubDate>Wed, 18 May 2011 20:45:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104205#M21932</guid>
      <dc:creator>vbumgarner</dc:creator>
      <dc:date>2011-05-18T20:45:08Z</dc:date>
    </item>
    <item>
      <title>Re: Rolled logs compressed immediately</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104206#M21933</link>
      <description>&lt;P&gt;So is it re-indexing the file, or is it adding that data?&lt;/P&gt;

&lt;P&gt;I would enable debug for the TailingProcessor to see what Splunk thinks.   Can also use btprobe if you are familiar with it.&lt;/P&gt;</description>
      <pubDate>Thu, 19 May 2011 22:20:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104206#M21933</guid>
      <dc:creator>Simeon</dc:creator>
      <dc:date>2011-05-19T22:20:30Z</dc:date>
    </item>
    <item>
      <title>Re: Rolled logs compressed immediately</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104207#M21934</link>
      <description>&lt;P&gt;It thinks it has already indexed the entire file. &lt;/P&gt;

&lt;P&gt;INFO ArchiveProcessor - Archive with path="/tmp/foo/XCA_XCPD-2011-04-11T00_08_081e.gz" was already indexed as a non-archive, skipping.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 09:35:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104207#M21934</guid>
      <dc:creator>vbumgarner</dc:creator>
      <dc:date>2020-09-28T09:35:26Z</dc:date>
    </item>
    <item>
      <title>Re: Rolled logs compressed immediately</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104208#M21935</link>
      <description>&lt;P&gt;I believe the check is supposed to force skipping of the file if the CRC matches. So in theory, this is a good thing in some ways. Why is data being added after the compression/roll?&lt;/P&gt;</description>
      <pubDate>Thu, 19 May 2011 22:42:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104208#M21935</guid>
      <dc:creator>Simeon</dc:creator>
      <dc:date>2011-05-19T22:42:14Z</dc:date>
    </item>
    <item>
      <title>Re: Rolled logs compressed immediately</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104209#M21936</link>
      <description>&lt;P&gt;The scenario is this:&lt;BR /&gt;
1. Splunk is tailing a file.&lt;BR /&gt;
2. Splunk closes the file for whatever reason, whether it hasn't been modified for some time, or Splunk is down, or there are so many files that Splunk is having trouble keeping all the active files open.&lt;BR /&gt;
3. Entries are written to the file while Splunk has the file closed.&lt;BR /&gt;
4. Before Splunk reopens the file, the file is rolled and compressed immediately.&lt;/P&gt;

&lt;P&gt;The desire is for Splunk to handle this case, which unfortunately isn't that uncommon.&lt;/P&gt;</description>
      <pubDate>Fri, 20 May 2011 00:35:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104209#M21936</guid>
      <dc:creator>vbumgarner</dc:creator>
      <dc:date>2011-05-20T00:35:04Z</dc:date>
    </item>
    <item>
      <title>Re: Rolled logs compressed immediately</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104210#M21937</link>
      <description>&lt;P&gt;Hi, I have the same problem with rotated logfiles.&lt;BR /&gt;
I'm using Universal Forwarder version 6.4.5 to monitor a log file and its rotated versions. There was a network outage and the UF was not able to send its data for some time. In the meantime the logs were rotated and zipped. The files it had never begun to read were read fine after the network problem was resolved, even the zipped ones. But the file it was reading at the beginning of the outage was only unzipped and then reported as "already read, so skipped".&lt;/P&gt;

&lt;P&gt;When I manually unpacked the file and put it in place, the UF started reading where it had stopped because of the outage. So I think the UF is skipping the check of the seekCRC at the seekAddress, as mentioned here:&lt;BR /&gt;
&lt;A href="https://docs.splunk.com/Documentation/Splunk/6.4.5/Data/HowLogFileRotationIsHandled"&gt;https://docs.splunk.com/Documentation/Splunk/6.4.5/Data/HowLogFileRotationIsHandled&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Does anyone know if this is resolved in any version?&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2019 15:04:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rolled-logs-compressed-immediately/m-p/104210#M21937</guid>
      <dc:creator>goelli</dc:creator>
      <dc:date>2019-12-10T15:04:53Z</dc:date>
    </item>
  </channel>
</rss>

