How does Splunk handle *nix logrotate-based log rotation?

abonuccelli_spl
Splunk Employee

Hi,

What happens if I use Splunk to index Apache or syslog files that get rotated to *.gz?

Will the data be reprocessed?

What is the default behaviour on Splunk 5?

I've found a couple of old answers

http://answers.splunk.com/answers/10309/log-file-rotation
http://answers.splunk.com/answers/12729/will-splunk-re-index-a-log-file-if-i-compress-it-after-its-b...

but I'm not entirely sure about the actual behaviour on Splunk 5.

1 Solution

abonuccelli_spl
Splunk Employee

Splunk will not re-index already processed files after they get gzipped.

Example using a default monitor stanza like this:

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ pwd
/opt/SPLUNK/5.0.5/splunk
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ ./bin/splunk btool inputs list monitor:///var/log/apache2
[monitor:///var/log/apache2]
_rcvbuf = 1572864
disabled = false
followTail = 0
host = linux-test-host
index = default
sourcetype = access_combined

For a folder like this:

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ ls -alrth /var/log/apache2
total 9.9M
drwxr-xr-x 18 root root 4.0K Feb 4 16:42 ..
-rw-r--r-- 1 root root 355 Feb 4 16:42 error.log.5.gz
-rw-r--r-- 1 root root 33K Feb 4 16:58 other_vhosts_access.log.5.gz
-rw-rw-rw- 1 root adm 353 Feb 4 16:58 error.log.4.gz
-rw-rw-rw- 1 root adm 1.7K Feb 4 16:59 other_vhosts_access.log.4.gz
-rw-rw-rw- 1 root adm 355 Feb 4 16:59 error.log.3.gz
-rw-rw-rw- 1 root adm 2.3K Feb 4 17:00 other_vhosts_access.log.3.gz
-rw-rw-rw- 1 root adm 353 Feb 4 17:00 error.log.2.gz
-rw-rw-rw- 1 root adm 11K Feb 4 17:02 other_vhosts_access.log.2.gz
-rw-rw-rw- 1 root adm 354 Feb 4 17:02 error.log.1.gz
-rw-rw-rw- 1 root adm 11K Feb 4 17:04 other_vhosts_access.log.1.gz
-rw-rw-rw- 1 root adm 280 Feb 4 17:04 error.log
drwxr-x--- 2 root adm 4.0K Feb 4 17:04 .
-rw-rw-rw- 1 root adm 9.8M Feb 4 17:09 other_vhosts_access.log

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ cp /var/log/apache2/other_vhosts_access.log.1.gz /tmp/
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ gunzip -d /tmp/other_vhosts_access.log.1.gz
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ wc -l /tmp/other_vhosts_access.log.1
28644 /tmp/other_vhosts_access.log.1

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ ./bin/splunk search "source=/var/log/apache2* | stats count by source"

source                                         count
/var/log/apache2/error.log                     2
/var/log/apache2/error.log.1.gz                4
/var/log/apache2/error.log.2.gz                4
/var/log/apache2/error.log.3.gz                4
/var/log/apache2/error.log.4.gz                4
/var/log/apache2/error.log.5.gz                4
/var/log/apache2/other_vhosts_access.log       90875
/var/log/apache2/other_vhosts_access.log.1.gz  28644
/var/log/apache2/other_vhosts_access.log.2.gz  28517
/var/log/apache2/other_vhosts_access.log.3.gz  5341
/var/log/apache2/other_vhosts_access.log.4.gz  3732
/var/log/apache2/other_vhosts_access.log.5.gz  84227

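The line count of the decompressed copy of other_vhosts_access.log.1.gz (28644) matches the event count Splunk reports for that source.

The tests above started from an empty folder, with several log rotation cycles forced manually using: logrotate --force /etc/logrotate.d/apache2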
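The line-count check in the transcript can also be done without copying the archive to /tmp. This is only a minimal sketch: count_gz_lines is a hypothetical helper name, not something Splunk ships, and it assumes one event per line, as with access_combined logs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Splunk): count the lines in a
# gzip-compressed log without unpacking it on disk.
# $1: path to the .gz file
count_gz_lines() {
    # gzip -dc writes the decompressed data to stdout and leaves the archive untouched.
    # tr strips the padding some wc implementations put in front of the number.
    gzip -dc -- "$1" | wc -l | tr -d ' '
}
```

For example, count_gz_lines /var/log/apache2/other_vhosts_access.log.1.gz should print the same number that the search reports for that source if every line was indexed exactly once.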

When rotation happens, Splunk finds a compressed file that it has already processed in uncompressed form (or in compressed form, if it started from an empty folder) and skips it. splunkd.log shows this:

02-04-2014 17:28:39.513 +0000 INFO ArchiveProcessor - Archive with path="/var/log/apache2/other_vhosts_access.log.1.gz" was already indexed as a non-archive, skipping.
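To see which rotated archives were skipped on your own instance, you can pull these messages out of splunkd.log. The sketch below assumes the message wording matches the single line quoted above (other Splunk versions may phrase it differently). skipped_archives is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of Splunk): print the path of every archive
# that splunkd.log reports as skipped because it was already indexed.
# $1: path to splunkd.log (commonly $SPLUNK_HOME/var/log/splunk/splunkd.log)
skipped_archives() {
    # Assumption: the message format is the one quoted in this answer.
    # sed -n ... p prints only the lines where the pattern matched,
    # reduced to the quoted path; sort -u drops repeated reports.
    sed -n 's/.*ArchiveProcessor - Archive with path="\([^"]*\)" was already indexed as a non-archive, skipping.*/\1/p' "$1" \
        | sort -u
}
```

Run against the log line above, it prints /var/log/apache2/other_vhosts_access.log.1.gz.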
