How does Splunk handle *nix logrotate-based log rotation?

Splunk Employee

Hi,

What happens if I use Splunk to index Apache or syslog files that logrotate rotates and compresses to *.gz?

Will the data be reprocessed?

What is the default behaviour on Splunk 5?

I've found a couple of old answers:

http://answers.splunk.com/answers/10309/log-file-rotation
http://answers.splunk.com/answers/12729/will-splunk-re-index-a-log-file-if-i-compress-it-after-its-b...

but I'm not entirely sure about the actual behaviour on Splunk 5.

1 Solution

Splunk Employee

Splunk will not re-index already processed files after they get gzipped.

For example, with a default monitor stanza like this:

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ pwd
/opt/SPLUNK/5.0.5/splunk
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ ./bin/splunk btool inputs list monitor:///var/log/apache2
[monitor:///var/log/apache2]
_rcvbuf = 1572864
disabled = false
followTail = 0
host = linux-test-host
index = default
sourcetype = access_combined

and a folder like this:

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ ls -alrth /var/log/apache2
total 9.9M
drwxr-xr-x 18 root root 4.0K Feb 4 16:42 ..
-rw-r--r-- 1 root root 355 Feb 4 16:42 error.log.5.gz
-rw-r--r-- 1 root root 33K Feb 4 16:58 other_vhosts_access.log.5.gz
-rw-rw-rw- 1 root adm 353 Feb 4 16:58 error.log.4.gz
-rw-rw-rw- 1 root adm 1.7K Feb 4 16:59 other_vhosts_access.log.4.gz
-rw-rw-rw- 1 root adm 355 Feb 4 16:59 error.log.3.gz
-rw-rw-rw- 1 root adm 2.3K Feb 4 17:00 other_vhosts_access.log.3.gz
-rw-rw-rw- 1 root adm 353 Feb 4 17:00 error.log.2.gz
-rw-rw-rw- 1 root adm 11K Feb 4 17:02 other_vhosts_access.log.2.gz
-rw-rw-rw- 1 root adm 354 Feb 4 17:02 error.log.1.gz
-rw-rw-rw- 1 root adm 11K Feb 4 17:04 other_vhosts_access.log.1.gz
-rw-rw-rw- 1 root adm 280 Feb 4 17:04 error.log
drwxr-x--- 2 root adm 4.0K Feb 4 17:04 .
-rw-rw-rw- 1 root adm 9.8M Feb 4 17:09 other_vhosts_access.log

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ cp /var/log/apache2/other_vhosts_access.log.1.gz /tmp/
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ gunzip -d /tmp/other_vhosts_access.log.1.gz
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ wc -l /tmp/other_vhosts_access.log.1
28644 /tmp/other_vhosts_access.log.1

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ ./bin/splunk search "source=/var/log/apache2* | stats count by source"

source count

/var/log/apache2/error.log 2
/var/log/apache2/error.log.1.gz 4
/var/log/apache2/error.log.2.gz 4
/var/log/apache2/error.log.3.gz 4
/var/log/apache2/error.log.4.gz 4
/var/log/apache2/error.log.5.gz 4
/var/log/apache2/other_vhosts_access.log 90875
/var/log/apache2/other_vhosts_access.log.1.gz 28644
/var/log/apache2/other_vhosts_access.log.2.gz 28517
/var/log/apache2/other_vhosts_access.log.3.gz 5341
/var/log/apache2/other_vhosts_access.log.4.gz 3732
/var/log/apache2/other_vhosts_access.log.5.gz 84227
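
The 28644 lines counted in the unpacked copy match the 28644 events Splunk reports for other_vhosts_access.log.1.gz, which is what you would expect if each line was indexed only once. If you want to repeat that check for other rotated files without copying them to /tmp first, something like the helper below should do. It is only a sketch: the function name gz_line_count is made up for this example, and it assumes gzip, wc and tr are on the PATH.

```shell
# Count the lines inside a gzip-compressed log without unpacking it to disk.
# $1: path to a .gz file
gz_line_count() {
    # gzip -dc writes the decompressed data to stdout and leaves the file untouched;
    # tr strips the padding some wc implementations put around the number.
    gzip -dc "$1" | wc -l | tr -d '[:space:]'
}
```

Calling gz_line_count /var/log/apache2/other_vhosts_access.log.1.gz gives a number you can compare against the count column of the search above.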

The tests above started from an empty folder (0 files), followed by several log rotation cycles run manually with: logrotate --force /etc/logrotate.d/apache2
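
To reproduce the manual rotation cycles on a test box, a loop along these lines might help. Treat it as a rough sketch rather than the exact procedure used here: the function name force_rotations, the LOGROTATE override variable and the 60-second default pause are all assumptions made for this example. The pause is there so the monitor input has a chance to read the live file before it gets rotated again.

```shell
# Force a number of logrotate cycles for one config file.
# $1: number of cycles
# $2: logrotate config, e.g. /etc/logrotate.d/apache2
# $3: pause between cycles in seconds (assumed default: 60)
force_rotations() {
    cycles=$1
    config=$2
    pause=${3:-60}
    i=1
    while [ "$i" -le "$cycles" ]; do
        # LOGROTATE is a hypothetical override for dry runs; it defaults to logrotate.
        "${LOGROTATE:-logrotate}" --force "$config" || return 1
        echo "rotation $i of $cycles done"
        # No need to wait after the final cycle.
        [ "$i" -lt "$cycles" ] && sleep "$pause"
        i=$((i + 1))
    done
}
```

For example, force_rotations 5 /etc/logrotate.d/apache2 would run five forced rotations with the default pause. It normally needs to run as root, since logrotate has to rename files under /var/log.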

When rotation happens, Splunk finds a compressed file that it has already processed as non-compressed (or as compressed, if starting from an empty folder) and behaves as shown below, from splunkd.log:

02-04-2014 17:28:39.513 +0000 INFO ArchiveProcessor - Archive with path="/var/log/apache2/other_vhosts_access.log.1.gz" was already indexed as a non-archive, skipping.
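
If you want to see which rotated archives were skipped this way, you can pull the paths out of splunkd.log. The function below is a sketch that relies only on the message format shown above; the name skipped_archives is made up for this example, and the log is assumed to be in its usual place, $SPLUNK_HOME/var/log/splunk/splunkd.log.

```shell
# Print the unique archive paths that ArchiveProcessor reported as
# "already indexed as a non-archive, skipping".
# $1: path to splunkd.log
skipped_archives() {
    grep 'ArchiveProcessor' "$1" \
        | grep 'was already indexed as a non-archive, skipping' \
        | sed -n 's/.*Archive with path="\([^"]*\)".*/\1/p' \
        | sort -u
}
```

Run it as skipped_archives "$SPLUNK_HOME/var/log/splunk/splunkd.log". It prints nothing if no such message has been logged.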

