Getting Data In

How does splunk handle *nix logrotate based log rotation?

abonuccelli_spl
Splunk Employee
Splunk Employee

Hi,

what will happen if I use splunk to index files apache or syslog which gets rotated to *.gz?

will the data be reprocessed?

What is the default behaviour on 5?

I've found a couple of old answers

http://answers.splunk.com/answers/10309/log-file-rotation
http://answers.splunk.com/answers/12729/will-splunk-re-index-a-log-file-if-i-compress-it-after-its-b...

but I'm not entirely sure about actual behaviour on Splunk 5:

Tags (2)
1 Solution

abonuccelli_spl
Splunk Employee
Splunk Employee

Splunk will not re-index already processed files after they get gzipped.

example using a default monitor stanza like this:

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ pwd
/opt/SPLUNK/5.0.5/splunk
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ ./bin/splunk btool inputs list monitor:///var/log/apache2
[monitor:///var/log/apache2]
_rcvbuf = 1572864
disabled = false
followTail = 0
host = linux-test-host
index = default
sourcetype = access_combined

for a folder like this:


user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ ls -alrth /var/log/apache2
total 9.9M
drwxr-xr-x 18 root root 4.0K Feb 4 16:42 ..
-rw-r--r-- 1 root root 355 Feb 4 16:42 error.log.5.gz
-rw-r--r-- 1 root root 33K Feb 4 16:58 other_vhosts_access.log.5.gz
-rw-rw-rw- 1 root adm 353 Feb 4 16:58 error.log.4.gz
-rw-rw-rw- 1 root adm 1.7K Feb 4 16:59 other_vhosts_access.log.4.gz
-rw-rw-rw- 1 root adm 355 Feb 4 16:59 error.log.3.gz
-rw-rw-rw- 1 root adm 2.3K Feb 4 17:00 other_vhosts_access.log.3.gz
-rw-rw-rw- 1 root adm 353 Feb 4 17:00 error.log.2.gz
-rw-rw-rw- 1 root adm 11K Feb 4 17:02 other_vhosts_access.log.2.gz
-rw-rw-rw- 1 root adm 354 Feb 4 17:02 error.log.1.gz
-rw-rw-rw- 1 root adm 11K Feb 4 17:04 other_vhosts_access.log.1.gz
-rw-rw-rw- 1 root adm 280 Feb 4 17:04 error.log
drwxr-x--- 2 root adm 4.0K Feb 4 17:04 .
-rw-rw-rw- 1 root adm 9.8M Feb 4 17:09 other_vhosts_access.log

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ cp var/log/apache2/other_vhosts_access.log.1.gz /tmp/
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ gunzip -d /tmp/other_vhosts_access.log.1.gz
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ wc -l /tmp/other_vhosts_access.log.1
28644 /tmp/other_vhosts_access.log.1

user@linux-test-host ./bin/splunk search "source=/var/log/apache2* | stats count by source"

source count

/var/log/apache2/error.log 2
/var/log/apache2/error.log.1.gz 4
/var/log/apache2/error.log.2.gz 4
/var/log/apache2/error.log.3.gz 4
/var/log/apache2/error.log.4.gz 4
/var/log/apache2/error.log.5.gz 4
/var/log/apache2/other_vhosts_access.log 90875
/var/log/apache2/other_vhosts_access.log.1.gz 28644
/var/log/apache2/other_vhosts_access.log.2.gz 28517
/var/log/apache2/other_vhosts_access.log.3.gz 5341
/var/log/apache2/other_vhosts_access.log.4.gz 3732
/var/log/apache2/other_vhosts_access.log.5.gz 84227

The above tests were done starting a condition where there 0 files in the folder with several logrotation cycle run manually -> logrotate --force /etc/logrotate.d/apache2

When rotation happens,Splunk will find a compressed file which was already processed as non-compressed ( or compressed if starting from folder empty) and will behave like below, from splunkd.log

02-04-2014 17:28:39.513 +0000 INFO ArchiveProcessor - Archive with path="/var/log/apache2/other_vhosts_access.log.1.gz" was already indexed as a non-archive, skipping.

View solution in original post

abonuccelli_spl
Splunk Employee
Splunk Employee

Splunk will not re-index already processed files after they get gzipped.

example using a default monitor stanza like this:

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ pwd
/opt/SPLUNK/5.0.5/splunk
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ ./bin/splunk btool inputs list monitor:///var/log/apache2
[monitor:///var/log/apache2]
_rcvbuf = 1572864
disabled = false
followTail = 0
host = linux-test-host
index = default
sourcetype = access_combined

for a folder like this:


user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ ls -alrth /var/log/apache2
total 9.9M
drwxr-xr-x 18 root root 4.0K Feb 4 16:42 ..
-rw-r--r-- 1 root root 355 Feb 4 16:42 error.log.5.gz
-rw-r--r-- 1 root root 33K Feb 4 16:58 other_vhosts_access.log.5.gz
-rw-rw-rw- 1 root adm 353 Feb 4 16:58 error.log.4.gz
-rw-rw-rw- 1 root adm 1.7K Feb 4 16:59 other_vhosts_access.log.4.gz
-rw-rw-rw- 1 root adm 355 Feb 4 16:59 error.log.3.gz
-rw-rw-rw- 1 root adm 2.3K Feb 4 17:00 other_vhosts_access.log.3.gz
-rw-rw-rw- 1 root adm 353 Feb 4 17:00 error.log.2.gz
-rw-rw-rw- 1 root adm 11K Feb 4 17:02 other_vhosts_access.log.2.gz
-rw-rw-rw- 1 root adm 354 Feb 4 17:02 error.log.1.gz
-rw-rw-rw- 1 root adm 11K Feb 4 17:04 other_vhosts_access.log.1.gz
-rw-rw-rw- 1 root adm 280 Feb 4 17:04 error.log
drwxr-x--- 2 root adm 4.0K Feb 4 17:04 .
-rw-rw-rw- 1 root adm 9.8M Feb 4 17:09 other_vhosts_access.log

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ cp var/log/apache2/other_vhosts_access.log.1.gz /tmp/
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ gunzip -d /tmp/other_vhosts_access.log.1.gz
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ wc -l /tmp/other_vhosts_access.log.1
28644 /tmp/other_vhosts_access.log.1

user@linux-test-host ./bin/splunk search "source=/var/log/apache2* | stats count by source"

source count

/var/log/apache2/error.log 2
/var/log/apache2/error.log.1.gz 4
/var/log/apache2/error.log.2.gz 4
/var/log/apache2/error.log.3.gz 4
/var/log/apache2/error.log.4.gz 4
/var/log/apache2/error.log.5.gz 4
/var/log/apache2/other_vhosts_access.log 90875
/var/log/apache2/other_vhosts_access.log.1.gz 28644
/var/log/apache2/other_vhosts_access.log.2.gz 28517
/var/log/apache2/other_vhosts_access.log.3.gz 5341
/var/log/apache2/other_vhosts_access.log.4.gz 3732
/var/log/apache2/other_vhosts_access.log.5.gz 84227

The above tests were done starting a condition where there 0 files in the folder with several logrotation cycle run manually -> logrotate --force /etc/logrotate.d/apache2

When rotation happens,Splunk will find a compressed file which was already processed as non-compressed ( or compressed if starting from folder empty) and will behave like below, from splunkd.log

02-04-2014 17:28:39.513 +0000 INFO ArchiveProcessor - Archive with path="/var/log/apache2/other_vhosts_access.log.1.gz" was already indexed as a non-archive, skipping.

Get Updates on the Splunk Community!

Improve Your Security Posture

Watch NowImprove Your Security PostureCustomers are at the center of everything we do at Splunk and security ...

Maximize the Value from Microsoft Defender with Splunk

 Watch NowJoin Splunk and Sens Consulting for this Security Edition Tech TalkWho should attend:  Security ...

This Week's Community Digest - Splunk Community Happenings [6.27.22]

Get the latest news and updates from the Splunk Community here! News From Splunk Answers ✍️ Splunk Answers is ...