Hi,
We are using Splunk forwarder (splunkforwarder-6.6.2-4b804538c686-linux-2.6-x86_64-manifest) to gather logs from Java applications configured to use logback.xml with rolling policy. Sometimes, when the log file is rolled and compressed, Splunk process will go in a loop and take a lot of CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9064 splunk 20 0 253m 157m 12m S 81.2 4.1 129:18.10 splunkd
9072 splunk 20 0 70316 11m 1456 S 3.7 0.3 5:32.13 splunkd
/opt/splunkforwarder/var/log/splunk/splunkd.log is full with messages like this:
03-19-2018 16:00:13.237 +0100 INFO ArchiveProcessor - Handling file=/export/logs/app/app-2018-03-19.5.log.gz
03-19-2018 16:00:13.237 +0100 INFO ArchiveProcessor - reading path=/export/logs/app/app-2018-03-19.5.log.gz (seek=0 len=6818141)
03-19-2018 16:00:13.283 +0100 WARN ArchiveProcessor - Could not access archive=/export/logs/app/app-2018-03-19.5.log.gz, because:retrieve: key=0x963d68b47a9905d1 is already locked with state=0x7f2436c97200
03-19-2018 16:00:13.283 +0100 INFO ArchiveProcessor - Handling file=/export/logs/app/app-2018-03-19.5.log.gz
03-19-2018 16:00:13.284 +0100 INFO ArchiveProcessor - reading path=/export/logs/app/app-2018-03-19.5.log.gz (seek=0 len=6818141)
03-19-2018 16:00:13.327 +0100 WARN ArchiveProcessor - Could not access archive=/export/logs/app/app-2018-03-19.5.log.gz, because:retrieve: key=0x963d68b47a9905d1 is already locked with state=0x7f2436c97200
I would also spawn a lot of child processes and update "/var/run/splunk/splunkd.pid".
The creation time of that app-2018-03-19.5.log.gz file correlates with the start of CPU spike. Source code and documentation of LOGBack [https://logback.qos.ch/documentation.html] tell us that they don't use temporary files when rolling the logs and performing compression.
... View more