Hi,
We are using the Splunk forwarder (splunkforwarder-6.6.2-4b804538c686-linux-2.6-x86_64-manifest) to gather logs from Java applications configured to use logback.xml with a rolling policy. Sometimes, when a log file is rolled and compressed, the Splunk process goes into a loop and consumes a lot of CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9064 splunk 20 0 253m 157m 12m S 81.2 4.1 129:18.10 splunkd
9072 splunk 20 0 70316 11m 1456 S 3.7 0.3 5:32.13 splunkd
/opt/splunkforwarder/var/log/splunk/splunkd.log fills up with messages like these:
03-19-2018 16:00:13.237 +0100 INFO ArchiveProcessor - Handling file=/export/logs/app/app-2018-03-19.5.log.gz
03-19-2018 16:00:13.237 +0100 INFO ArchiveProcessor - reading path=/export/logs/app/app-2018-03-19.5.log.gz (seek=0 len=6818141)
03-19-2018 16:00:13.283 +0100 WARN ArchiveProcessor - Could not access archive=/export/logs/app/app-2018-03-19.5.log.gz, because:retrieve: key=0x963d68b47a9905d1 is already locked with state=0x7f2436c97200
03-19-2018 16:00:13.283 +0100 INFO ArchiveProcessor - Handling file=/export/logs/app/app-2018-03-19.5.log.gz
03-19-2018 16:00:13.284 +0100 INFO ArchiveProcessor - reading path=/export/logs/app/app-2018-03-19.5.log.gz (seek=0 len=6818141)
03-19-2018 16:00:13.327 +0100 WARN ArchiveProcessor - Could not access archive=/export/logs/app/app-2018-03-19.5.log.gz, because:retrieve: key=0x963d68b47a9905d1 is already locked with state=0x7f2436c97200
It would also spawn a lot of child processes and update "/var/run/splunk/splunkd.pid".
The creation time of that app-2018-03-19.5.log.gz file correlates with the start of the CPU spike. The source code and documentation of Logback [https://logback.qos.ch/documentation.html] indicate that it does not use temporary files when rolling and compressing logs.
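For context, a Logback rolling policy that produces files named like app-2018-03-19.5.log.gz (date plus index, compressed because of the .gz suffix in the pattern) would look roughly like this. This is a sketch of a SizeAndTimeBasedRollingPolicy with assumed paths and limits, not our exact appender:

```xml
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>/export/logs/app/app.log</file>
  <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
    <!-- the .gz suffix tells Logback to compress each rolled file -->
    <fileNamePattern>/export/logs/app/app-%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
    <!-- roll within the same day once the file exceeds this size (assumed value) -->
    <maxFileSize>100MB</maxFileSize>
    <maxHistory>7</maxHistory>
  </rollingPolicy>
  <encoder>
    <pattern>%d %-5level %logger - %msg%n</pattern>
  </encoder>
</appender>
```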
Don't use followTail. The Admin manual says: "Do not use followTail for rolling log files (log files that get renamed as they age)."
Try one of these settings. Notice I changed ignoreOlderThan
to allow for outages.
[monitor:///export/logs/app/*.log]
ignoreOlderThan = 3d
or
[monitor:///export/logs/app/]
ignoreOlderThan = 3d
whitelist = ^/export/logs/app/app.*\.log$
followTail = 0
This did not solve the issue, as the archive still gets locked ("is already locked with state=..."). Restarting the Splunk service resolves the issue temporarily: the lock is lifted, and the same archive gets processed with "was already indexed as a non-archive, skipping".
The next step would be to exclude *.gz files from processing, which might mean that in some scenarios we are losing log entries.
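If it comes to that, one way to exclude the compressed archives is a blacklist on the monitor stanza. A sketch, assuming the same paths and settings discussed above:

```ini
[monitor:///export/logs/app/]
ignoreOlderThan = 3d
whitelist = ^/export/logs/app/app.*\.log$
# skip the compressed rolled files entirely
blacklist = \.gz$
```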
Please check the inputs.conf settings in Splunk. The document below should help:
http://docs.splunk.com/Documentation/Splunk/6.1/Admin/Inputsconf
Hope this helps.
Hi, thank you for your answer. Could you be more specific about which setting might be causing the issue? The only one I can think of is followTail = 1. This issue doesn't happen consistently, only in some cases when the logs are being rolled over and compressed.
What are the forwarder's inputs.conf settings?
[monitor:///export/logs/app/]
followTail = 1
ignoreOlderThan = 1d
whitelist=^/export/logs/app/app.*$