Hi,
We are using the Splunk forwarder (splunkforwarder-6.6.2-4b804538c686-linux-2.6-x86_64-manifest) to gather logs from Java applications configured to use logback.xml with a rolling policy. Sometimes, when a log file is rolled and compressed, the Splunk process goes into a loop and consumes a lot of CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9064 splunk 20 0 253m 157m 12m S 81.2 4.1 129:18.10 splunkd
9072 splunk 20 0 70316 11m 1456 S 3.7 0.3 5:32.13 splunkd
/opt/splunkforwarder/var/log/splunk/splunkd.log fills up with messages like these:
03-19-2018 16:00:13.237 +0100 INFO ArchiveProcessor - Handling file=/export/logs/app/app-2018-03-19.5.log.gz
03-19-2018 16:00:13.237 +0100 INFO ArchiveProcessor - reading path=/export/logs/app/app-2018-03-19.5.log.gz (seek=0 len=6818141)
03-19-2018 16:00:13.283 +0100 WARN ArchiveProcessor - Could not access archive=/export/logs/app/app-2018-03-19.5.log.gz, because:retrieve: key=0x963d68b47a9905d1 is already locked with state=0x7f2436c97200
03-19-2018 16:00:13.283 +0100 INFO ArchiveProcessor - Handling file=/export/logs/app/app-2018-03-19.5.log.gz
03-19-2018 16:00:13.284 +0100 INFO ArchiveProcessor - reading path=/export/logs/app/app-2018-03-19.5.log.gz (seek=0 len=6818141)
03-19-2018 16:00:13.327 +0100 WARN ArchiveProcessor - Could not access archive=/export/logs/app/app-2018-03-19.5.log.gz, because:retrieve: key=0x963d68b47a9905d1 is already locked with state=0x7f2436c97200
It would also spawn a lot of child processes and update "/var/run/splunk/splunkd.pid".
The creation time of that app-2018-03-19.5.log.gz file correlates with the start of the CPU spike. The source code and documentation of Logback [https://logback.qos.ch/documentation.html] indicate that it does not use temporary files when rolling and compressing logs.
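For context, a Logback rolling policy that produces files named like app-2018-03-19.5.log.gz (date plus index, compressed because of the .gz suffix in the pattern) would look roughly like this. This is a sketch of a SizeAndTimeBasedRollingPolicy with assumed paths and limits, not our exact appender:

```xml
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>/export/logs/app/app.log</file>
  <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
    <!-- the .gz suffix tells Logback to compress each rolled file -->
    <fileNamePattern>/export/logs/app/app-%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
    <!-- roll within the same day once the file exceeds this size (assumed value) -->
    <maxFileSize>100MB</maxFileSize>
    <maxHistory>7</maxHistory>
  </rollingPolicy>
  <encoder>
    <pattern>%d %-5level %logger - %msg%n</pattern>
  </encoder>
</appender>
```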
Don't use followTail. The Admin manual says: "Do not use followTail for rolling log files (log files that get renamed as they age)."
Try one of these settings. Notice I changed ignoreOlderThan
to allow for outages.
[monitor:///export/logs/app/*.log]
ignoreOlderThan = 3d
or
[monitor:///export/logs/app/]
ignoreOlderThan = 3d
whitelist = ^/export/logs/app/app.*\.log$
followTail = 0
This did not solve the issue, as the archive still gets locked ("is already locked with state=..."). Restarting the Splunk service resolves the issue temporarily: the lock is lifted, and the same archive gets processed with "was already indexed as a non-archive, skipping".
The next step would be to exclude *.gz files from processing, which might mean that in some scenarios we are losing log entries.
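If it comes to that, one way to exclude the compressed archives is a blacklist on the monitor stanza. A sketch, assuming the same paths and settings discussed above:

```ini
[monitor:///export/logs/app/]
ignoreOlderThan = 3d
whitelist = ^/export/logs/app/app.*\.log$
# skip the compressed rolled files entirely
blacklist = \.gz$
```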
Please check the inputs.conf settings in Splunk. The document below should help:
http://docs.splunk.com/Documentation/Splunk/6.1/Admin/Inputsconf
Hope this helps.
Hi, thank you for your answer. Could you be more specific about which setting might be causing the issue? The only one I can think of is followTail = 1. This issue doesn't happen consistently, only in some cases when the logs are being rolled over and compressed.
What are the forwarder's inputs.conf settings?
[monitor:///export/logs/app/]
followTail = 1
ignoreOlderThan = 1d
whitelist=^/export/logs/app/app.*$