Getting Data In

How do I set up Splunk to index log4j logs written with org.apache.log4j.RollingFileAppender?

Path Finder

We plan to use Splunk to keep logs for several Java applications, including web servers like Tomcat. These applications use log4j with org.apache.log4j.RollingFileAppender. The partial config will be like below:
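A typical RollingFileAppender setup along these lines might look like the following sketch (illustrative only; the appender name, file path, and layout pattern here are assumptions, not the poster's actual config):

```properties
log4j.rootLogger=INFO, FILE

# Hypothetical appender name and path; adjust to your application
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=/opt/myapp/logs/server.log
log4j.appender.FILE.MaxFileSize=10MB
log4j.appender.FILE.MaxBackupIndex=20
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c - %m%n
```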


That is, when server.log reaches 10MB it is renamed (rolled over) to server.log.1, the previous server.log.1 becomes server.log.2, and so forth.

My questions:

  1. Is it correct to set up Splunk to monitor (set as an input) server.log only, with no need to monitor the server.log.n files?
  2. How quickly can Splunk react to a change in server.log? For example, if my app server writes logs very fast (e.g., 10MB written in under 2 seconds), can Splunk still read and capture everything before server.log rolls over to server.log.1? If not, how should I set up Splunk?

Splunk Employee

You should monitor the whole set of server.log* files. There are circumstances where you will lose events if you monitor only server.log. First, Splunk may happen to be down while a file rolls. There is also some possibility of losing a few events from the end of the file, depending on exactly how the roll is done (a rename followed by creation of a new file, a copy-and-truncate, etc.), particularly if the file is not updated often enough for Splunk to keep an open file descriptor on it at the time it rolls, e.g., if it is only updated every 60 seconds.

The cost is that the monitoring processor has to track more files, but this is not a big problem in version 4.1+, and with only 20 files it is fine in older versions too (though you might want to raise max_fd in those versions).

Super Champion

Let me throw in one more scenario: say the server.log files rotate (each server.log.n becoming server.log.n+1) at the same time that splunkd is down for a configuration change.

In general, Splunk does a really good job of keeping up with log files, but there are a few scenarios that just aren't covered. And since Splunk automatically recognizes rotated log files (and therefore only indexes the previously unread portion of each file), to me it can make sense to monitor all of the log files at once, especially in scenarios where you can't afford to drop any events.

Also, I don't like seeing multiple sources in this kind of scenario (e.g. server.log.1, server.log.2, ... server.log.N) when they all originate from a single log file, so I use a transform to rename them back to the original server.log source name.

Here is an example set of configs:

Example inputs.conf:

# (the monitor path below is a placeholder; point it at your actual log directory)
[monitor:///path/to/logs/server.log*]
sourcetype = java_app_server

Example props.conf:

[java_app_server]
TRANSFORMS-rename_source = drop_trailing_digit

Example transforms.conf:

[drop_trailing_digit]
SOURCE_KEY = MetaData:Source
DEST_KEY   = MetaData:Source
REGEX      = source::(.*)[-._]\d+$
FORMAT     = source::$1
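As a quick sanity check of that REGEX/FORMAT pair, here is a small Python sketch of the same substitution (illustrative only; Splunk applies the transform internally, this just exercises the pattern):

```python
import re

# The same pattern as in transforms.conf: strip a trailing rotation
# suffix (".1", "-2", "_3", ...) from the source metadata value.
PATTERN = re.compile(r"^source::(.*)[-._]\d+$")

def rename_source(source_key: str) -> str:
    """Rewrite 'source::/path/server.log.2' to 'source::/path/server.log'."""
    return PATTERN.sub(r"source::\1", source_key)

print(rename_source("source::/var/log/app/server.log.2"))  # suffix stripped
print(rename_source("source::/var/log/app/server.log"))    # no suffix: unchanged
```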

I've found this kind of setup to work well. Does anyone know of a reason not to monitor all the rotated log files too? I know there are some advantages to monitoring a single file rather than a pattern like this, but I've never observed any performance impact from it.

For anyone interested, there are some additional source renaming examples here:

Splunk Employee

Hi Alan,

I may be able to answer question 1 but question 2 is definitely worth investigating.

IMHO, you only need to monitor server.log.

This is because when log4j detects that server.log has reached 10MB, it rolls the file over: its contents are moved to server.log.1 (each existing server.log.n becoming server.log.n+1), and server.log is cleared so the new output stream can be appended to it.
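For intuition, the rename-based roll described above is roughly equivalent to this shell sketch (an illustrative approximation run in a scratch directory, not log4j's actual implementation):

```shell
# Set up a scratch directory with a "current" log and one backup.
cd "$(mktemp -d)"
echo "older events" > server.log.1
echo "current events" > server.log

# At roll time: shift backups up by one, rename the live file,
# then start a fresh, empty server.log for new output.
mv server.log.1 server.log.2
mv server.log server.log.1
: > server.log

ls server.log*
```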


Splunk Employee

You should monitor all the logs, server.log*. There are only 20 files, so it's not much more work. Please see my answer for why.
