Solved: Re: Need to customize log4j sourcetype

Adam · ‎06-15-2010

The logs I'm trying to index are in a log4j style, and entries such as

2010-06-15 09:04:08,204 [[ACTIVE] ExecuteThread: '9' for queue: 'weblogic.kernel.Default (self-tuning)'][intOrdId=17746,intVrsn=18846] DEBUG com.att.canopi.idis.ordermanagement.sms.cramer.adapter.util.ReflectionsTools.ipagi1a1.snt.bst.bls.com - Set value1:Customer100 field name1:customerName

are properly split into separate entries. However, some entries are multi-line and contain embedded XML, and these are split wherever a date is found, even though that date has a different format from the one at the start of an entry. So the log entry

2010-06-15 09:04:07,686 [[ACTIVE] ExecuteThread: '9' for queue: 'weblogic.kernel.Default (self-tuning)'][intOrdId=17746,intVrsn=18846] INFO  fooHandler - <env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Header/>
  <env:Body>
    <v1:getFooDetailRequest xmlns:v1="ns1" correlationId="coid" systemId="mybox" clientId="cid" mock="false" requestTime="2010-06-15T09:04:07.643-04:00">
       <v1:fooId>foo</v1:fooId>
    </v1:getFooDetailRequest>
  </env:Body>
</env:Envelope>

gets split on the "getFooDetailRequest" line into two entries.

I've tried writing my own sourcetype in a props.conf with

MORE_THAN_100 = ^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[.*?

and setting the files to that sourcetype manually, but I get the same result.

Does anyone know how to modify the built-in log4j sourcetype (since it is so close to being perfect), or have any other suggestions?

gkanapathy · ‎06-15-2010

You can override any Splunk default configuration by setting the same setting name, under the same stanza header (in this case [log4j]), in the "local" directory. A setting in the "local" version of a file overrides the corresponding setting in the same stanza of the "default" version. That said, it's probably better to just redefine the stanza in full:

[log4j]
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = true
SHOULD_LINEMERGE = true
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
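
As a rough sanity check on the TIME_FORMAT and MAX_TIMESTAMP_LOOKAHEAD values above, here is a small Python sketch. It is not how Splunk parses timestamps internally; it only assumes that Python's `%f` can stand in for `%3N` (both read the fractional-second digits), and it uses the start of the sample entry from the question.

```python
from datetime import datetime

# Start of the sample entry from the question.
line = ("2010-06-15 09:04:08,204 [[ACTIVE] ExecuteThread: '9' for queue: "
        "'weblogic.kernel.Default (self-tuning)']")

# TIME_PREFIX = ^ and MAX_TIMESTAMP_LOOKAHEAD = 25: only the first
# 25 characters of the event are considered for the timestamp.
window = line[:25]

# The timestamp is 23 characters long:
# date (10) + space (1) + time (8) + comma (1) + milliseconds (3).
stamp = window[:23]

# Assumption: Python's %f plays the role of Splunk's %3N here.
parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
print(parsed.isoformat())  # 2010-06-15T09:04:08.204000
```

The 23-character timestamp fits inside the 25-character lookahead, so the later ISO-style date in the XML body is never a candidate for the event time.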

This version, with SHOULD_LINEMERGE = true, is probably the easiest to understand, though for high-volume systems (> 100 GB/day), use:

[log4j]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},\d{3})
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
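
To see why this LINE_BREAKER leaves the embedded XML alone, the pattern can be tried outside Splunk. The sketch below uses Python's `re` module as a stand-in for Splunk's regex engine (an assumption), a hypothetical helper named `split_events`, and an abbreviated version of the sample entries from the question. It mimics the documented behaviour that the text matched by the first capture group is dropped and the next event starts right after it.

```python
import re

# The LINE_BREAKER pattern from the stanza above.
LINE_BREAKER = re.compile(
    r"([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},\d{3})"
)

def split_events(raw):
    """Hypothetical helper: split raw text the way LINE_BREAKER is described to."""
    events, start = [], 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])  # text before the break
        start = match.end(1)                      # captured newlines are discarded
    events.append(raw[start:])
    return events

# Abbreviated sample: one multi-line XML entry followed by a one-line entry.
raw = (
    "2010-06-15 09:04:07,686 [thread] INFO  fooHandler - <env:Envelope>\n"
    '  <v1:getFooDetailRequest requestTime="2010-06-15T09:04:07.643-04:00">\n'
    "  </v1:getFooDetailRequest>\n"
    "</env:Envelope>\n"
    "2010-06-15 09:04:08,204 [thread] DEBUG ReflectionsTools - Set value1:Customer100"
)

events = split_events(raw)
print(len(events))  # 2
```

The requestTime value does not trigger a break because it is not at the start of a line and uses `T` and `.` where the pattern expects a space and a comma.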

altinp · ‎12-01-2015

Watch out, reader! In the second, very useful, snippet, the backslash has been lost and is needed in front of each 'r', 'n', and 'd':

LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},\d{3})
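
A quick way to see what the missing backslashes do to the pattern, sketched with Python's `re` module standing in for Splunk's regex engine (an assumption) and a made-up entry boundary:

```python
import re

# A newline followed by the start of a new entry (illustrative text).
boundary = "\n2010-06-15 09:04:08,204 DEBUG next entry"

intended = r"([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},\d{3})"

# What the snippet looks like once every backslash has been lost.
stripped = intended.replace("\\", "")
print(stripped)  # ([rn]+)(?=d{4}-d{2}-d{2} d{1,2}:d{2}:d{2},d{3})

# With backslashes, the pattern matches the newline before the timestamp.
print(bool(re.search(intended, boundary)))  # True

# Without them, [rn] means the letters r or n and d{4} means "dddd",
# so the entry boundary is never found.
print(bool(re.search(stripped, boundary)))  # False
```

The stripped pattern is still a valid regex, so it fails silently rather than raising an error.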

Adam · ‎06-15-2010

That worked! I was a little concerned because I saw a few items that still weren't split properly, but after a couple of minutes the kinks worked themselves out, and now even XML with 10 timestamps isn't split. Thanks!
