Need to customize log4j sourcetype

Adam
Explorer

The logs I'm trying to index are in a log4j style, and entries such as

2010-06-15 09:04:08,204 [[ACTIVE] ExecuteThread: '9' for queue: 'weblogic.kernel.Default (self-tuning)'][intOrdId=17746,intVrsn=18846] DEBUG com.att.canopi.idis.ordermanagement.sms.cramer.adapter.util.ReflectionsTools.ipagi1a1.snt.bst.bls.com - Set value1:Customer100 field name1:customerName

are properly split into unique entries. However, some entries are multi-line and contain embedded XML, and these are split wherever a date is found, even though that date has a different format than the one at the start of an entry. So the log entry

2010-06-15 09:04:07,686 [[ACTIVE] ExecuteThread: '9' for queue: 'weblogic.kernel.Default (self-tuning)'][intOrdId=17746,intVrsn=18846] INFO  fooHandler - <env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Header/>
  <env:Body>
    <v1:getFooDetailRequest xmlns:v1="ns1" correlationId="coid" systemId="mybox" clientId="cid" mock="false" requestTime="2010-06-15T09:04:07.643-04:00">
       <v1:fooId>foo<v1:fooId>
    </v1:getFooDetailRequest>
  </env:Body>
</env:Envelope>

gets split on the "getFooDetailRequest" line into two entries.

I've tried writing my own sourcetype in a props.conf with

MORE_THAN_100 = ^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[.*?

and setting the files to that sourcetype manually, but I get the same result.

Does anyone know how to modify the built-in log4j sourcetype (since it is so close to being perfect), or have any other suggestions?

1 Solution

gkanapathy
Splunk Employee

You can override any Splunk default configuration by setting the same setting name, under the same stanza header (in this case [log4j]), in the "local" directory. A setting in the "local" version of a file overrides the corresponding setting in the same stanza of the "default" version. That said, it's probably better to just redefine the relevant settings yourself:

[log4j]
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = true
SHOULD_LINEMERGE = true
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
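
As a rough illustration of what this configuration asks for, here is a minimal Python sketch that merges lines so a new event starts only at a line beginning with a timestamp in the TIME_FORMAT above. It is not Splunk's implementation. The helper names, the abbreviated sample lines, and the use of Python's %f as a stand-in for %3N are all assumptions made for the example.

```python
from datetime import datetime

# Python stand-in for TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N (assumption: %f ~ %3N)
TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
# Length of "2010-06-15 09:04:08,204"; fits within MAX_TIMESTAMP_LOOKAHEAD = 25
STAMP_LEN = 23


def starts_with_timestamp(line: str) -> bool:
    """Return True if the line begins with a timestamp in the expected format."""
    try:
        datetime.strptime(line[:STAMP_LEN], TIME_FORMAT)
        return True
    except ValueError:
        return False


def merge_lines(lines):
    """Group raw lines into events, breaking only before a leading timestamp."""
    events = []
    for line in lines:
        if starts_with_timestamp(line) or not events:
            events.append([line])      # start a new event
        else:
            events[-1].append(line)    # continuation of the current event
    return ["\n".join(event) for event in events]


# Abbreviated, hypothetical versions of the sample lines from the question
sample = [
    "2010-06-15 09:04:07,686 [thread] INFO  fooHandler - <env:Envelope>",
    '    <v1:getFooDetailRequest requestTime="2010-06-15T09:04:07.643-04:00">',
    "</env:Envelope>",
    "2010-06-15 09:04:08,204 [thread] DEBUG Set value1:Customer100",
]
events = merge_lines(sample)
print(len(events))
```

In this sketch the requestTime date embedded in the XML does not start a new event, because only the beginning of each line is checked against the timestamp format.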

The SHOULD_LINEMERGE = true configuration above is probably the easiest to understand, though for high-volume systems (> 100 GB/day), use:

[log4j]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},\d{3})
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
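
To check the LINE_BREAKER pattern before deploying it, you can try the same regex outside Splunk. The following is a minimal Python sketch that only approximates the behavior: it splits the text wherever the pattern matches and drops the text captured by the first group. The function name and the abbreviated sample text are assumptions for illustration.

```python
import re

# Same pattern as the LINE_BREAKER setting; group 1 is the text dropped
# between events, and the lookahead requires a log4j-style timestamp.
LINE_BREAKER = re.compile(
    r"([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},\d{3})"
)


def break_events(raw: str):
    """Split raw text at each line-breaker match, discarding the captured newlines."""
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])  # whatever follows the last break
    return events


# Abbreviated, hypothetical version of the sample log from the question
raw = (
    "2010-06-15 09:04:07,686 [thread] INFO  fooHandler - <env:Envelope>\n"
    '  <v1:getFooDetailRequest requestTime="2010-06-15T09:04:07.643-04:00">\n'
    "</env:Envelope>\n"
    "2010-06-15 09:04:08,204 [thread] DEBUG Set value1:Customer100\n"
)
events = break_events(raw)
print(len(events))
```

The ISO-style requestTime value inside the XML is not preceded by a newline and uses "T" and "." rather than a space and ",", so the lookahead does not match there and the XML stays in one event.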

altinp
Explorer

Watch out, reader! In the second, very useful, snippet, the backslash has been lost and is needed in front of each 'r', 'n', and 'd':

LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},\d{3})

Adam
Explorer

That worked! I was a little concerned because I saw a few items that still weren't split properly, but after a couple of minutes all the kinks worked out and now even XML with 10 timestamps isn't split. Thanks!
