Splunk Search

Trouble reading log lines with large JSON or multiline Java exceptions from slf4j

Explorer

My question is similar to others around extracting new fields, but the answers I've tried to date haven't worked.

When I click on Extract New Fields, the Select Sample Event screen ends up selecting roughly 20 actual log lines and reads them as a single sample event instead of around 20 separate events (one per log line).

Included in the log message at the end of the log line are sometimes very large JSON strings or typical multi-line Java exceptions.

The log pattern is as follows:

time:stamp LOGTYPE  [java-thread-id-1234][JavaClass:LineNumber] Log message goes here. Usually is a short message. Sometimes includes *very* large single-line JSON strings. Sometimes includes a multi-line Java exception.

A practical example would be as follows:

08:33:09,372 INFO  [http-bio-8080-exec-4687][ServicesController:125] JSON returned={"succeeded":true,"data":{"example1":[],"example2":"","example3":null},"message":""}

Or:

09:47:13,215 INFO  [http-bio-8080-exec-4678][ServicesController:125] Example log message goes here.

When I set up the forwarder, the Source Type was set to Automatic, not log4j. We're using slf4j for our logger. Does Splunk understand slf4j? I'm assuming it does, but if it doesn't, do I need to find an app that will add support for slf4j?

Bottom line, is it possible to extract these fields including the large JSON strings and multi-line Java exceptions?

1 Solution

SplunkTrust

Try props.conf settings something like this:

[your_sourcetype]
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_FORMAT = %H:%M:%S,%3N
TRUNCATE = set this large enough to fit your biggest events in characters plus safety margin
MAX_EVENTS = set this large enough to fit your biggest events in lines plus safety margin
EXTRACT-slf4j = (?s)^\S++\s++(?<log_level>[A-Z]++)[^\[]*+\[\s*+(?<thread_id>[^\]\s]++)\s*+\]\[\s*+(?<java_class>[^:]++):(?<line_number>\d++)\s*+\]\s*+(?<message>.*+)
EXTRACT-json_message = (?s)JSON\s*+returned=(?<json_message>.*+)
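
As an illustration of the two placeholder settings (the numbers are hypothetical, size them to your own data): if your biggest events run to around 200,000 characters and 500 lines, the stanza might use

TRUNCATE = 250000
MAX_EVENTS = 600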

This should get your timestamping and event breaking in order, as well as the basic field extractions. The JSON part is a bit trickier; I don't think Splunk handles partial-JSON events well with INDEXED_EXTRACTIONS = json. If that's the case, you can always run base search | spath input=json_message at search time.
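
If you want to sanity-check the extraction logic outside Splunk, here is a rough Python port of the two EXTRACT regexes run against the sample event above. This is my own sketch, not part of the Splunk config: the possessive quantifiers (++, *+) are replaced with plain greedy ones because Python's stdlib re module doesn't support them, which makes no difference for these patterns.

```python
import json
import re

# Approximate Python port of the props.conf EXTRACT-slf4j regex.
# re.DOTALL lets <message> span multi-line Java stack traces,
# mirroring the (?s) flag in the original pattern.
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<log_level>[A-Z]+)\s+"
    r"\[(?P<thread_id>[^\]\s]+)\]\[(?P<java_class>[^:]+):(?P<line_number>\d+)\]"
    r"\s*(?P<message>.*)",
    re.DOTALL,
)

# Port of EXTRACT-json_message: grab everything after "JSON returned=".
JSON_RE = re.compile(r"JSON\s*returned=(?P<json_message>.*)", re.DOTALL)

sample = (
    '08:33:09,372 INFO  [http-bio-8080-exec-4687][ServicesController:125] '
    'JSON returned={"succeeded":true,"data":{"example1":[],"example2":"",'
    '"example3":null},"message":""}'
)

fields = LINE_RE.match(sample).groupdict()
payload = JSON_RE.search(fields["message"])
parsed = json.loads(payload.group("json_message")) if payload else None
```

This extracts log_level, thread_id, java_class, line_number, and message, then parses the JSON payload, roughly what spath input=json_message would do at search time.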



Explorer

I have created the new source type in my splunk/etc/system/local/props.conf file, applied it to my data input and restarted the Splunk service. Do I need to do anything to the already indexed data so it uses the new source type?


Explorer

After clearing out my affected indexes and re-importing my log data, I was able to cleanly extract all my target fields. Thanks!