Splunk Search

Trouble reading log lines with large JSON or multiline Java exceptions from slf4j

mriley_cpmi
Explorer

My question is similar to others around extracting new fields, but the answers I've tried to date haven't worked.

When I click on Extract New Fields, the Select Sample Event screen ends up selecting somewhere around 20 actual log lines and reads them as a single sample event instead of around 20 separate events (one per log line).

Included in the log message at the end of the log line are sometimes very large JSON strings or typical multi-line Java exceptions.

The log pattern is as follows:

time:stamp LOGTYPE  [java-thread-id-1234][JavaClass:LineNumber] Log message goes here. Usually is a short message. Sometimes includes *very* large single-line JSON strings. Sometimes includes a multi-line Java exception.

A practical example would be as follows:

08:33:09,372 INFO  [http-bio-8080-exec-4687][ServicesController:125] JSON returned={"succeeded":true,"data":{"example1":[],"example2":"","example3":null},"message":""}

Or:

09:47:13,215 INFO  [http-bio-8080-exec-4678][ServicesController:125] Example log message goes here.

When I set up the forwarder, the Source Type was set to Automatic, not log4j. We're using slf4j for our logger. Does Splunk understand slf4j? I'm assuming it does, but if it doesn't, do I need to find an app that will add support for slf4j?

Bottom line, is it possible to extract these fields including the large JSON strings and multi-line Java exceptions?

1 Solution

martin_mueller
SplunkTrust

Try props.conf settings something like this:

[your_sourcetype]
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_FORMAT = %H:%M:%S,%3N
TRUNCATE = set this large enough to fit your biggest events in characters plus safety margin
MAX_EVENTS = set this large enough to fit your biggest events in lines plus safety margin
EXTRACT-slf4j = (?s)^\S++\s++(?<log_level>[A-Z]++)[^\[]*+\[\s*+(?<thread_id>[^\]\s]++)\s*+\]\[\s*+(?<java_class>[^:]++):(?<line_number>\d++)\s*+\]\s*+(?<message>.*+)
EXTRACT-json_message = (?s)JSON\s*+returned=(?<json_message>.*+)

This should get your timestamping and event breaking in order, as well as basic field extractions. The JSON part is a bit trickier; I don't think Splunk likes partial-JSON events for INDEXED_EXTRACTIONS = json. If that's true, you can always do base search | spath input=json_message
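To sanity-check the extraction regexes above outside of Splunk, here is a small Python sketch that runs the same patterns against the sample event from the question. The possessive quantifiers (`++`, `*+`) in the props.conf version are a PCRE backtracking optimization; the plain greedy forms used below match the same strings. The `extract_fields` helper and the `json` field name are illustrative, not Splunk API.

```python
import json
import re

# Greedy-quantifier equivalents of the EXTRACT-slf4j and
# EXTRACT-json_message regexes from the props.conf above.
EVENT_RE = re.compile(
    r"(?s)^\S+\s+(?P<log_level>[A-Z]+)[^\[]*\[\s*(?P<thread_id>[^\]\s]+)\s*\]"
    r"\[\s*(?P<java_class>[^:]+):(?P<line_number>\d+)\s*\]\s*(?P<message>.*)"
)
JSON_RE = re.compile(r"(?s)JSON\s*returned=(?P<json_message>.*)")

def extract_fields(event: str) -> dict:
    """Mimic the search-time field extraction for one event."""
    m = EVENT_RE.match(event)
    if not m:
        return {}
    fields = m.groupdict()
    j = JSON_RE.search(fields["message"])
    if j:
        # spath-style: parse the trailing JSON payload into a dict
        fields["json"] = json.loads(j.group("json_message"))
    return fields

sample = ('08:33:09,372 INFO  [http-bio-8080-exec-4687]'
          '[ServicesController:125] JSON returned='
          '{"succeeded":true,"data":{"example1":[],"example2":"",'
          '"example3":null},"message":""}')
fields = extract_fields(sample)
print(fields["log_level"], fields["thread_id"], fields["java_class"],
      fields["line_number"], fields["json"]["succeeded"])
# prints: INFO http-bio-8080-exec-4687 ServicesController 125 True
```

Because `(?s)` lets `.` cross newlines, the `message` group also swallows a multi-line Java stack trace in one piece, which is what makes the same regex work for both sample events.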



mriley_cpmi
Explorer

I have created the new source type in my splunk/etc/system/local/props.conf file, applied it to my data input and restarted the Splunk service. Do I need to do anything to the already indexed data so it uses the new source type?


mriley_cpmi
Explorer

After clearing out my affected indexes and re-importing my log data, I was able to cleanly extract all my target fields. Thanks!
