Getting Data In

Why is Splunk Truncating Multi-line Events?

sjnorman
Explorer

I'm indexing some Java application log files that use the log4j framework to output log messages. The log files are intermixed with CXF logging interceptor statements that log inbound/outbound SOAP messages that have the following format:

2014-07-16 10:25:13,812 INFO  WebContainer : 16 - Inbound Message
---------------------------- 
ID: 15231
Response-Code: 200
Encoding: UTF-8
Content-Type: text/xml;charset=UTF-8
Headers: {Content-Length=[5612], content-type=[text/xml;charset=UTF-8], Date=[Wed, 16 Jul 2014 15:25:13 GMT], Server=[Jetty(7.1.6.v20100715)]}
Payload: <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Header></soap:Header><soap:Body><MyXmlMessage></MyXmlMessage></soap:Body></soap:Envelope>
----------------------------

I'd like to log these statements as single, multi-line events, but Splunk seems to be randomly truncating the events after the following line: "Content-Type: text/xml;charset=UTF-8"

i.e. some events include the full context (including the payload), whereas others only include up to the content-type.

Here's what my props.conf looks like:

[default]
CHARSET = UTF-8
LINE_BREAKER_LOOKBEHIND = 100
TRUNCATE = 0
DATETIME_CONFIG = /etc/datetime.xml
ANNOTATE_PUNCT = True
HEADER_MODE =
MAX_DAYS_HENCE=2
MAX_DAYS_AGO=2000
MAX_DIFF_SECS_AGO=3600
MAX_DIFF_SECS_HENCE=604800
MAX_TIMESTAMP_LOOKAHEAD = 128
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = 
BREAK_ONLY_BEFORE_DATE = True
MAX_EVENTS = 20000 
MUST_BREAK_AFTER = 
MUST_NOT_BREAK_AFTER = 
MUST_NOT_BREAK_BEFORE = 


[log4j]
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 30
#BREAK_ONLY_BEFORE = \d\d?:\d\d:\d\d
BREAK_ONLY_BEFORE=^\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}
NO_BINARY_CHECK = true
pulldown_type = true 
maxDist = 75

Can anyone explain why Splunk would be truncating the events prematurely?
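For reference, the BREAK_ONLY_BEFORE pattern from the [log4j] stanza can be sanity-checked outside Splunk. A minimal Python sketch (pattern copied from the props.conf above, sample lines from the CXF message) shows that only the log4j header line should start a new event:

```python
import re

# BREAK_ONLY_BEFORE pattern from the [log4j] stanza
break_before = re.compile(r"^\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}")

lines = [
    "2014-07-16 10:25:13,812 INFO  WebContainer : 16 - Inbound Message",
    "Content-Type: text/xml;charset=UTF-8",
    "Headers: {Content-Length=[5612], Date=[Wed, 16 Jul 2014 15:25:13 GMT]}",
]

# Only the first line matches, so only it should begin a new event
matches = [bool(break_before.match(line)) for line in lines]
print(matches)  # [True, False, False]
```

So the pattern itself is not the problem; the breaking behaviour must come from where and how the config is applied.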

0 Karma
1 Solution

yannK
Splunk Employee
Splunk Employee

The props seem correct, especially the BREAK_ONLY_BEFORE.

  • Try adding BREAK_ONLY_BEFORE_DATE = false.
  • Make sure that props.conf is deployed on the indexers and heavy forwarders (if any), because those are the instances that parse the events.
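Putting the suggestion together, the [log4j] stanza on the indexer might look like this (a sketch keeping the asker's existing values; BREAK_ONLY_BEFORE_DATE = false is the only addition):

```
[log4j]
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 30
BREAK_ONLY_BEFORE = ^\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}
BREAK_ONLY_BEFORE_DATE = false
NO_BINARY_CHECK = true
```

Setting BREAK_ONLY_BEFORE_DATE = false tells Splunk to break only on the explicit regex rather than on any line that happens to start with a date.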

sjnorman
Explorer

Here is my props.conf entry for log4j:

[log4j]
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
BREAK_ONLY_BEFORE = ^\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}
NO_BINARY_CHECK = 1
pulldown_type = true
maxDist = 75
0 Karma

sjnorman
Explorer

FYI, I've checked the log files manually and there are no special characters that would be tripping up Splunk -- all lines end with a line feed character.

0 Karma

chanfoli
Builder

I think the timestamp in the payload line in combination with some of your other options is tripping it up.

I made a small sample file and got proper breaking with something as simple as this for the sourcetype:

# chanfoli's settings
MAX_TIMESTAMP_LOOKAHEAD=25
NO_BINARY_CHECK=1
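chanfoli's point about MAX_TIMESTAMP_LOOKAHEAD can be illustrated outside Splunk: with the lookahead capped at 25 characters, the embedded date inside the Headers line sits beyond the window Splunk inspects, while the real log4j timestamp sits at the start of the line. A rough sketch (not Splunk's actual timestamp parser):

```python
import re

# Rough stand-in for Splunk's timestamp search, limited to
# the first MAX_TIMESTAMP_LOOKAHEAD characters of each line
LOOKAHEAD = 25
date_pattern = re.compile(r"\d{1,2} \w{3} \d{4}|\d{4}-\d{2}-\d{2}")

event_line = "2014-07-16 10:25:13,812 INFO  WebContainer : 16 - Inbound Message"
headers_line = "Headers: {Content-Length=[5612], Date=[Wed, 16 Jul 2014 15:25:13 GMT]}"

def finds_timestamp(line: str) -> bool:
    # Only the first LOOKAHEAD characters are searched
    return bool(date_pattern.search(line[:LOOKAHEAD]))

print(finds_timestamp(event_line))    # True: real timestamp at column 0
print(finds_timestamp(headers_line))  # False: embedded date is past char 25
```

With a larger lookahead, the Date header inside the message could be mistaken for an event timestamp and trigger a spurious break.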

sjnorman
Explorer

I applied the changes and still suffer from the same problem: somewhere between 25% and 50% of the events for the CXF log statements are being cut off after the "Content-Type: text/xml;charset=UTF-8" line. I really don't know what's tripping it up there.

0 Karma

sjnorman
Explorer

Doh! I was applying the configuration to the forwarders. I applied the update to the indexer and it seems to be working now, thanks!

0 Karma
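To confirm which instance actually has a stanza in effect, Splunk's btool can dump the merged configuration on the indexer (standard Splunk CLI; the path assumes a default install):

```
$SPLUNK_HOME/bin/splunk btool props list log4j --debug
```

The --debug flag shows which file each setting comes from, which makes configuration deployed to the wrong instance easy to spot.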

yannK
Splunk Employee
Splunk Employee

The parsing is not happening at the universal/lightweight forwarder level, so it should not make a difference.

0 Karma

sjnorman
Explorer

Thanks for the suggestion, but it didn't seem to have any effect -- the behaviour is still the same.

FYI, yes I've made the changes to props.conf on my universal forwarders and re-started them afterwards.

0 Karma