Dashboards & Visualizations

log4j with occasional XML content/timestamp confusion


Hi All,

I have a log file which is coming out of a JavaEE (JBoss) application via log4j. Occasionally ASCII serialized XML objects are dropped in the stream for display. Some of these objects have additional timestamp fields in them. Here's a snip of one of these XML objects... I've prepended line numbers which aren't in the source file. (Hmmm... can't seem to convince markdown to ignore the XML tags and give me unformatted source. Switched to adding a pastebin link for unmunged source)

link to log source

  1. 2012-10-05 09:25:42,953 INFO au.com.necdl.pexa.broker.messaging.MessageRouterService Generated message [<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  2. <SOAP-ENV:Header>
  3. <MessageHeader/>
  4. </SOAP-ENV:Header>
  5. <SOAP-ENV:Body>
  6. <RegistryInformationSupplyResponse>
  7. <MessageHeader>
  8. <MessageCategory>Acknowledgement</MessageCategory&
  9. <MessageService>Registry Information Supply</MessageService&
  10. <MessageId>MessageId</MessageId&
  11. <TransmissionTimestamp>2001-12-31T12:00:00</TransmissionTimestamp&
  12. <MessageOrigin>MessageOrigin</MessageOrigin&
  13. <MessageDestination>MessageDestination</MessageDestination&
  14. <Environment>Development</Environment&
  15. <OriginalMessageId>OriginalMessageId</OriginalMessageId&
  16. <OriginalMessageStatus>
  17. <MessageStatusCode>0 OK</MessageStatusCode&
  18. </OriginalMessageStatus>
  19. </MessageHeader>
  20. <Jurisdiction>ACT</Jurisdiction&
  21. <RegistryInformation>
  22. <IssueTimestamp>2001-12-31T12:00:00</IssueTimestamp&
  23. <LandTitleReference>
  24. <Component>

So, when we display the logged events we were having lots of trouble with event boundaries. Events truncated on the each of the XML internal timestamps, e.g. the TransmissionTimestamp of line 11 above. Using show source in the UI showed the rest of the XML object was completely missing.

I've changed props.conf for the log4j sourcetype to specifically change the LINE_BREAKER as follows:

  LINE_BREAKER = ([\r\n]+)\d\d\d\d-\d+-\d+ \d+:\d+:\d+,\d+

which now has all the XML lines in the 'Show Source' display but still seems to truncate the event after only 10 lines.

So now I'm not clear if this is a 10 line limit of some sort or if the timestamp in the XML is somehow still being picked up as an event boundary. Another clue may be that in spite of all the lines being present in 'show source' there's no option in the UI to see supressed lines or indeed subsequent events with a chunk more of the XML.

Any ideas guys?

0 Karma

Re: log4j with occasional XML content/timestamp confusion


Hi Enno, sounds like you want to create one splunk event corresponding to one log4j print statement. The following works especially well for me:


While I generally add an extra timestamp prefix [ to help Splunk distinguish, the above should work well enough. The trick is to limit where the timestamp can be found. This props configuration will basically indicate, always merge, until you find \d{4}, max number of events is 50,000, and the timestamp is located within 23 characters.

View solution in original post