Log file with differing message formats

mikelanghorst
Motivator

I've come across an unusual log file from EMC's Data Protection application that writes two very different message formats into a single file. Example:

2012-03-08 12:06:30,643 INFO Webapp Launcher [Init] Connection to controller at fdpap01.oa.domain.com:3916
2012-03-08 12:06:30,643 INFO Webapp Launcher [Init] Connection to reporter at fdpap01.oa.domain.com:4002
INFO 2560.2564 20120308:123239 service - ServerCtrlHandler(): Service stop signalled - exiting
INFO 2676.2696 20120308:123532 webapp - daemonMain(): Setting memory limit '-Xmx128m'
INFO 2676.2696 20120308:123535 webapp - daemonMain(): DPA Webapp
INFO 2676.2696 20120308:123535 webapp - daemonMain(): (c) 1994-2009 EMC Corporation. All rights reserved.
INFO 2676.2696 20120308:123535 webapp - daemonMain(): Version: 5.0.1 build 4792 on windows
INFO 2676.2696 20120308:123535 webapp - daemonMain(): Logging at level Info
2012-03-08 12:36:01,967 INFO Webapp Launcher [Init] Connection to controller at fdpap01.oa.domain.com:3916
2012-03-08 12:36:01,967 INFO Webapp Launcher [Init] Connection to reporter at fdpap01.oa.domain.com:4002
INFO 2676.2680 20120308:133056 service - ServerCtrlHandler(): Service stop signalled - exiting
INFO 3912.3884 20120308:133135 webapp - daemonMain(): Setting memory limit '-Xmx128m'
INFO 3912.3884 20120308:133135 webapp - daemonMain(): DPA Webapp
INFO 3912.3884 20120308:133135 webapp - daemonMain(): (c) 1994-2009 EMC Corporation. All rights reserved.
INFO 3912.3884 20120308:133135 webapp - daemonMain(): Version: 5.0.1 build 4792 on windows
INFO 3912.3884 20120308:133135 webapp - daemonMain(): Logging at level Info
2012-03-08 13:31:38,752 INFO Webapp Launcher [Init] Connection to controller at fdpap01.oa.domain.com:3916
2012-03-08 13:31:38,752 INFO Webapp Launcher [Init] Connection to reporter at fdpap01.oa.domain.com:4002

Whenever I've had to help Splunk with line breaking and date extraction, the file has used one consistent format throughout, so I just specified a source or sourcetype and the specifics to break on. I'm unsure how to handle date extraction for this one. For the lines that start with the severity, the third column is the timestamp, and each of those lines should be a separate event. Currently, with the default settings, Splunk is merging them.
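To make the two layouts concrete, here is a minimal Python sketch (an illustration, not Splunk configuration) that recognizes each format by its leading pattern and parses its timestamp. The regexes and the helper name `parse_timestamp` are hypothetical and assume only the two layouts shown in the sample above.

```python
import re
from datetime import datetime

# Hypothetical patterns for the two layouts seen in the sample log.
# Format A: "2012-03-08 12:06:30,643 INFO ..." (timestamp first)
FORMAT_A = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")
# Format B: "INFO 2676.2696 20120308:123532 ..." (severity, pid.tid, then timestamp)
FORMAT_B = re.compile(r"^\w+ \d+\.\d+ (\d{8}:\d{6}) ")


def parse_timestamp(line):
    """Return a datetime for either layout, or None if neither matches."""
    m = FORMAT_A.match(line)
    if m:
        # %f accepts the 3-digit millisecond field and scales it to microseconds
        return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
    m = FORMAT_B.match(line)
    if m:
        return datetime.strptime(m.group(1), "%Y%m%d:%H%M%S")
    return None


if __name__ == "__main__":
    samples = [
        "2012-03-08 12:06:30,643 INFO Webapp Launcher [Init] Connection to controller at fdpap01.oa.domain.com:3916",
        "INFO 2676.2696 20120308:123532 webapp - daemonMain(): Setting memory limit '-Xmx128m'",
    ]
    for s in samples:
        print(parse_timestamp(s))
```

Each layout needs its own pattern and its own format string, which is exactly why a single timestamp setting struggles with this file.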

Ideas?

hexx
Splunk Employee

If you can be sure that this data source will always have exactly one event per line, the simplest way to fix the line breaking is to set:

SHOULD_LINEMERGE = false
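The effect of this setting can be modeled roughly as below. This Python sketch is a simplification, not Splunk's implementation: it assumes line merging is on by default (SHOULD_LINEMERGE = true) and that a new event starts only on a line where a timestamp is recognized, as with the default BREAK_ONLY_BEFORE_DATE = true. The `recognizes_timestamp` predicate is hypothetical and, for illustration only, treats just the "2012-03-08 12:06:30,643" layout as a date, which is one plausible way the severity-first lines could end up glued to the previous event.

```python
import re

# Hypothetical stand-in for timestamp recognition: only the
# "2012-03-08 12:06:30,643" layout is treated as a date here.
ISO_TS = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def recognizes_timestamp(line):
    return ISO_TS.match(line) is not None


def break_events(lines, should_linemerge):
    """Group raw lines into events under a simplified line-merging model."""
    if not should_linemerge:
        # SHOULD_LINEMERGE = false: one line = one event.
        return [[line] for line in lines]
    events = []
    for line in lines:
        # Start a new event only when a timestamp is recognized
        # (or when there is no current event yet); otherwise merge.
        if not events or recognizes_timestamp(line):
            events.append([line])
        else:
            events[-1].append(line)
    return events


if __name__ == "__main__":
    lines = [
        "2012-03-08 12:06:30,643 INFO Webapp Launcher [Init] Connection to reporter at fdpap01.oa.domain.com:4002",
        "INFO 2560.2564 20120308:123239 service - ServerCtrlHandler(): Service stop signalled - exiting",
        "INFO 2676.2696 20120308:123532 webapp - daemonMain(): Setting memory limit '-Xmx128m'",
    ]
    print(len(break_events(lines, should_linemerge=True)))   # 1 merged event
    print(len(break_events(lines, should_linemerge=False)))  # 3 events
```

Turning merging off sidesteps the question of which lines Splunk recognizes as starting an event, which is why the one-line-per-event guarantee matters.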

The two timestamp formats may cause a separate problem, as Splunk's timestamp extraction heuristics do not handle mixed formats within one source well.

Still, it is worth checking how timestamp extraction behaves once you've fixed the line breaking. At a minimum, you may still want to add:

MAX_TIMESTAMP_LOOKAHEAD = 37

...to confine timestamp extraction to the first 37 characters of each event, which is as much scoping as we can currently do.
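One way to sanity-check the value 37 is to measure where the timestamp ends in each sample layout. This Python sketch uses hypothetical regexes for the two formats and reports the character offset at which the timestamp ends; both should fall inside the lookahead window. It assumes the severity and pid.tid fields stay roughly as wide as in the sample, so wider fields would eat into the margin.

```python
import re

MAX_TIMESTAMP_LOOKAHEAD = 37  # value suggested above

# Hypothetical patterns for the two layouts; group 1 is the timestamp.
PATTERNS = [
    re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"),
    re.compile(r"^\w+ \d+\.\d+ (\d{8}:\d{6})"),
]


def timestamp_end(line):
    """Return the character offset where the timestamp ends, or None."""
    for pattern in PATTERNS:
        m = pattern.match(line)
        if m:
            return m.end(1)
    return None


if __name__ == "__main__":
    samples = [
        "2012-03-08 12:06:30,643 INFO Webapp Launcher [Init] Connection to controller at fdpap01.oa.domain.com:3916",
        "INFO 2676.2696 20120308:123532 webapp - daemonMain(): Setting memory limit '-Xmx128m'",
    ]
    for s in samples:
        end = timestamp_end(s)
        # Prints "23 True" and then "30 True" for the two sample lines.
        print(end, end <= MAX_TIMESTAMP_LOOKAHEAD)
```

The severity-first layout ends at offset 30, leaving a few characters of slack for a longer severity keyword or process/thread ID.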
