Getting Data In

Help parsing and breaking up multi-line event in props.conf

a212830
Champion

Hi,

I'm having problems parsing the following lines, hoping someone can help me.

Here's my props:

ANNOTATE_PUNCT = false
KV_MODE = auto
LINE_BREAKER = ([\r\n]+)WARN|INFO|ERROR|DEBUG\d{4}-\d{2}-\d{2}
MAX_TIMESTAMP_LOOKAHEAD = 50
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
TIME_PREFIX = ^WARN| |^INFO |^ERROR 
TRUNCATE = 999999

Sample:

WARN | 2014-06-19 20:37:30,275 | localhost-startStop-1 | TypeConverterDelegate.java | 263 | PropertyEditor [com.sun.beans.editors.EnumEditor] found through deprecated global PropertyEditorManager fallback - consider using a more isolated form of registration, e.g. on the BeanWrapper/BeanFactory!
 WARN | 2014-06-19 20:37:30,285 | localhost-startStop-1 | TypeConverterDelegate.java | 263 | PropertyEditor [com.sun.beans.editors.EnumEditor] found through deprecated global PropertyEditorManager fallback - consider using a more isolated form of registration, e.g. on the BeanWrapper/BeanFactory!
 WARN | 2014-06-19 20:37:30,293 | localhost-startStop-1 | TypeConverterDelegate.java | 263 | PropertyEditor [com.sun.beans.editors.EnumEditor] found through deprecated global PropertyEditorManager fallback - consider using a more isolated form of registration, e.g. on the BeanWrapper/BeanFactory!
 WARN | 2014-06-19 20:37:30,300 | localhost-startStop-1 | TypeConverterDelegate.java | 263 | PropertyEditor [com.sun.beans.editors.EnumEditor] found through deprecated global PropertyEditorManager fallback - consider using a more isolated form of registration, e.g. on the BeanWrapper/BeanFactory!
Tags (2)
1 Solution

martin_mueller
SplunkTrust
SplunkTrust

Well, if you're getting one big event then my note number five is your biggest issue. Your LINE_BREAKER = ([\r\n]+)WARN|INFO|ERROR|DEBUG\d{4}-\d{2}-\d{2} doesn't match the sample data. Try this expression based on note number two instead if you don't want to rely on "break on timestamp":

LINE_BREAKER = ([\r\n]+)\s*[A-Z]+\s*\|\s*\d{4}-\d{2}-\d{2}

That allows for any number of spaces before and after the log level, any capital-letter log level, as well as the pipe symbol and spaces before the date.

View solution in original post

martin_mueller
SplunkTrust
SplunkTrust

Well, if you're getting one big event then my note number five is your biggest issue. Your LINE_BREAKER = ([\r\n]+)WARN|INFO|ERROR|DEBUG\d{4}-\d{2}-\d{2} doesn't match the sample data. Try this expression based on note number two instead if you don't want to rely on "break on timestamp":

LINE_BREAKER = ([\r\n]+)\s*[A-Z]+\s*\|\s*\d{4}-\d{2}-\d{2}

That allows for any number of spaces before and after the log level, any capital-letter log level, as well as the pipe symbol and spaces before the date.

a212830
Champion

Worked. Thanks!

0 Karma

a212830
Champion

The problem is the line-breaking - sorry, forgot to include that. I'm getting one big event.

0 Karma

a212830
Champion

Thanks for the response. Yes, some (not all) of the lines appear to have a space at the beginning. There are multi-lines in the file, I can include them if needed. They aren't huge - 2 or 3 lines. As for using the default settings, we have a policy where that's not allowed. Prof services doesn't recommend it, and if something changes in the log, we can't really go back and tell what the settings were beforehand. My understanding is that it also puts extra burden on the indexer.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Fifth, your LINE_BREAKER regex doesn't allow for any spaces or pipes between the loglevel and the date. There should be parentheses around the loglevel list as well because the pipe symbol (OR) has less strong binding than character concatenation (AND).

martin_mueller
SplunkTrust
SplunkTrust

How are your problems manifesting themselves?

Some stuff I noticed:
First, it appears some of your logs have a space in front of the loglevel. Is that actually the case or just a copy&paste error?
Second, the TIME_PREFIX regex is "starts with warn", "space", "starts with info", or "starts with error" - is that intentional? I'd go with ^\s*[A-Z]+\s*\|\s* instead to be robust against small differences.
Third, your sample data doesn't appear to have any multi-line events but you mentioned those in the title?
Fourth, throwing your sample data into Splunk with all default settings looks okay.

Get Updates on the Splunk Community!

Infographic provides the TL;DR for the 2024 Splunk Career Impact Report

We’ve been buzzing with excitement about the recent validation of Splunk Education! The 2024 Splunk Career ...

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...