Hi,
I'm having problems parsing the following lines, hoping someone can help me.
Here's my props:
ANNOTATE_PUNCT = false
KV_MODE = auto
LINE_BREAKER = ([\r\n]+)WARN|INFO|ERROR|DEBUG\d{4}-\d{2}-\d{2}
MAX_TIMESTAMP_LOOKAHEAD = 50
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
TIME_PREFIX = ^WARN| |^INFO |^ERROR
TRUNCATE = 999999
Sample:
WARN | 2014-06-19 20:37:30,275 | localhost-startStop-1 | TypeConverterDelegate.java | 263 | PropertyEditor [com.sun.beans.editors.EnumEditor] found through deprecated global PropertyEditorManager fallback - consider using a more isolated form of registration, e.g. on the BeanWrapper/BeanFactory!
WARN | 2014-06-19 20:37:30,285 | localhost-startStop-1 | TypeConverterDelegate.java | 263 | PropertyEditor [com.sun.beans.editors.EnumEditor] found through deprecated global PropertyEditorManager fallback - consider using a more isolated form of registration, e.g. on the BeanWrapper/BeanFactory!
WARN | 2014-06-19 20:37:30,293 | localhost-startStop-1 | TypeConverterDelegate.java | 263 | PropertyEditor [com.sun.beans.editors.EnumEditor] found through deprecated global PropertyEditorManager fallback - consider using a more isolated form of registration, e.g. on the BeanWrapper/BeanFactory!
WARN | 2014-06-19 20:37:30,300 | localhost-startStop-1 | TypeConverterDelegate.java | 263 | PropertyEditor [com.sun.beans.editors.EnumEditor] found through deprecated global PropertyEditorManager fallback - consider using a more isolated form of registration, e.g. on the BeanWrapper/BeanFactory!
Well, if you're getting one big event then my note number five is your biggest issue. Your LINE_BREAKER = ([\r\n]+)WARN|INFO|ERROR|DEBUG\d{4}-\d{2}-\d{2} doesn't match the sample data. Try this expression based on note number two instead if you don't want to rely on "break on timestamp":
LINE_BREAKER = ([\r\n]+)\s*[A-Z]+\s*\|\s*\d{4}-\d{2}-\d{2}
That allows for any number of spaces before and after the log level, any capital-letter log level, as well as the pipe symbol and spaces before the date.
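To see what that expression does outside of Splunk, here is a rough sketch in Python. It assumes Python's re module behaves like Splunk's regex engine for this pattern, and it only approximates LINE_BREAKER handling: the text is cut at each match and the first capture group (the newlines) is thrown away. The split_events helper, the shortened messages and the continuation line are made up for illustration.

```python
import re

# The suggested pattern; group 1 is the run of newlines that gets discarded.
LINE_BREAKER = re.compile(r"([\r\n]+)\s*[A-Z]+\s*\|\s*\d{4}-\d{2}-\d{2}")


def split_events(raw):
    """Approximate LINE_BREAKER: cut at group 1 of every match and drop it."""
    events, start = [], 0
    for m in LINE_BREAKER.finditer(raw):
        events.append(raw[start:m.start(1)])  # text before the newlines
        start = m.end(1)                      # next event starts after them
    events.append(raw[start:])
    return events


# Hypothetical, shortened sample: one line has a leading space and one
# event carries a continuation line that must NOT start a new event.
sample = (
    "WARN | 2014-06-19 20:37:30,275 | localhost-startStop-1 | first message\n"
    " WARN | 2014-06-19 20:37:30,285 | localhost-startStop-1 | second message\n"
    "  continuation line of the second message\n"
    "INFO | 2014-06-19 20:37:30,293 | localhost-startStop-1 | third message"
)

events = split_events(sample)
for number, event in enumerate(events, 1):
    print(number, repr(event))
```

The continuation line stays attached to the second event because it isn't followed by a capital-letter log level, a pipe and a date.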
Worked. Thanks!
The problem is the line-breaking - sorry, forgot to include that. I'm getting one big event.
Thanks for the response. Yes, some (not all) of the lines appear to have a space at the beginning. There are multi-line events in the file, I can include them if needed. They aren't huge - 2 or 3 lines. As for using the default settings, we have a policy where that's not allowed. Professional Services doesn't recommend it, and if something changes in the log, we can't really go back and tell what the settings were beforehand. My understanding is that it also puts extra burden on the indexer.
Fifth, your LINE_BREAKER regex doesn't allow for any spaces or pipes between the loglevel and the date. There should be parentheses around the loglevel list as well, because the pipe symbol (OR) binds less tightly than character concatenation (AND).
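The precedence point can be checked with a short Python sketch. It assumes Python's re module treats alternation the same way as Splunk's regex engine here. The variable names and the test string are made up for illustration.

```python
import re

# Original pattern: without parentheses it reads as four alternatives:
#   ([\r\n]+)WARN   |   INFO   |   ERROR   |   DEBUG\d{4}-\d{2}-\d{2}
original = re.compile(r"([\r\n]+)WARN|INFO|ERROR|DEBUG\d{4}-\d{2}-\d{2}")

# Grouping the log levels fixes the precedence, but the date must still
# follow the level immediately, which the sample data never does.
grouped = re.compile(r"([\r\n]+)(?:WARN|INFO|ERROR|DEBUG)\d{4}-\d{2}-\d{2}")

# Grouped AND allowing spaces and a pipe between level and date.
with_separators = re.compile(
    r"([\r\n]+)(?:WARN|INFO|ERROR|DEBUG)\s*\|\s*\d{4}-\d{2}-\d{2}"
)

text = "first\nINFO | 2014-06-19 20:37:30,275 | message mentioning ERROR"

# Each tuple is (matched text, captured newlines).
bare = [(m.group(0), m.group(1)) for m in original.finditer(text)]
print(bare)                                   # bare words, no newline captured
print(grouped.search(text))                   # no match at all
print(repr(with_separators.search(text).group(1)))
```

The ungrouped pattern matches the bare words INFO and ERROR wherever they appear, without capturing a newline. The grouped version only starts matching once separators are allowed between the log level and the date.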
How are your problems manifesting themselves?
Some stuff I noticed:
First, it appears some of your logs have a space in front of the loglevel. Is that actually the case or just a copy-and-paste error?
Second, the TIME_PREFIX regex is "starts with warn", "space", "starts with info", or "starts with error" - is that intentional? I'd go with ^\s*[A-Z]+\s*\|\s* instead to be robust against small differences.
Third, your sample data doesn't appear to have any multi-line events but you mentioned those in the title?
Fourth, throwing your sample data into Splunk with all default settings looks okay.
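As a rough check of the TIME_PREFIX suggestion from the second note, the sketch below skips the prefix and parses what follows. It only emulates what Splunk does. Python's %f stands in for the %3N in TIME_FORMAT, which Python's strptime doesn't support. The extract_timestamp helper and the shortened sample lines are made up for illustration.

```python
import re
from datetime import datetime

# Suggested TIME_PREFIX: optional spaces, capital-letter log level, pipe.
TIME_PREFIX = re.compile(r"^\s*[A-Z]+\s*\|\s*")

# Stand-in for TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N (%f accepts 3 digits).
PY_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
STAMP_LENGTH = len("2014-06-19 20:37:30,275")


def extract_timestamp(line):
    """Return the timestamp that follows the prefix, or None if no prefix."""
    m = TIME_PREFIX.match(line)
    if m is None:
        return None
    stamp = line[m.end():m.end() + STAMP_LENGTH]
    return datetime.strptime(stamp, PY_TIME_FORMAT)


plain = "WARN | 2014-06-19 20:37:30,275 | localhost-startStop-1 | message"
indented = " INFO |  2014-06-19 20:37:30,285 | localhost-startStop-1 | message"

print(extract_timestamp(plain))
print(extract_timestamp(indented))
print(extract_timestamp("  continuation line without a log level"))
```

Both the plain and the indented line give a timestamp, and a line with no log-level prefix gives None.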