Ok, I'm at my wits' end here. I have an application log which produces events of the format:
DEBUG | 2012-02-16 11:01:30,683 [http-10.0.0.1-8443-Processor6] SystemFile - field1=value1 timestamp=2012-02-16 11:01:30.679 CST field2=value2 field3=value3 field4=value4 field5= field6=value6 field7=A field value with spaces in it field8=
DEBUG | 2012-02-16 11:01:32,457 [http-10.0.0.1-8443-Processor10] SystemFile - field1=value1 timestamp=2012-02-16 11:01:32,450 CST field2=value2 field3= field4=value4 field5= field6=value6 field7=Another field with spaces in it field8=value8
Basically tab-delimited name/value pairs, with nice neat newlines at the end of the lines (I've verified the line breaks and tabs in a hex editor, and all events are being written via the same log4j config). I -thought- I had it all being parsed just fine, but it appears that the index-time parsing is not always splitting the events on newlines, and I end up with two (or three, or four, or five) log lines in one event. They have different timestamps, so it's not that it's rolling them up into one (the above two events are a sanitized example of two that got rolled together). I would suspect the cause is that the first one ends with an equals sign (no value), but there are plenty of identical-looking events in the same log that get split properly. I'm stumped.
My props.conf for the log source looks like:
[MySourceType]
LINE_BREAKER = ([\r\n]+)
REPORT-tab-kv-manual = tab-kv-manual
KV_MODE = NONE
TIME_PREFIX = DEBUG
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 30
And my transforms.conf looks like:
[tab-kv-manual]
REGEX = (\t|- )([^=]+)=([^\t\n]*)
FORMAT = $2::$3
REPEAT_MATCH = true
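For anyone reproducing this, here is a minimal Python sketch of what the REGEX above extracts from one event. It is only an approximation: Python's re module is not Splunk's regex engine, and the sample string is a hypothetical reconstruction that assumes a tab between every name/value pair (the events as pasted above show them collapsed to spaces). The names KV_REGEX, extract_pairs and sample_event are made up for illustration.

```python
import re

# Same pattern as REGEX in the [tab-kv-manual] stanza.
KV_REGEX = re.compile(r"(\t|- )([^=]+)=([^\t\n]*)")


def extract_pairs(event):
    """Mimic FORMAT = $2::$3 with REPEAT_MATCH = true: one pair per match."""
    return {m.group(2): m.group(3) for m in KV_REGEX.finditer(event)}


# Hypothetical reconstruction of the first sample event, ASSUMING the
# name/value pairs are separated by tabs as described in the question.
sample_event = (
    "DEBUG | 2012-02-16 11:01:30,683 [http-10.0.0.1-8443-Processor6] SystemFile - "
    + "\t".join([
        "field1=value1",
        "timestamp=2012-02-16 11:01:30.679 CST",
        "field2=value2",
        "field3=value3",
        "field4=value4",
        "field5=",
        "field6=value6",
        "field7=A field value with spaces in it",
        "field8=",
    ])
)

pairs = extract_pairs(sample_event)
for name, value in pairs.items():
    print(f"{name!r} -> {value!r}")
```

Under that assumption the empty values (field5, field8) come out as empty strings and field7 keeps its spaces, so the search-time extraction itself does not look like the source of the merged events.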
Any suggestions?
Did you ever figure this out? Having the same issue. Testing the explicit line breaker currently.
What is your data format? Also, set "SHOULD_LINEMERGE = false" in props.conf along with LINE_BREAKER. It defaults to true, so Splunk may merge lines back together after LINE_BREAKER has split them.
I've been there as well, and while it looks like your LINE_BREAKER regex is correct, I think I remember that being a bit more explicit solved the issue:
LINE_BREAKER = ([\r\n]+)[A-Z]+\s+\|\s+\d+
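Also, your TIME_PREFIX is just wrong. It is a regex, so DEBUG only matches DEBUG-level lines and stops before the " | " separator in front of the timestamp. It should be: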
TIME_PREFIX = ^[A-Z]+\s+\|\s+
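A rough way to sanity-check both regexes outside Splunk is the Python sketch below. It is not how Splunk is implemented: it only assumes that the text matched by the first capture group of LINE_BREAKER is discarded and marks the event boundary, and that timestamp parsing starts right after the TIME_PREFIX match. The helper names break_events and event_time are hypothetical, the sample events are shortened versions of the two in the question, and Python's %f stands in for Splunk's %3N.

```python
import re
from datetime import datetime

# The two patterns suggested above.
LINE_BREAKER = re.compile(r"([\r\n]+)[A-Z]+\s+\|\s+\d+")
TIME_PREFIX = re.compile(r"^[A-Z]+\s+\|\s+")


def break_events(raw):
    """Split raw text where LINE_BREAKER matches, dropping capture group 1."""
    events, start = [], 0
    for m in LINE_BREAKER.finditer(raw):
        events.append(raw[start:m.start(1)])  # text before the newline(s)
        start = m.end(1)                      # next event starts after them
    tail = raw[start:].rstrip("\r\n")
    if tail:
        events.append(tail)
    return events


def event_time(event):
    """Parse the 23-character timestamp that follows the TIME_PREFIX match."""
    prefix = TIME_PREFIX.match(event)
    if prefix is None:
        return None
    stamp = event[prefix.end():prefix.end() + 23]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


# Shortened, tab-delimited stand-ins for the two events in the question.
raw_log = (
    "DEBUG | 2012-02-16 11:01:30,683 [http-10.0.0.1-8443-Processor6] "
    "SystemFile - field1=value1\tfield8=\n"
    "DEBUG | 2012-02-16 11:01:32,457 [http-10.0.0.1-8443-Processor10] "
    "SystemFile - field1=value1\tfield8=value8\n"
)

events = break_events(raw_log)
times = [event_time(e) for e in events]
for e, t in zip(events, times):
    print(t, "|", e[:40])
```

Because the breaker requires a log level, a pipe and a digit after the newline, continuation lines that do not start that way (a stack trace, for example) would stay attached to the preceding event in this sketch.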
Hope this helps,
Kristian