Hi all,
I have a log format in which plain text is followed by an XML payload spread over multiple lines.
CREATION_TS=15-11-13 09:00:05| SomeText <?xml version="1.0" encoding="UTF-8"?>
<newline>
<response id="100008" sub-id="0">
<payload>
....
500 lines later
</payload>
</response>
The timestamp 15-11-13 09:00:05 does not match our policy, so I want to rewrite it to ISO format.
props.conf
[MySourceType]
BREAK_ONLY_BEFORE=CREATION_TS
MAX_EVENTS=100000
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=true
TRUNCATE=0
TRANSFORMS-ER-Translog-Log = rewrite_translog_log
transforms.conf
[rewrite_translog_log]
REGEX = (?m)^CREATION_TS=(\d\d)-(\d\d)-(\d\d)\s(\d\d):(\d\d):(\d\d)\|(.*)
FORMAT = DT="$1.$2.20$3T$4:$5:$6" $7
DEST_KEY = _raw
Everything works as expected, except that the event is truncated after the string <?xml version="1.0" encoding="UTF-8"?>. My resulting events look like this:
DT="15.11.2013T20:10:52" SomeText <?xml version="1.0" encoding="UTF-8"?>
Everything after the first newline in the event is silently discarded. I used (?m) to enable multi-line regex matching, but evidently that is not enough.
Any ideas how I can keep the remaining lines of the multi-line events after the regex processing?
Thanks
Norbert
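OK, it seems that the modifier at the beginning of the regex needs to be (?s) instead of (?m). (?m) only changes where ^ and $ match, whereas (?s) lets . match newline characters, so the final (.*) captures the rest of the event instead of stopping at the end of the first line. So the new stanza in transforms.conf is: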
[rewrite_translog_log]
REGEX = (?s)^CREATION_TS=(\d\d)-(\d\d)-(\d\d)\s(\d\d):(\d\d):(\d\d)\|(.*)
FORMAT = DT="$1.$2.20$3T$4:$5:$6" $7
DEST_KEY = _raw
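For anyone who wants to see the difference between the two modifiers outside of Splunk, here is a minimal sketch in Python. It is only an illustration: Python's re module is not the regex engine Splunk uses, although (?m) and (?s) have the same meaning in both. The sample event is a shortened, made-up version of the log above, and the rewrite helper is a hypothetical stand-in for what REGEX, FORMAT and DEST_KEY = _raw do together.

```python
import re

# Shortened, made-up two-line event modelled on the log format in the question.
event = (
    'CREATION_TS=15-11-13 09:00:05| SomeText <?xml version="1.0" encoding="UTF-8"?>\n'
    '<response id="100008" sub-id="0">'
)

# Same pattern as in transforms.conf, without the leading modifier.
BODY = r'^CREATION_TS=(\d\d)-(\d\d)-(\d\d)\s(\d\d):(\d\d):(\d\d)\|(.*)'


def rewrite(raw: str, modifier: str) -> str:
    """Hypothetical stand-in for the transform: build the new raw text
    from the captured groups only, as FORMAT with DEST_KEY = _raw does."""
    match = re.match(modifier + BODY, raw)
    if match is None:
        return raw  # no match: leave the event unchanged
    dd, mm, yy, hh, mi, ss, rest = match.groups()
    return f'DT="{dd}.{mm}.20{yy}T{hh}:{mi}:{ss}" {rest}'


# (?m): "." still stops at the newline, so only the first line is captured.
multiline_result = rewrite(event, '(?m)')

# (?s): "." also matches newlines, so (.*) captures the rest of the event.
dotall_result = rewrite(event, '(?s)')

print(multiline_result)
print(dotall_result)
```

With (?m) the second line of the event is missing from the result, which is the truncation described in the question. With (?s) it is kept.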