Regex not matching in multiline events with XML

Communicator

Hi all,

I have a log format with plain text followed by an XML payload spread over multiple lines.

CREATION_TS=15-11-13 09:00:05| SomeText <?xml version="1.0" encoding="UTF-8"?>
<newline>
<response id="100008" sub-id="0">
    <payload>
    ....
     500 lines later
    </payload>
</response>

The timestamp 15-11-13 09:00:05 does not match our policy, so I want to rewrite it to ISO format.

props.conf

[MySourceType]
BREAK_ONLY_BEFORE=CREATION_TS
MAX_EVENTS=100000
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=true
TRUNCATE=0
TRANSFORMS-ER-Translog-Log = rewrite_translog_log

transforms.conf

[rewrite_translog_log]
REGEX = (?m)^CREATION_TS=(\d\d)-(\d\d)-(\d\d)\s(\d\d):(\d\d):(\d\d)\|(.*)
FORMAT = DT="$1.$2.20$3T$4:$5:$6" $7
DEST_KEY = _raw

Everything works as expected, except that the event is truncated after the string <?xml version="1.0" encoding="UTF-8"?>. The resulting events look like this:

DT="15.11.2013T20:10:52" SomeText <?xml version="1.0" encoding="UTF-8"?>

Everything after the first newline in the event is silently discarded. I used (?m) to enable multi-line regex matching, but evidently that is not enough.

Any ideas how I can keep the remaining lines of the multi-line event after the regex processing?

Thanks
Norbert


Communicator

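OK, it seems that the regex modifier at the beginning of the pattern needs to be (?s) instead of (?m). (?m) only changes ^ and $ so that they match at every line boundary; it does not let . match a newline, so (.*) stopped at the end of the first line. Because DEST_KEY = _raw replaces the whole event with FORMAT, the uncaptured lines were lost. (?s) lets . match newlines, so $7 captures the rest of the event. The new stanza in transforms.conf is: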

[rewrite_translog_log]
REGEX = (?s)^CREATION_TS=(\d\d)-(\d\d)-(\d\d)\s(\d\d):(\d\d):(\d\d)\|(.*)
FORMAT = DT="$1.$2.20$3T$4:$5:$6" $7
DEST_KEY = _raw
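
To see the difference between the two modifiers outside Splunk, here is a minimal sketch in Python. It assumes that Python's re module treats (?m) and (?s) the same way as the PCRE engine Splunk uses, which should hold for these two flags. The sample event is a shortened, hypothetical version of the one in the question, and last_group is just an illustrative helper.

```python
import re

# Hypothetical sample event, shortened from the one in the question.
event = (
    'CREATION_TS=15-11-13 09:00:05| SomeText <?xml version="1.0" encoding="UTF-8"?>\n'
    '<response id="100008" sub-id="0">\n'
    '    <payload>\n'
    '    </payload>\n'
    '</response>'
)

# Same pattern as in transforms.conf, without the leading modifier.
PATTERN = r'^CREATION_TS=(\d\d)-(\d\d)-(\d\d)\s(\d\d):(\d\d):(\d\d)\|(.*)'


def last_group(modifier):
    """Return what the final (.*) group captures with the given inline modifier."""
    match = re.search(modifier + PATTERN, event)
    return match.group(7)


multiline = last_group('(?m)')  # . still stops at the first newline
dotall = last_group('(?s)')     # . also matches newlines

print(repr(multiline))
print(repr(dotall))
```

With (?m) the last group ends at the XML declaration on the first line; with (?s) it runs to the closing </response> tag, which is what the transform needs in order to write the whole event back to _raw.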