Original log:
[{"username": "xxx", "event": "session_start", "event_category": "session", "timestamp": "2019-12-11 08:26:23.547000+00:00", "context_ip": "xxx", "context_page_referrer": "xxx", "context_page_url": "xxx", "context_page_search": null, "context_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36", "context_data": null, "response": null}, {"username": "xxx", "event": "session_start", "event_category": "session", "timestamp": "2019-12-11 12:53:32.350000+00:00", "context_ip": "xxx", "context_page_referrer": null, "context_page_url": "xxx", "context_page_search": null, "context_user_agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36", "context_data": null, "response": null}]
Expected logs:
{"username": "xxx", "event": "session_start", "event_category": "session", "timestamp": "2019-12-11 08:26:23.547000+00:00", "context_ip": "xxx", "context_page_referrer": "xxx", "context_page_url": "xxx", "context_page_search": null, "context_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36", "context_data": null, "response": null}
{"username": "xxx", "event": "session_start", "event_category": "session", "timestamp": "2019-12-11 12:53:32.350000+00:00", "context_ip": "xxx", "context_page_referrer": null, "context_page_url": "xxx", "context_page_search": null, "context_user_agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36", "context_data": null, "response": null}
The props.conf I am currently using is:
[xxx]
SHOULD_LINEMERGE=true
NO_BINARY_CHECK=true
CHARSET=UTF-8
SEDCMD-remove_prefix=s/\[//g
SEDCMD-remove_suffix=s/\]//g
SEDCMD-removeeventcommas=s/}, {"username":/}{"username":/g
BREAK_ONLY_BEFORE=\{\"username\" <-- This one is not working
Output I am getting with the above props.conf:
{"username": "xxx", "event": "session_start", "event_category": "session", "timestamp": "2019-12-11 08:26:23.547000+00:00", "context_ip": "xxx", "context_page_referrer": "xxx", "context_page_url": "xxx", "context_page_search": null, "context_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36", "context_data": null, "response": null}{"username": "xxx", "event": "session_start", "event_category": "session", "timestamp": "2019-12-11 12:53:32.350000+00:00", "context_ip": "xxx", "context_page_referrer": null, "context_page_url": "xxx", "context_page_search": null, "context_user_agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36", "context_data": null, "response": null}
I am validating this while uploading a sample log file from the Web UI, on the second configuration page of Add Data.
What am I missing?
Use ONLY this (do not add any of the stuff that I dropped back in):
[xxx]
SHOULD_LINEMERGE=false
LINE_BREAKER = ((?:(?:^|\][\r\n]+)\[)|,\s+)\{"username"
NO_BINARY_CHECK=true
CHARSET=UTF-8
SEDCMD-remove_suffix=s/]//g
Never, EVER use SHOULD_LINEMERGE = true and the BREAK_* junk. I have only ever seen one case where it was necessary.
I'd recommend using explicit LINE_BREAKER and SHOULD_LINEMERGE=false. That is much more predictable and is also more performant.
Something like this should work for your data:
LINE_BREAKER = ([\r\n]*\[|,\s+)\{"username":
SHOULD_LINEMERGE=false
This also automatically strips the leading [ and the , between records, because the text matched by the first capturing group of LINE_BREAKER is discarded. The only SEDCMD needed is one that strips the trailing ]. See: https://regex101.com/r/8zGyMS/1
Note: SEDCMD applies after line breaking.
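If it helps to see why the explicit LINE_BREAKER is enough on its own, here is a rough Python sketch of the breaking rule: an event ends where the first capturing group starts, the next event begins where that group ends, and the captured text is thrown away. This only imitates the behavior outside of Splunk. The function names break_events and strip_trailing_bracket are made up for illustration, the sample data is shortened, and the end-anchored bracket regex is an assumption standing in for a SEDCMD that removes only the closing ] (it is not the exact setting from the answers above).

```python
import json
import re

# Same pattern as the LINE_BREAKER suggested above.
LINE_BREAKER = re.compile(r'([\r\n]*\[|,\s+)\{"username":')


def break_events(raw):
    """Imitate LINE_BREAKER: break at each match and discard capture group 1."""
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        # Previous event ends where group 1 begins.
        chunk = raw[start:match.start(1)]
        if chunk:
            events.append(chunk)
        # Next event begins where group 1 ends.
        start = match.end(1)
    tail = raw[start:]
    if tail:
        events.append(tail)
    return events


def strip_trailing_bracket(event):
    """Hypothetical stand-in for a SEDCMD that removes only the closing ]."""
    return re.sub(r'\]\s*$', '', event)


# Shortened sample in the same shape as the original log (a JSON array on one line).
raw = '[{"username": "a", "event": "session_start"}, {"username": "b", "event": "session_start"}]'

# Line breaking first, then the SEDCMD-like cleanup, matching the order noted above.
events = [strip_trailing_bracket(e) for e in break_events(raw)]
for event in events:
    print(json.loads(event)["username"])
```

Each resulting event parses as a standalone JSON object, which is the shape shown under "Expected logs". Anchoring the bracket removal to the end of the event avoids touching any ] that might appear inside a field value.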