Hello, I would like to properly parse rspamd logs that look like this (2 lines sample): 2023-11-12 16:06:22 #28191(rspamd_proxy) <8eca26>; proxy; rspamd_task_write_log: action="no action", diges...
See more...
Hello, I would like to properly parse rspamd logs that look like this (2 lines sample): 2023-11-12 16:06:22 #28191(rspamd_proxy) <8eca26>; proxy; rspamd_task_write_log: action="no action", digest="107a69c58d90a38bb0214546cbe78b52", dns_req="79", filename="undef", forced_action="undef", ip="A.B.C.D", is_spam="F", len="98123", subject="foobar", head_from=""dude" <info@example.com>", head_to=""other" <other@example.net>", head_date="Sun, 12 Nov 2023 07:08:57 +0000", head_ua="nil", mid="<B3.B6.48980.82A70556@gg.mta2vrest.cc.prd.sparkpost>", qid="0E4231619A", scores="-0.71/15.00", settings_id="undef", symbols_scores_params="BAYES_HAM(-3.00){100.00%;},RBL_MAILSPIKE_VERYBAD(1.50){A.B.C.D:from;},RWL_AMI_LASTHOP(-1.00){A.B.C.D:from;},URI_COUNT_ODD(1.00){105;},FORGED_SENDER(0.30){info@example.com;bounce-1699772934217.160136898825325270136859@example.com;},MANY_INVISIBLE_PARTS(0.30){4;},ZERO_FONT(0.20){2;},BAD_REP_POLICIES(0.10){},MIME_GOOD(-0.10){multipart/alternative;text/plain;},HAS_LIST_UNSUB(-0.01){},ARC_NA(0.00){},ASN(0.00){asn:23528, ipnet:A.B.C.0/20, country:US;},DKIM_TRACE(0.00){example.com:+;},DMARC_POLICY_ALLOW(0.00){example.com;none;},FROM_HAS_DN(0.00){},FROM_NEQ_ENVFROM(0.00){info@example.com;bounce-1699772934217.160136898825325270136859@example.com;},HAS_REPLYTO(0.00){support@example.com;},MIME_TRACE(0.00){0:+;1:+;2:~;},RCPT_COUNT_ONE(0.00){1;},RCVD_COUNT_ZERO(0.00){0;},REDIRECTOR_URL(0.00){twitter.com;},REPLYTO_DOM_NEQ_FROM_DOM(0.00){},R_DKIM_ALLOW(0.00){example.com:s=scph0618;},R_SPF_ALLOW(0.00){+exists:A.B.C.D._spf.sparkpostmail.com;},TO_DN_ALL(0.00){},TO_MATCH_ENVRCPT_ALL(0.00){}", time_real="1605.197ms", user="undef"
2023-11-12 16:02:04 #28191(rspamd_proxy) <4a3599>; proxy; rspamd_task_write_log: action="no action", digest="5151f8aa4eaebc5877c7308fed4ea21e", dns_req="19", filename="undef", forced_action="undef", ip="E.F.G.H", is_spam="F", len="109529", subject="Re: barfoo", head_from="other me <other@example.net>", head_to="someone <someone@exmaple.fr>", head_date="Sun, 12 Nov 2023 16:02:03 +0100", head_ua="Apple Mail (2.3731.700.6)", mid="<3425840B-B955-4647-AB4D-163FC54BE820@example.net>", qid="163A215DB3", scores="-4.09/15.00", settings_id="undef", symbols_scores_params="BAYES_HAM(-2.99){99.99%;},ARC_ALLOW(-1.00){example.net:s=openarc-20230616:i=1;},MIME_GOOD(-0.10){multipart/mixed;text/plain;},APPLE_MAILER_COMMON(0.00){},ASN(0.00){asn:12322, ipnet:E.F.0.0/11, country:FR;},FREEMAIL_CC(0.00){example.com;},FREEMAIL_ENVRCPT(0.00){example.fr;example.com;},FREEMAIL_TO(0.00){example.fr;},FROM_EQ_ENVFROM(0.00){},FROM_HAS_DN(0.00){},MID_RHS_MATCH_FROM(0.00){},MIME_TRACE(0.00){0:+;1:+;2:~;},RCPT_COUNT_TWO(0.00){2;},RCVD_COUNT_ZERO(0.00){0;},TO_DN_ALL(0.00){},TO_MATCH_ENVRCPT_ALL(0.00){}", time_real="428.021ms", user="me" The field I need to split is symbols_scores_params. I’ve used this: sourcetype=rspamd user=* | makemv tokenizer="([^,]+),?" symbols_scores_params | mvexpand symbols_scores_params | rex field=symbols_scores_params "(?<name>[A-Z0-9_]+)\((?<score>-?[.0-9]+)\){(?<options>[^{}]+)}" | eval {name}_score=score, {name}_options=options It works great, proper fields are created (eg. BAYES_HAM_score, BAYES_HAM_options, etc.), but a single event is turned into a pack of 17 to 35 events. Is there a way to dedup those events and to keep every new fields extracted from symbols_scores_params ?