Getting Data In

spliting a field without multipling events

patpro
Path Finder

Hello,

I would like to properly parse rspamd logs that look like this (2 lines sample):

 

2023-11-12 16:06:22 #28191(rspamd_proxy) <8eca26>; proxy; rspamd_task_write_log: action="no action", digest="107a69c58d90a38bb0214546cbe78b52", dns_req="79", filename="undef", forced_action="undef", ip="A.B.C.D", is_spam="F", len="98123",  subject="foobar", head_from=""dude" <info@example.com>", head_to=""other" <other@example.net>", head_date="Sun, 12 Nov 2023 07:08:57 +0000", head_ua="nil", mid="<B3.B6.48980.82A70556@gg.mta2vrest.cc.prd.sparkpost>", qid="0E4231619A", scores="-0.71/15.00", settings_id="undef", symbols_scores_params="BAYES_HAM(-3.00){100.00%;},RBL_MAILSPIKE_VERYBAD(1.50){A.B.C.D:from;},RWL_AMI_LASTHOP(-1.00){A.B.C.D:from;},URI_COUNT_ODD(1.00){105;},FORGED_SENDER(0.30){info@example.com;bounce-1699772934217.160136898825325270136859@example.com;},MANY_INVISIBLE_PARTS(0.30){4;},ZERO_FONT(0.20){2;},BAD_REP_POLICIES(0.10){},MIME_GOOD(-0.10){multipart/alternative;text/plain;},HAS_LIST_UNSUB(-0.01){},ARC_NA(0.00){},ASN(0.00){asn:23528, ipnet:A.B.C.0/20, country:US;},DKIM_TRACE(0.00){example.com:+;},DMARC_POLICY_ALLOW(0.00){example.com;none;},FROM_HAS_DN(0.00){},FROM_NEQ_ENVFROM(0.00){info@example.com;bounce-1699772934217.160136898825325270136859@example.com;},HAS_REPLYTO(0.00){support@example.com;},MIME_TRACE(0.00){0:+;1:+;2:~;},RCPT_COUNT_ONE(0.00){1;},RCVD_COUNT_ZERO(0.00){0;},REDIRECTOR_URL(0.00){twitter.com;},REPLYTO_DOM_NEQ_FROM_DOM(0.00){},R_DKIM_ALLOW(0.00){example.com:s=scph0618;},R_SPF_ALLOW(0.00){+exists:A.B.C.D._spf.sparkpostmail.com;},TO_DN_ALL(0.00){},TO_MATCH_ENVRCPT_ALL(0.00){}", time_real="1605.197ms", user="undef"
2023-11-12 16:02:04 #28191(rspamd_proxy) <4a3599>; proxy; rspamd_task_write_log: action="no action", digest="5151f8aa4eaebc5877c7308fed4ea21e", dns_req="19", filename="undef", forced_action="undef", ip="E.F.G.H", is_spam="F", len="109529",  subject="Re: barfoo", head_from="other me <other@example.net>", head_to="someone <someone@exmaple.fr>", head_date="Sun, 12 Nov 2023 16:02:03 +0100", head_ua="Apple Mail (2.3731.700.6)", mid="<3425840B-B955-4647-AB4D-163FC54BE820@example.net>", qid="163A215DB3", scores="-4.09/15.00", settings_id="undef", symbols_scores_params="BAYES_HAM(-2.99){99.99%;},ARC_ALLOW(-1.00){example.net:s=openarc-20230616:i=1;},MIME_GOOD(-0.10){multipart/mixed;text/plain;},APPLE_MAILER_COMMON(0.00){},ASN(0.00){asn:12322, ipnet:E.F.0.0/11, country:FR;},FREEMAIL_CC(0.00){example.com;},FREEMAIL_ENVRCPT(0.00){example.fr;example.com;},FREEMAIL_TO(0.00){example.fr;},FROM_EQ_ENVFROM(0.00){},FROM_HAS_DN(0.00){},MID_RHS_MATCH_FROM(0.00){},MIME_TRACE(0.00){0:+;1:+;2:~;},RCPT_COUNT_TWO(0.00){2;},RCVD_COUNT_ZERO(0.00){0;},TO_DN_ALL(0.00){},TO_MATCH_ENVRCPT_ALL(0.00){}", time_real="428.021ms", user="me"

 

 

The field I need to split is symbols_scores_params. I’ve used this:

 

sourcetype=rspamd user=* | makemv tokenizer="([^,]+),?" symbols_scores_params | mvexpand symbols_scores_params | rex field=symbols_scores_params "(?<name>[A-Z0-9_]+)\((?<score>-?[.0-9]+)\){(?<options>[^{}]+)}" | eval {name}_score=score, {name}_options=options

 

 

It works great, proper fields are created (eg. BAYES_HAM_score, BAYES_HAM_options, etc.), but a single event is turned into a pack of 17 to 35 events.

Is there a way to dedup those events and to keep every new fields extracted from symbols_scores_params ?

Labels (1)
0 Karma
1 Solution

tscroggins
Champion

Hi @patpro;

Let's cheat and convert your split symbols_scores_params field into JSON using rex in sed mode and then use spath to parse the JSON.

Sed allows backreferences, so we can use the first capture group, the score name, as a prefix to the _score and _options strings.

I've also modified your makemv regular expression to accommodate options containing commas; right-brace followed by an optional comma appears to be an appropriate boundary.

| makemv tokenizer="([^}]+}),?" symbols_scores_params
| rex mode=sed field=symbols_scores_params "s/([^(]+)\\(([^)]+)\\){([^}]*)}/{\"\\1_score\":\"\\2\",\"\\1_options\":\"\\3\"}/"
| eval symbols_scores_params="[".mvjoin(symbols_scores_params, ",")."]"
| spath input=symbols_scores_params
| rename "{}.*" as *

 The eval-spath-rename sequence works around spath only operating on the first value of a multivalued field.

View solution in original post

tscroggins
Champion

Hi @patpro;

Let's cheat and convert your split symbols_scores_params field into JSON using rex in sed mode and then use spath to parse the JSON.

Sed allows backreferences, so we can use the first capture group, the score name, as a prefix to the _score and _options strings.

I've also modified your makemv regular expression to accommodate options containing commas; right-brace followed by an optional comma appears to be an appropriate boundary.

| makemv tokenizer="([^}]+}),?" symbols_scores_params
| rex mode=sed field=symbols_scores_params "s/([^(]+)\\(([^)]+)\\){([^}]*)}/{\"\\1_score\":\"\\2\",\"\\1_options\":\"\\3\"}/"
| eval symbols_scores_params="[".mvjoin(symbols_scores_params, ",")."]"
| spath input=symbols_scores_params
| rename "{}.*" as *

 The eval-spath-rename sequence works around spath only operating on the first value of a multivalued field.

patpro
Path Finder

Works really great!

thanks a lot.

Tags (1)
0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Why Splunk Customers Should Attend Cisco Live 2026 Las Vegas

Why Splunk Customers Should Attend Cisco Live 2026 Las Vegas     Cisco Live 2026 is almost here, and this ...

What Is the Name of the USB Key Inserted by Bob Smith? (BOTS Hint, Not the Answer)

Hello Splunkers,   So you searched, “what is the name of the usb key inserted by bob smith?”  Not gonna lie… ...

Automating Threat Operations and Threat Hunting with Recorded Future

    Automating Threat Operations and Threat Hunting with Recorded Future June 29, 2026 | Register   Is your ...