How to do field extraction with regex?

zacksoft_wf — Tue, 15 Feb 2022 16:23:30 GMT

My events are in JSON format.
The JSON path to my data is:
"alert.smtp-message.smtp-header"

Within "smtp-header" I have content like the following, and I could use help extracting some fields from it with rex.
============

"smtp-header": "Received: from mxdinx66.Gramyabnk.com (mxdinx66.Gramyabnk.com [159.45.78.215])\n\tby mn-svdc-epi-ran11.ist.Gramyabnk.net (Postfix) with ESMTP id 4JyJsN6m8kzVKnNg\n\tfor <tran.cu@Gramyabnk.com>; Mon, 14 Feb 2922 22:66:28 +9999 (UTC)\nReceived: from pps.filterd (mxdinx66.Gramyabnk.com [127.9.9.1])\n\tby mxdinx66.Gramyabnk.com (8.16.9.42/8.16.9.42) with SMTP id 21EMIuas425197\n\tfor <tran.cu@Gramyabnk.com>; Mon, 14 Feb 2922 22:66:28 GMT\nReceived: from mx9a-99994996.pphosted.com (mx9a-99994996.pphosted.com [295.229.165.191])\n\tby mxdinx66.Gramyabnk.com with ESMTP id 6e65wvawac-1\n\t(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA684 bits=256 verify=NOT)\n\tfor <tran.cu@Gramyabnk.com>; Mon, 14 Feb 2922 22:66:27 +9999\nReceived: from pps.filterd (m9216616.ppops.net [127.9.9.1])\n\tby mx9b-99994996.pphosted.com (8.16.1.2/8.16.1.2) with ESMTP id 21EIDxq8928666\n\tfor <tran.cu@Gramyabnk.com>; Mon, 14 Feb 2922 22:66:26 GMT\nAuthentication-Results: ppops.net;\n\tspf=pass smtp.mailfrom=info@efk.admin.ch;\n\tdkim=pass header.d=efk.admin.ch header.s=dkimkey1;\n\tdmarc=pass header.from=efk.admin.ch\nReceived: from mail11.admin.ch (mail11.admin.ch [162.26.62.11])\n\tby mx9b-99994996.pphosted.com (PPS) with ESMTPS id 6e625qnsf9-1\n\t(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA684 bits=256 verify=NOT)\n\tfor <tran.cu@Gramyabnk.com>; Mon, 14 Feb 2922 22:66:26 +9999\nDKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=efk.admin.ch; h=to\n\t:subject:date:to:from:reply-to:subject:message-id:mime-version\n\t:content-type:content-transfer-encoding; s=dkimkey1; bh=uoC6bt5q\n\thKVezRrk1ux9j7rGCMvkx/6cA9/rS1xbvwE=; b=V9mOEgc1tAyvbFpvkKFgHbnD\n\tHDh67iweoPEV7ZYCPpLW8KSBRU+uX+uL64xdJu9E1mp+BvITob98PRfIaCSIi6HC\n\tIf74+dtpxcVyfo9JXZmCj49tJdilXquYWoCu+OhLeONYd9/NMVs4S/IFHnYT/hmN\n\tNBzuP/5C6MKdlHavIwo=\nTo: \"Pretty Eloisa send you naughty videos https://vk.cc/cb5mIY\" <tran.cu@Gramyabnk.com>\nSubject: 
=?utf-8?Q?Pretty_Eloisa_send_you_naughty_videos_https://vk.cc/cb5mIY,_bitte?= =?utf-8?Q?_best=C6=A4tigen_Sie_ihre_EFK-Newsletter-Anmeldung?=\nDate: Mon, 14 Feb 2922 22:61:28 +9999\nTo: \"Pretty Eloisa send you naughty videos https://vk.cc/cb5mIY\" <tran.cu@Gramyabnk.com>\nFrom: \"Eidg. Finanzkontrolle\" <info@efk.admin.ch>\nReply-To: \"Eidg. Finanzkontrolle\" <info@efk.admin.ch>\nSubject: =?utf-8?Q?Pretty_Eloisa_send_you_naughty_videos_https://vk.cc/cb5mIY,_bitte?=\n =?utf-8?Q?_best=C6=A4tigen_Sie_ihre_EFK-Newsletter-Anmeldung?=\nMessage-ID: <MjQ1NzA5MwAC75229Y8BAMTY9NDg6Nzg4ODM6NzM@www.efk.admin.ch>\nContent-Type: multipart/alternative;\n\tboundary=\"b1_292f6ee91b9de8a92268de4c4ce5b57f\"\nX-TM-AS-GCONF: 99\nX-MSH-Id: E7195F2B6F624BA184EA6D9F12CD98AE\nContent-Transfer-Encoding: 7bit\nX-Proofpoint-GUID: 5sQWXU-CRjHoWtaxmd54Yn68A2IDf2Eu\nX-CLX-Shades: MLX\nX-Proofpoint-ORIG-GUID: 5sQWXU-CRjHoWtaxmd54Yn68A2IDf2Eu\nX-CLX-Response: 1TFkXGxgaEQpMehcaEQpZRBd6GF1SX9ZiBWNEcxEKWFgXbGdhYnBoGkBpaxo 7GxAHGRoRCnBsF6oeXwEBQkZDfXBTEAc ZGhEKcEwXZ1MfZ6t5RRkTE9AQGhEKbX4XGhEKWE9XSxEg\nMIME-Version: 1.9\nX-Brightmail-Tracker: True\nx-env-sender: info@efk.admin.ch\nX-Proofpoint-Virus-Version: vendor=nai engine=6699 definitions=19258 signatures=676461\nX-Proofpoint-Spam-Details: rule=inbound_aggressive_notspam policy=inbound_aggressive score=9\n clxscore=129 suspectscore=9 adultscore=9 bulkscore=9 mlxlogscore=472\n malwarescore=9 phishscore=9 spamscore=9 priorityscore=9 lowpriorityscore=9\n impostorscore=9 mlxscore=9 classifier=spam adjust=9 reason=mlx scancount=1\n engine=8.12.9-2291119999 definitions=main-2292149128",

==============================================

I only need the fields in the last 3 lines (in bold) extracted: the values after the = sign, excluding the \n.
clxscore
suspectscore
adultscore
bulkscore
mlxlogscore
malwarescore
phishscore
spamscore
priorityscore
lowpriorityscore
impostorscore
mlxscore
classifier

Re: How to Field Extraction with Regex

ITWhisperer — Tue, 15 Feb 2022 14:21:31 GMT

| rex "clxscore=(?<clxscore>\S+) suspectscore=(?<suspectscore>\S+) adultscore=(?<adultscore>\S+) bulkscore=(?<bulkscore>\S+) mlxlogscore=(?<mlxlogscore>\S+).+ malwarescore=(?<malwarescore>\S+) phishscore=(?<phishscore>\S+) spamscore=(?<spamscore>\S+) priorityscore=(?<priorityscore>\S+) lowpriorityscore=(?<lowpriorityscore>\S+).+ impostorscore=(?<impostorscore>\S+) mlxscore=(?<mlxscore>\S+) classifier=(?<classifier>\S+)"

Re: How to Field Extraction with Regex

somesoni2 — Tue, 15 Feb 2022 15:30:50 GMT

If the order of the fields is not fixed, try a separate rex for each field, like this:

| rex field="alert.smtp-message.smtp-header" "clxscore\=(?<clxscore>[^\s\\\]+)" | rex field="alert.smtp-message.smtp-header" "suspectscore\=(?<suspectscore>[^\s\\\]+)" | rex field="alert.smtp-message.smtp-header" "scancount\=(?<scancount>[^\s\\\]+)"
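To sanity-check the per-field idea outside Splunk, here is a minimal Python sketch. It is only an illustration, not how rex works internally. The shortened `header` string, the `FIELDS` list and the `extract_fields` helper are all made up for this example, and it assumes the line breaks arrive as literal backslash-n pairs, as in the raw JSON.

```python
import re

# Hypothetical, shortened sample of the smtp-header value. Each "\\n" below is a
# literal backslash followed by "n", as it appears in the raw JSON event.
header = (
    "X-Proofpoint-Spam-Details: rule=inbound_aggressive_notspam "
    "policy=inbound_aggressive score=9\\n clxscore=129 suspectscore=9 "
    "adultscore=9 bulkscore=9 mlxlogscore=472\\n malwarescore=9 "
    "classifier=spam adjust=9 reason=mlx scancount=1"
)

# Illustrative subset of the field names asked about in the thread.
FIELDS = ["clxscore", "suspectscore", "mlxlogscore", "classifier", "scancount"]


def extract_fields(text, names):
    """Search for each name=value pair independently, so field order does not matter."""
    found = {}
    for name in names:
        # \b stops a short name from matching inside a longer one; the value
        # ends at whitespace or a backslash, mirroring [^\s\\]+ in the rex.
        match = re.search(r"\b" + re.escape(name) + r"=([^\s\\]+)", text)
        if match:
            found[name] = match.group(1)
    return found


print(extract_fields(header, FIELDS))
```

Because every field gets its own search, a missing or reordered field only affects that one field instead of breaking the whole match.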

Re: How to do field extraction with regex?

yuanliu — Wed, 16 Feb 2022 09:48:48 GMT

Why is it important to use regex and not standard commands? If the event is proper JSON, it will have smtp-header extracted already. (If not, just spath.) Assuming smtp-header exists, you can then use kv, aka extract to obtain the fields.

| rename smtp-header as _raw | kv kvdelim=":" pairdelim="\n" limit=0 mv_add=true | fields - _raw _time | fields *score

(The above sets kvdelim=":", but = is also used by default. The above also works directly on _raw as you listed.) With your sample data, the output is:

adultscore	bulkscore	clxscore	impostorscore	lowpriorityscore	malwarescore	mlxlogscore	mlxscore	phishscore	priorityscore	score	spamscore	suspectscore
9	9	129	9	9	9	472	9	9	9	9	9	9
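For comparison, the generic key=value approach can be sketched in Python. This is only an approximation of the idea, not Splunk's actual kv implementation. The shortened `header` string and the `kv_pairs` helper are hypothetical, and line breaks are again assumed to be literal backslash-n pairs.

```python
import re

# Hypothetical, shortened stand-in for the smtp-header value; each "\\n" is a
# literal backslash followed by "n".
header = (
    "score=9\\n clxscore=129 suspectscore=9 adultscore=9 bulkscore=9 "
    "mlxlogscore=472\\n malwarescore=9 phishscore=9 classifier=spam"
)


def kv_pairs(text):
    """Collect every key=value pair; values end at whitespace or a backslash.

    If a key repeats, the last value wins (unlike mv_add=true in the search above).
    """
    return dict(re.findall(r"(\w+)=([^\s\\]+)", text))


pairs = kv_pairs(header)

# Keep only keys ending in "score", similar in spirit to `fields *score`.
scores = {key: value for key, value in pairs.items() if key.endswith("score")}
print(scores)
```

No field names are hard-coded here, so new score fields would be picked up without changing the pattern.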
