Mail Journal Header Log

pierre_weg
Path Finder

Hi All!

Today I index a log created by a script that extracts some interesting fields from each mail received through journaling.
It is a CSV log with the interesting fields and their contents separated by ";".
This causes some issues, for example a ";" character in the subject.
I was looking for a less expensive and more elegant way to do this.
I found a less expensive way to export the interesting fields, but it produces a multiline log.
Here is a piece of the log:

Mon, 18 May 2015 17:57:13 +0000
File: /email/1431971833.Vfd04I741e37M920326.lab
Sender: andresn@lab.com
Subject: Proceed to Gate 
To: bhubnr@lab.com
To: ilsonn@lab.com
Cc: clasr@conf.com
Bcc: WLI_GRP@lab.com
Exchange-AuthAs: Internal
originalclientipaddress: 192.168.115.10
Size: 10246
Content-Disposition: False

Mon, 18 May 2015 17:57:02 +0000
File: /email/1431971822.Vfd04I74069fM442554.lab
Sender: paulaugust@lab.com
Subject: Follow up Marine.
To: awelter@lab.com
To: fpacker@info.com
Bcc: WADVENConfCall01@lab.com
Exchange-AuthAs: Internal
originalclientipaddress: 192.168.71.61
Size: 193022
Content-Disposition: True

I'm trying to extract the fields using "rex", but it is still not working. Once I find the right regular expression, I will add it to transforms.conf.
The following expression works in the "Extract New Fields" functionality but not in a search:

rex field=_raw "File: (?<FILE>.+) Subject: (?<SUBJECT>.+) Sender: (?<SENDER>.+) Recipient: (?<RCPT>.+) To: (?<TO>.+) Cc: (?<CC>.+) Exchange-AuthAs: (?<AUTH>.+) originalclientipaddress: (?<IP>.+) Size: (?<SIZE>.+) Content-Disposition: (?<ATTACH>.+) Bcc: (?<BCC>.+)"

Can anyone help me please?
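
A likely reason the single expression finds nothing in a search is that it expects all fields on one line, separated by single spaces and in a fixed order, while each field in the raw event sits on its own line and "." does not match a newline by default. The Python sketch below only illustrates that regex behaviour with the standard `re` module; it is not Splunk itself. The `event` string is copied from the first sample entry, and the shortened `single_line` pattern and the `per_field` dictionary are hypothetical stand-ins for the full expressions.

```python
import re

# First sample event from the post, one field per line.
event = """Mon, 18 May 2015 17:57:13 +0000
File: /email/1431971833.Vfd04I741e37M920326.lab
Sender: andresn@lab.com
Subject: Proceed to Gate
To: bhubnr@lab.com
To: ilsonn@lab.com
Cc: clasr@conf.com
Bcc: WLI_GRP@lab.com
Exchange-AuthAs: Internal
originalclientipaddress: 192.168.115.10
Size: 10246
Content-Disposition: False"""

# Shortened form of the single-line pattern: a literal space between
# fields, and Subject expected before Sender.
single_line = r"File: (.+) Subject: (.+) Sender: (.+)"
single_match = re.search(single_line, event)

# One small pattern per field, as in a chain of separate rex commands.
per_field = {
    "FILE": r"File: (.+)",
    "SENDER": r"Sender: (.+)",
    "SUBJECT": r"Subject: (.+)",
}
fields = {name: re.search(rx, event).group(1) for name, rx in per_field.items()}

print(single_match)  # no match for the one-line pattern
print(fields)
```

Matching each field on its own avoids any dependency on field order or on what separates the lines.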

1 Solution

pierre_weg
Path Finder

I got my answer by opening a support case.

You can add max_match to the rex. For example,

| rex field=_raw "File: (?<FILE>.+)" | rex "Subject: (?<SUBJECT>.+)" | rex "Sender: (?<SENDER>.+)" | rex "Recipient: (?<TO>.+)" | rex "To: (?<TO>.+)" max_match=3 | rex "Cc: (?<CC>.+)" | rex "Exchange-AuthAs: (?<AUTH>.+)" | rex "originalclientipaddress: (?<IP>.+)" | rex "Size: (?<SIZE>.+)" | rex "Content-Disposition: (?<ATTACH>.+)" | rex "Bcc: (?<BCC>.+)"
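
To make the effect of max_match concrete, here is a minimal Python sketch that mimics it with `re.findall`. The helper name `rex` and its signature are hypothetical and only imitate the documented behaviour (default of one match, 0 meaning unlimited); it is not how Splunk implements the command.

```python
import re

# Fragment of the first sample event with two To: lines.
event = """To: bhubnr@lab.com
To: ilsonn@lab.com
Cc: clasr@conf.com"""


def rex(pattern, text, max_match=1):
    """Return up to max_match captures of group 1 (0 means no limit)."""
    values = re.findall(pattern, text)
    return values if max_match == 0 else values[:max_match]


first_only = rex(r"To: (.+)", event)                 # default: first match only
up_to_three = rex(r"To: (.+)", event, max_match=3)   # multivalue result

print(first_only)
print(up_to_three)
```

With the default, only the first recipient is kept; raising max_match lets the second To: line survive as an additional value.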

In order to capture all the fields that had multiple values (such as the To: and Cc: fields), I set the following up:

props.conf:
[test-240026]
REPORT-logextracts = extractions

Then in transforms.conf I had the following:
[extractions]
REGEX=([^:]+):([^\r\n]+)
FORMAT=$1::$2
MV_ADD = true

That worked well, except for the first line of each entry, which is the date/time stamp.

It made a field named "Tue, 19 May 2015 13" with a value of "47:05 +0000" - which isn't really what we wanted.
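
The odd field name follows directly from the pattern: everything before the first colon becomes the name, and the rest of the line becomes the value. A small Python check of the same regular expression (using the standard `re` module as a stand-in for Splunk's regex engine) shows the split:

```python
import re

# Same pattern as REGEX in the [extractions] stanza.
kv = re.compile(r"([^:]+):([^\r\n]+)")

# The date line contains colons in the time, so it splits at the first one.
name, value = kv.match("Tue, 19 May 2015 13:47:05 +0000").groups()

print(repr(name))
print(repr(value))
```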

So I added a SEDCMD to modify the date line as it gets indexed. It changes "Tue, 19 May 2015 13:40:43 +0000" to "Timestamp: Tue, 19 May 2015 13:40:43 +0000".

Now it extracts a field called "Timestamp" with a value of "Tue, 19 May 2015 13:40:43 +0000".

With that working, we can rely on MV_ADD = true in transforms.conf. It takes any additional matches for the same field name and turns them into a multi-value field. So if it encounters multiple To: values in one event, it collects them all.

Here's the full props.conf:
[test-240026]
REPORT-logextracts = extractions
SEDCMD-fixdateline = s/(^[\w]{3},\s+[\d]{2}\s+[\w]{3}\s+[\d]{4}\s[\d]{2}:[\d]{2}:[\d]{2}\s+[+-][\d]{4})/Timestamp: \1/g

transforms.conf:
[extractions]
REGEX=([^:]+):([^\r\n]+)
FORMAT=$1::$2
MV_ADD = true
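
As a rough illustration of what the two settings do together, the Python sketch below prefixes the date line and then collects every "name: value" pair, accumulating repeated names into lists. It is only an approximation under stated assumptions: the timezone sign is matched explicitly with `[+-]`, names and values are trimmed with `strip()` (Splunk applies its own key cleaning), and the shortened `raw` event is assembled from the sample data in this thread.

```python
import re
from collections import defaultdict

# Shortened event built from the sample data in this thread.
raw = """Tue, 19 May 2015 13:40:43 +0000
File: /email/1431971833.Vfd04I741e37M920326.lab
To: bhubnr@lab.com
To: ilsonn@lab.com
Size: 10246"""

# Rough equivalent of the SEDCMD: prefix the leading date line.
date_line = re.compile(
    r"^(\w{3},\s+\d{2}\s+\w{3}\s+\d{4}\s\d{2}:\d{2}:\d{2}\s+[+-]\d{4})"
)
fixed = date_line.sub(r"Timestamp: \1", raw)

# Rough equivalent of REGEX/FORMAT with MV_ADD = true: repeated names
# keep all of their values instead of only the first one.
fields = defaultdict(list)
for name, value in re.findall(r"([^:]+):([^\r\n]+)", fixed):
    fields[name.strip()].append(value.strip())

print(dict(fields))
```

The two To: lines end up as one field with two values, and the date line becomes a regular Timestamp field.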

Thanks Brian for the support.

pierre_weg
Path Finder

| rex field=_raw "File: (?<FILE>.+)" | rex "Subject: (?<SUBJECT>.+)" | rex "Sender: (?<SENDER>.+)" | rex "Recipient: (?<TO>.+)" | rex "To: (?<TO>.+)" | rex "Cc: (?<CC>.+)" | rex "Exchange-AuthAs: (?<AUTH>.+)" | rex "originalclientipaddress: (?<IP>.+)" | rex "Size: (?<SIZE>.+)" | rex "Content-Disposition: (?<ATTACH>.+)" | rex "Bcc: (?<BCC>.+)"

This search string produces my desired results, but I want to do it using props and transforms.

richgalloway
SplunkTrust

Extract New Fields leaves a little to be desired. I found this regex string to work on regex101.com. Your string had the fields out of order, plus there are multiple To: fields and the Cc: appears to be optional.

rex "(?ms)File: (?<FILE>.+?)Sender: (?<SENDER>.+?)Subject: (?<SUBJECT>.+?)To: (?<TO>.+?)To: (?<TO2>.+?)(?:Cc: (?<CC>.+?)){0,1}Bcc: (?<BCC>.+?)Exchange-AuthAs: (?<AUTH>.+?)originalclientipaddress: (?<IP>.+?)Size: (?<SIZE>.+?)Content-Disposition: (?<ATTACH>.+?)$"
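
An expression like this can be checked outside Splunk before pasting it into a search. The Python sketch below is one way to do that, assuming `re.MULTILINE | re.DOTALL` as the equivalent of the m and s flags and Python's `(?P<name>...)` syntax for named groups. The helper `extract` and the single-recipient event `one_to` are hypothetical; `two_to` is the second sample entry from the question.

```python
import re

pattern = re.compile(
    r"File: (?P<FILE>.+?)Sender: (?P<SENDER>.+?)Subject: (?P<SUBJECT>.+?)"
    r"To: (?P<TO>.+?)To: (?P<TO2>.+?)(?:Cc: (?P<CC>.+?)){0,1}Bcc: (?P<BCC>.+?)"
    r"Exchange-AuthAs: (?P<AUTH>.+?)originalclientipaddress: (?P<IP>.+?)"
    r"Size: (?P<SIZE>.+?)Content-Disposition: (?P<ATTACH>.+?)$",
    re.MULTILINE | re.DOTALL,
)


def extract(event):
    """Return the captured fields with whitespace trimmed, or None if no match."""
    m = pattern.search(event)
    if m is None:
        return None
    return {k: v.strip() for k, v in m.groupdict().items() if v is not None}


# Second sample entry: two To: lines and no Cc: line.
two_to = """File: /email/1431971822.Vfd04I74069fM442554.lab
Sender: paulaugust@lab.com
Subject: Follow up Marine.
To: awelter@lab.com
To: fpacker@info.com
Bcc: WADVENConfCall01@lab.com
Exchange-AuthAs: Internal
originalclientipaddress: 192.168.71.61
Size: 193022
Content-Disposition: True"""

# Hypothetical variant with a single recipient.
one_to = two_to.replace("To: fpacker@info.com\n", "")

print(extract(two_to))
print(extract(one_to))  # the pattern requires exactly two To: lines
```

The second call shows the limitation of a fixed-order expression: an event with a different number of To: lines does not match at all.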
---
If this reply helps you, Karma would be appreciated.

pierre_weg
Path Finder

Yes, the Cc, To and Bcc fields are optional. Only one of them is required, and I can find more than one of them in a log entry.
Splunk does this kind of extraction when indexing Windows Event Logs, but I haven't found the way to do it here (yet).

richgalloway
SplunkTrust

Perhaps rex is not the best way to do this. The data is already in keyword:value format so it might be best to let Splunk take it as it will.

pierre_weg
Path Finder

Sure, but Splunk is not doing the work.

richgalloway
SplunkTrust

What are your props.conf and transforms.conf settings?

pierre_weg
Path Finder

props.conf
[MailHeader]
NO_BINARY_CHECK = 1
pulldown_type = 1
CHECK_FOR_HEADER = false
REPORT-AutoHeader = MailHeader

Nothing in transforms.conf.

richgalloway
SplunkTrust

You'll want to add KV_MODE=none to props.conf and create a MailHeader stanza in transforms.conf.

[MailHeader]
MV_ADD = true
REGEX = (?ms)File: (.+?)Sender: (.+?)Subject: (.+?)To: (.+?)To: (.+?)(?:Cc: (.+?)){0,1}Bcc: (.+?)Exchange-AuthAs: (.+?)originalclientipaddress: (.+?)Size: (.+?)Content-Disposition: (.+?)$
FORMAT = FILE::$1 SENDER::$2 SUBJECT::$3 TO::$4 TO2::$5 CC::$6 BCC::$7 AUTH::$8 IP::$9 SIZE::$10 ATTACH::$11

pierre_weg
Path Finder

No way... 😞
I also tried using "SHOULD_LINEMERGE = true" and "BREAK_ONLY_BEFORE = ^$", but it is not working.

richgalloway
SplunkTrust

I'm afraid I'm out of suggestions.

pierre_weg
Path Finder

Thanks for the effort.
