Splunk Search

Mail Journal Header Log

pierre_weg
Path Finder

Hi All!

Currently I index a log created by a script that extracts some interesting fields from each mail received through journaling.
This is a CSV log with the interesting fields and their contents separated by ";".
I have some issues to work around, e.g. a ";" character in the subject.
I was looking for a less expensive and more elegant way to do this.
I found a less expensive way to export the interesting fields, but it's a multiline log.
Here is a piece of the log:

Mon, 18 May 2015 17:57:13 +0000
File: /email/1431971833.Vfd04I741e37M920326.lab
Sender: andresn@lab.com
Subject: Proceed to Gate 
To: bhubnr@lab.com
To: ilsonn@lab.com
Cc: clasr@conf.com
Bcc: WLI_GRP@lab.com
Exchange-AuthAs: Internal
originalclientipaddress: 192.168.115.10
Size: 10246
Content-Disposition: False

Mon, 18 May 2015 17:57:02 +0000
File: /email/1431971822.Vfd04I74069fM442554.lab
Sender: paulaugust@lab.com
Subject: Follow up Marine.
To: awelter@lab.com
To: fpacker@info.com
Bcc: WADVENConfCall01@lab.com
Exchange-AuthAs: Internal
originalclientipaddress: 192.168.71.61
Size: 193022
Content-Disposition: True

I'm trying to extract the fields using "rex", but it is still not working. Once I find the right regular expression, I will add it to transforms.conf.
The following expression works in the "Extract New Fields" functionality but not in a search:

rex field=_raw "File: (?<FILE>.+) Subject: (?<SUBJECT>.+) Sender: (?<SENDER>.+) Recipient: (?<RCPT>.+) To: (?<TO>.+) Cc: (?<CC>.+) Exchange-AuthAs: (?<AUTH>.+) originalclientipaddress: (?<IP>.+) Size: (?<SIZE>.+) Content-Disposition: (?<ATTACH>.+) Bcc: (?<BCC>.+)"
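
A likely reason the single expression fails in a search can be reproduced outside Splunk. The sketch below uses Python's re module only as a stand-in for Splunk's regex engine (so named groups are written (?P<name>...)), and the event text is the first sample from this post. The pattern expects the labels on one line, separated by spaces, in an order the log does not use, so it finds nothing, while a pattern for a single label does match.

```python
import re

# First sample event from the post, one header per line.
EVENT = "\n".join([
    "Mon, 18 May 2015 17:57:13 +0000",
    "File: /email/1431971833.Vfd04I741e37M920326.lab",
    "Sender: andresn@lab.com",
    "Subject: Proceed to Gate",
    "To: bhubnr@lab.com",
    "To: ilsonn@lab.com",
    "Cc: clasr@conf.com",
    "Bcc: WLI_GRP@lab.com",
    "Exchange-AuthAs: Internal",
    "originalclientipaddress: 192.168.115.10",
    "Size: 10246",
    "Content-Disposition: False",
])

# The single-pattern attempt, translated to Python's named-group syntax.
SINGLE = re.compile(
    r"File: (?P<FILE>.+) Subject: (?P<SUBJECT>.+) Sender: (?P<SENDER>.+) "
    r"Recipient: (?P<RCPT>.+) To: (?P<TO>.+) Cc: (?P<CC>.+) "
    r"Exchange-AuthAs: (?P<AUTH>.+) originalclientipaddress: (?P<IP>.+) "
    r"Size: (?P<SIZE>.+) Content-Disposition: (?P<ATTACH>.+) Bcc: (?P<BCC>.+)"
)

# No match: "." does not cross newlines and the labels are not space-separated.
single_match = SINGLE.search(EVENT)

# Matching one label at a time works on the same event.
file_match = re.search(r"File: (?P<FILE>.+)", EVENT)
```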

Can anyone help me please?

1 Solution

pierre_weg
Path Finder

I got my answer by opening a support case.

You can add max_match to the rex. For example,

| rex field=_raw "File: (?<FILE>.+)" | rex "Subject: (?<SUBJECT>.+)" | rex "Sender: (?<SENDER>.+)" | rex "Recipient: (?<TO>.+)" | rex "To: (?<TO>.+)" max_match=3 | rex "Cc: (?<CC>.+)" | rex "Exchange-AuthAs: (?<AUTH>.+)" | rex "originalclientipaddress: (?<IP>.+)" | rex "Size: (?<SIZE>.+)" | rex "Content-Disposition: (?<ATTACH>.+)" | rex "Bcc: (?<BCC>.+)"
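
As a rough illustration of what max_match changes, here is a small Python sketch (Python's re module standing in for rex; the helper name rex_like is made up for this example). With the default of one match only the first To: line is kept; with max_match=3 up to three values are returned as a list, much like a multivalue field.

```python
import re
from itertools import islice

def rex_like(pattern, text, max_match=1):
    """Hypothetical helper: return the first max_match captures of group 1."""
    return [m.group(1) for m in islice(re.finditer(pattern, text), max_match)]

# Three header lines taken from the first sample event.
EVENT = "To: bhubnr@lab.com\nTo: ilsonn@lab.com\nCc: clasr@conf.com"

first_only = rex_like(r"To: (.+)", EVENT)                # one value kept
up_to_three = rex_like(r"To: (.+)", EVENT, max_match=3)  # every To: line in this event
```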

In order to capture all the fields that had multiple values (such as the To: and Cc: fields), I set the following up:

props.conf:
[test-240026]
REPORT-logextracts = extractions

Then in transforms.conf I had the following:
[extractions]
REGEX=([^:]+):([^\r\n]+)
FORMAT=$1::$2
MV_ADD = true

That worked well, except for the first line of each entry, which was the date/time stamp.

It made a field named "Tue, 19 May 2015 13" with a value of "47:05 +0000" - which isn't really what we wanted.
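
The split is easy to reproduce. A minimal Python sketch, assuming the same regex is applied to the date line on its own: the key group stops at the first colon, which falls inside the time.

```python
import re

# Same key/value pattern as in the [extractions] stanza.
KV = re.compile(r"([^:]+):([^\r\n]+)")

m = KV.match("Tue, 19 May 2015 13:47:05 +0000")
key, value = m.group(1), m.group(2)  # the date line is cut at its first ":"
```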

So I added a SEDCMD to modify the date line as it gets indexed. I changed it from "Tue, 19 May 2015 13:40:43 +0000" to "Timestamp: Tue, 19 May 2015 13:40:43 +0000".

So, now it'll extract a field called "Timestamp" with a value of "Tue, 19 May 2015 13:40:43 +0000".
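
The rewrite can be sketched in Python as follows. This is an illustration rather than the exact SEDCMD: the pattern is an assumption modelled on the one in props.conf, with an explicit + or - sign allowed in front of the UTC offset, and fix_date_line is a hypothetical helper name.

```python
import re

# Date line at the very start of an event, e.g. "Tue, 19 May 2015 13:40:43 +0000".
DATE_LINE = re.compile(
    r"^(\w{3},\s+\d{2}\s+\w{3}\s+\d{4}\s\d{2}:\d{2}:\d{2}\s+[+-]\d{4})"
)

def fix_date_line(event):
    """Prefix the leading date line with 'Timestamp: ' so it parses as key: value."""
    return DATE_LINE.sub(r"Timestamp: \1", event)

fixed = fix_date_line("Tue, 19 May 2015 13:40:43 +0000\nSize: 10246")
```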

So now that we got that working, we can turn on MV_ADD=true in the transforms.conf. That'll take any other matches it finds and make a multi-value for it. So if it encounters multiple To: values in one event, it'll collect them all.
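
A minimal Python sketch of that collecting behaviour, assuming keys and values are simply stripped of surrounding whitespace (Splunk's own key cleaning may behave differently) and using lines from the second sample event in the question with the date line already prefixed:

```python
import re
from collections import defaultdict

# Same key/value pattern as in the [extractions] stanza.
KV = re.compile(r"([^:]+):([^\r\n]+)")

def extract_multivalue(event):
    """Collect every key: value pair; repeated keys accumulate into a list."""
    fields = defaultdict(list)
    for key, value in KV.findall(event):
        # Stripping whitespace is an assumption of this sketch.
        fields[key.strip()].append(value.strip())
    return dict(fields)

EVENT = "\n".join([
    "Timestamp: Mon, 18 May 2015 17:57:02 +0000",
    "Sender: paulaugust@lab.com",
    "To: awelter@lab.com",
    "To: fpacker@info.com",
    "Bcc: WADVENConfCall01@lab.com",
    "Size: 193022",
])

fields = extract_multivalue(EVENT)
```

Here fields["To"] holds both recipients, while single-valued keys such as Size end up as one-element lists.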

Here's the full props.conf:
[test-240026]
REPORT-logextracts = extractions
SEDCMD-fixdateline = s/(^[\w]{3},\s+[\d]{2}\s+[\w]{3}\s+[\d]{4}\s[\d]{2}:[\d]{2}:[\d]{2}\s+[+-][\d]{4})/Timestamp: \1/g

transforms.conf:
[extractions]
REGEX=([^:]+):([^\r\n]+)
FORMAT=$1::$2
MV_ADD = true

Thanks Brian for the support.

pierre_weg
Path Finder

| rex field=_raw "File: (?<FILE>.+)" | rex "Subject: (?<SUBJECT>.+)" | rex "Sender: (?<SENDER>.+)" | rex "Recipient: (?<TO>.+)" | rex "To: (?<TO>.+)" | rex "Cc: (?<CC>.+)" | rex "Exchange-AuthAs: (?<AUTH>.+)" | rex "originalclientipaddress: (?<IP>.+)" | rex "Size: (?<SIZE>.+)" | rex "Content-Disposition: (?<ATTACH>.+)" | rex "Bcc: (?<BCC>.+)"

This search string produces my desired results, but I want to do it using props and transforms.

richgalloway
SplunkTrust

Extract New Fields leaves a little to be desired. I found this regex string to work on regex101.com. Your string had the fields out of order, plus there are multiple To: fields and the Cc: appears to be optional.

rex "(?ms)File: (?<FILE>.+?)Sender: (?<SENDER>.+?)Subject: (?<SUBJECT>.+?)To: (?<TO>.+?)To: (?<TO2>.+?)(?:Cc: (?<CC>.+?)){0,1}Bcc: (?<BCC>.+?)Exchange-AuthAs: (?<AUTH>.+?)originalclientipaddress: (?<IP>.+?)Size: (?<SIZE>.+?)Content-Disposition: (?<ATTACH>.+?)$"
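
For anyone who wants to test this pattern outside Splunk, here is a hedged Python translation (named groups become (?P<name>...), the inline flags are (?ms) since Python has no g flag, and {0,1} is written as ?). It matches an event shaped like the second sample in the question, but it depends on exactly two To: lines and a Bcc: line being present, so an event with a single To: line does not match.

```python
import re

ORDERED = re.compile(
    r"(?ms)File: (?P<FILE>.+?)Sender: (?P<SENDER>.+?)Subject: (?P<SUBJECT>.+?)"
    r"To: (?P<TO>.+?)To: (?P<TO2>.+?)(?:Cc: (?P<CC>.+?))?Bcc: (?P<BCC>.+?)"
    r"Exchange-AuthAs: (?P<AUTH>.+?)originalclientipaddress: (?P<IP>.+?)"
    r"Size: (?P<SIZE>.+?)Content-Disposition: (?P<ATTACH>.+?)$"
)

# Second sample event from the question (two To: lines, no Cc:).
LINES = [
    "File: /email/1431971822.Vfd04I74069fM442554.lab",
    "Sender: paulaugust@lab.com",
    "Subject: Follow up Marine.",
    "To: awelter@lab.com",
    "To: fpacker@info.com",
    "Bcc: WADVENConfCall01@lab.com",
    "Exchange-AuthAs: Internal",
    "originalclientipaddress: 192.168.71.61",
    "Size: 193022",
    "Content-Disposition: True",
]
two_to = "\n".join(LINES)
# Hypothetical variant of the same event with only one To: line.
one_to = "\n".join(line for line in LINES if line != "To: fpacker@info.com")

match_two = ORDERED.search(two_to)   # matches; captures keep their trailing newline
match_one = ORDERED.search(one_to)   # None: the second "To: " is mandatory
```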
---
If this reply helps you, Karma would be appreciated.

pierre_weg
Path Finder

Yes, the Cc, To and Bcc fields are optional. Only one of them is required, and any of them can appear more than once in a log entry.
Splunk does this kind of extraction when indexing the Windows Event Log, but I can't find a way to do it here (yet).

richgalloway
SplunkTrust

Perhaps rex is not the best way to do this. The data is already in keyword:value format so it might be best to let Splunk take it as it will.

pierre_weg
Path Finder

Sure, but Splunk is not doing the work.

richgalloway
SplunkTrust

What are your props.conf and transforms.conf settings?

pierre_weg
Path Finder

props.conf
[MailHeader]
NO_BINARY_CHECK = 1
pulldown_type = 1
CHECK_FOR_HEADER = false
REPORT-AutoHeader = MailHeader

Nothing in transforms.conf.

richgalloway
SplunkTrust

You'll want to add KV_MODE=none to props.conf and create a MailHeader stanza in transforms.conf.

[MailHeader]
MV_ADD = true
REGEX=(?ms)File: (.+?)Sender: (.+?)Subject: (.+?)To: (.+?)To: (.+?)(?:Cc: (.+?)){0,1}Bcc: (.+?)Exchange-AuthAs: (.+?)originalclientipaddress: (.+?)Size: (.+?)Content-Disposition: (.+?)$
FORMAT = FILE::$1 SENDER::$2 SUBJECT::$3 TO::$4 TO2::$5 CC::$6 BCC::$7 AUTH::$8 IP::$9 SIZE::$10 ATTACH::$11

pierre_weg
Path Finder

No way... 😞
I also tried using "SHOULD_LINEMERGE = true" and "BREAK_ONLY_BEFORE = ^$", but it's not working.

richgalloway
SplunkTrust

I'm afraid I'm out of suggestions.

pierre_weg
Path Finder

Thanks for the effort.
