Hi All!
Today I'm indexing a log created by a script that extracts some interesting fields from each mail received via journaling.
It is a CSV log with the interesting fields and their contents separated by ";".
I had some issues making that work - e.g. a ";" character inside the subject.
I was looking for a less expensive and more elegant way to do this.
I found a less expensive way to export the interesting fields, but it's a multiline log.
Here is a piece of the log:
Mon, 18 May 2015 17:57:13 +0000
File: /email/1431971833.Vfd04I741e37M920326.lab
Sender: andresn@lab.com
Subject: Proceed to Gate
To: bhubnr@lab.com
To: ilsonn@lab.com
Cc: clasr@conf.com
Bcc: WLI_GRP@lab.com
Exchange-AuthAs: Internal
originalclientipaddress: 192.168.115.10
Size: 10246
Content-Disposition: False
Mon, 18 May 2015 17:57:02 +0000
File: /email/1431971822.Vfd04I74069fM442554.lab
Sender: paulaugust@lab.com
Subject: Follow up Marine.
To: awelter@lab.com
To: fpacker@info.com
Bcc: WADVENConfCall01@lab.com
Exchange-AuthAs: Internal
originalclientipaddress: 192.168.71.61
Size: 193022
Content-Disposition: True
I'm trying to extract the fields using "rex", but it's still not working. Once I find the right regular expression I will add it to transforms.conf.
The following expression works with the "Extract New Fields" functionality, but not in the search:
rex field=_raw "File: (?<FILE>.+) Subject: (?<SUBJECT>.+) Sender: (?<SENDER>.+) Recipient: (?<RCPT>.+) To: (?<TO>.+) Cc: (?<CC>.+) Exchange-AuthAs: (?<AUTH>.+) originalclientipaddress: (?<IP>.+) Size: (?<SIZE>.+) Content-Disposition: (?<ATTACH>.+) Bcc: (?<BCC>.+)"
Can anyone help me please?
I got my answer by opening a support case.
You can add max_match to the rex. For example,
|rex field=_raw "File: (?<FILE>.+)"| rex "Subject: (?<SUBJECT>.+)"| rex "Sender: (?<SENDER>.+)"| rex "Recipient: (?<RCPT>.+)"| rex "To: (?<TO>.+)" max_match=3| rex "Cc: (?<CC>.+)"| rex "Exchange-AuthAs: (?<AUTH>.+)"| rex "originalclientipaddress: (?<IP>.+)"| rex "Size: (?<SIZE>.+)"| rex "Content-Disposition: (?<ATTACH>.+)"| rex "Bcc: (?<BCC>.+)"
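A quick way to sanity-check the multi-value extraction (a sketch only - the sourcetype name is a placeholder, and max_match=0 simply removes the upper limit instead of capping it at 3):

sourcetype=mail_journal
| rex field=_raw "To: (?<TO>.+)" max_match=0
| rex field=_raw "Sender: (?<SENDER>.+)"
| eval to_count=mvcount(TO)
| table SENDER TO to_count

mvcount() shows how many To: values ended up in the multi-value TO field for each event.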
In order to capture all the fields that had multiple values (such as the To: and Cc: fields), I set the following up:
props.conf:
[test-240026]
REPORT-logextracts = extractions
Then in transforms.conf I had the following:
[extractions]
REGEX=([^:]+):([^\r\n]+)
FORMAT=$1::$2
MV_ADD = true
That worked well... except for the first line of each entry, which was the date/time stamp.
It made a field named "Tue, 19 May 2015 13" with a value of "47:05 +0000" - which isn't really what we wanted.
So I added a sedcmd to modify the date line as it got indexed. I changed it from "Tue, 19 May 2015 13:40:43 +0000" to "Timestamp: Tue, 19 May 2015 13:40:43 +0000".
So, now it'll extract a field called "Timestamp" with a value of "Tue, 19 May 2015 13:40:43 +0000".
So now that we have that working, we can turn on MV_ADD = true in transforms.conf. That will take any additional matches it finds and build a multi-value field from them. So if it encounters multiple To: values in one event, it'll collect them all.
Here's the full props.conf:
[test-240026]
REPORT-logextracts = extractions
SEDCMD-fixdateline = s/(^[\w]{3},\s+[\d]{2}\s+[\w]{3}\s+[\d]{4}\s[\d]{2}:[\d]{2}:[\d]{2}\s+[+-][\d]{4})/Timestamp: \1/g
Transforms.conf:
[extractions]
REGEX=([^:]+):([^\r\n]+)
FORMAT=$1::$2
MV_ADD = true
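With those settings in place, a quick verification search might look like this (a sketch only - it assumes the extracted field names follow the keys in the log, and mvexpand is just there to show each To: value on its own row):

sourcetype=test-240026
| table Timestamp Sender Subject To Cc Bcc Size
| mvexpand To

If MV_ADD is doing its job, events with several To: lines will show a multi-value To field before the mvexpand and one row per recipient after it.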
Thanks, Brian, for the support.
| rex field=_raw "File: (?<FILE>.+)"| rex "Subject: (?<SUBJECT>.+)"| rex "Sender: (?<SENDER>.+)"| rex "Recipient: (?<TO>.+)"| rex "To: (?<TO>.+)"| rex "Cc: (?<CC>.+)"| rex "Exchange-AuthAs: (?<AUTH>.+)"| rex "originalclientipaddress: (?<IP>.+)"| rex "Size: (?<SIZE>.+)"| rex "Content-Disposition: (?<ATTACH>.+)"| rex "Bcc: (?<BCC>.+)"
This search string produces my desired results, but I want to do it using props and transforms.
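(For reference, one way to move these rex extractions out of the search string without a transform is with inline EXTRACT- entries in props.conf - a minimal sketch, assuming a sourcetype of MailHeader and reusing the same capture groups. Note that inline extractions only keep the first match per event, so the multiple To:/Cc: lines still need a REPORT- transform with MV_ADD = true, as in the accepted answer above.)

props.conf:
[MailHeader]
EXTRACT-file = File: (?<FILE>.+)
EXTRACT-sender = Sender: (?<SENDER>.+)
EXTRACT-subject = Subject: (?<SUBJECT>.+)
EXTRACT-to = To: (?<TO>.+)
EXTRACT-cc = Cc: (?<CC>.+)
EXTRACT-bcc = Bcc: (?<BCC>.+)
EXTRACT-auth = Exchange-AuthAs: (?<AUTH>.+)
EXTRACT-ip = originalclientipaddress: (?<IP>.+)
EXTRACT-size = Size: (?<SIZE>.+)
EXTRACT-attach = Content-Disposition: (?<ATTACH>.+)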
Extract New Fields leaves a little to be desired. I found this regex string to work on regex101.com. Your string had the fields out of order, plus there are multiple To: fields and the Cc: appears to be optional.
rex "(?gms)File: (?<FILE>.+?)Sender: (?<SENDER>.+?)Subject: (?<SUBJECT>.+?)To: (?<TO>.+?)To: (?<TO2>.+?)(?:Cc: (?<CC>.+?)){0,1}Bcc: (?<BCC>.+?)Exchange-AuthAs: (?<AUTH>.+?)originalclientipaddress: (?<IP>.+?)Size: (?<SIZE>.+?)Content-Disposition: (?<ATTACH>.+?)$"
Yes, the Cc, To, and Bcc fields are optional. Only one of them is required, and I can find more than one of them in a single log entry.
Splunk does this kind of extraction when indexing Windows event logs, but I haven't found the way to do it here (yet).
Perhaps rex is not the best way to do this. The data is already in keyword:value format so it might be best to let Splunk take it as it will.
Sure, but Splunk is not doing the work.
What are your props.conf and transforms.conf settings?
props.conf
[MailHeader]
NO_BINARY_CHECK = 1
pulldown_type = 1
CHECK_FOR_HEADER = false
REPORT-AutoHeader = MailHeader
Nothing in transforms.conf.
You'll want to add KV_MODE = none to props.conf and create a MailHeader stanza in transforms.conf.
[MailHeader]
MV_ADD = true
REGEX=(?ms)File: (.+?)Sender: (.+?)Subject: (.+?)To: (.+?)To: (.+?)(?:Cc: (.+?)){0,1}Bcc: (.+?)Exchange-AuthAs: (.+?)originalclientipaddress: (.+?)Size: (.+?)Content-Disposition: (.+?)$
FORMAT = FILE::$1 SENDER::$2 SUBJECT::$3 TO::$4 TO2::$5 CC::$6 BCC::$7 AUTH::$8 IP::$9 SIZE::$10 ATTACH::$11
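So the props.conf stanza would end up looking something like this (the existing stanza from above, plus the suggested KV_MODE line):

props.conf:
[MailHeader]
NO_BINARY_CHECK = 1
pulldown_type = 1
CHECK_FOR_HEADER = false
KV_MODE = none
REPORT-AutoHeader = MailHeader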
Still not working... 😞
I tried using "SHOULD_LINEMERGE = true" and "BREAK_ONLY_BEFORE = ^$" too, but that's not working either.
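(For what it's worth, a line-breaking sketch that keys off the leading date line would look something like the following - the BREAK_ONLY_BEFORE pattern and the TIME_* settings are assumptions based on the sample entries above, not a tested configuration:)

props.conf:
[MailHeader]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^\w{3},\s+\d{1,2}\s+\w{3}\s+\d{4}\s+\d{2}:\d{2}:\d{2}\s+[+-]\d{4}
TIME_PREFIX = ^
TIME_FORMAT = %a, %d %b %Y %H:%M:%S %z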
I'm afraid I'm out of suggestions.
Thanks for the effort.