
Regexes for Exchange SMTP logs

Thuan
Explorer

Greetings,

The sample logs are listed below:
2014-06-18T02:25:16.879Z,TSEAET01\NEW - Internet receive connector TSEAET01,08D1456B7AFF9BDF,22,147.81.121.139:25,147.81.122.24:61707,,"CN=Entrust Certification Authority - L1C, OU=""(c) 2009 Entrust, Inc."", OU=www.entrust.net/rpa is incorporated by reference, O=""Entrust, Inc."", C=US",Certificate issuer name
2014-06-18T02:25:16.879Z,TSEAET01\NEW - Internet receive connector TSEAET01,08D1456B7AFF9BDF,23,147.81.121.139:25,147.81.122.24:61707,,4C1B9021,Certificate serial number
2014-06-18T02:25:16.879Z,TSEAET01\NEW - Internet receive connector TSEAET01,08D1456B7AFF9BDF,24,147.81.121.139:25,147.81.122.24:61707,,27A7B6AAACBE39610C3A148D60EF4F5F2BE60FB0,Certificate thumbprint
2014-06-18T02:25:16.879Z,TSEAET01\NEW - Internet receive connector TSEAET01,08D1456B7AFF9BDF,25,147.81.121.139:25,147.81.122.24:61707,,TSEAET01.tascnet.tasc.com;Mail1.tasc.com;Mail.tasc.com;Mail.tascnet.tasc.com,Certificate alternate names
2014-06-18T02:25:16.910Z,TSEAET01\NEW - Internet receive connector TSEAET01,08D1456B7AFF9BDF,26,147.81.121.139:25,147.81.122.24:61707,*,,Received certificate

The field headers are:

Fields: date-time,connector-id,session-id,sequence-number,local-endpoint,remote-endpoint,event,data,context
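These lines are plain CSV: a field that contains commas is wrapped in double quotes, and a literal quote inside it is written twice (""). A CSV parser therefore handles the awkward data field without any regex. A minimal Python sketch, assuming one event per line and the nine field names from the header above; parse_smtp_line is a hypothetical helper name:

```python
import csv

# Field names copied from the "Fields:" header line above.
FIELDS = [
    "date-time", "connector-id", "session-id", "sequence-number",
    "local-endpoint", "remote-endpoint", "event", "data", "context",
]


def parse_smtp_line(line):
    """Split one Exchange SMTP log line into a dict (hypothetical helper).

    csv.reader keeps a double-quoted value as a single field, even when it
    contains commas, and collapses each doubled quote ("") to one quote.
    """
    row = next(csv.reader([line]))
    return dict(zip(FIELDS, row))


# First sample line from the question ("\\" is one literal backslash).
issuer = ('2014-06-18T02:25:16.879Z,TSEAET01\\NEW - Internet receive connector TSEAET01,'
          '08D1456B7AFF9BDF,22,147.81.121.139:25,147.81.122.24:61707,,'
          '"CN=Entrust Certification Authority - L1C, OU=""(c) 2009 Entrust, Inc."", '
          'OU=www.entrust.net/rpa is incorporated by reference, O=""Entrust, Inc."", C=US",'
          'Certificate issuer name')
rec = parse_smtp_line(issuer)
print(rec["data"])
print(rec["context"])
```

The printed data value has its outer quotes removed and its doubled quotes collapsed, which is usually what you want to search on.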

My (non-working) regex is as follows:

(?[^,]+),(?[^,]+),(?[^,]+),(?[^,]+),(?[^,]+),(?[^,]+),(?[^,]+),(?(?=")(?.+),)|(?(?=,)(?,),)|(?(?=.)(?[^,]),),(?.+)\r\n

I have trouble parsing the field named "data", which can take any one of the following forms:
1) "CN=Entrust Certification Authority - L1C, OU=""(c) 2009 Entrust, Inc."", OU=www.entrust.net/rpa is incorporated by reference, O=""Entrust, Inc."", C=US"
2) 4C1B9021
3) ,
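The three forms above are the three shapes a CSV field can take: quoted (commas allowed inside, "" meaning a literal quote), unquoted, and empty. If a regex is still preferred, one alternation per field covers all of them. A sketch using Python's re module, assuming exactly nine comma-separated fields per line; FIELD, NAMES, LINE_RE and unquote are hypothetical names, and the group names use underscores because regex group names cannot contain hyphens. Splunk uses PCRE, which also accepts the (?P<name>...) form, but the pattern should still be tested in Splunk itself.

```python
import re

# One CSV field: a double-quoted value ("" = literal quote, commas allowed),
# or an unquoted, possibly empty, run of non-comma characters.
FIELD = r'"(?:[^"]|"")*"|[^,]*'

NAMES = ["date_time", "connector_id", "session_id", "sequence_number",
         "local_endpoint", "remote_endpoint", "event", "data", "context"]

# Anchored pattern: one named group per field, joined by literal commas.
LINE_RE = re.compile("^" + ",".join(f"(?P<{n}>{FIELD})" for n in NAMES) + "$")


def unquote(value):
    """Strip surrounding quotes and collapse "" to " (CSV convention)."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value[1:-1].replace('""', '"')
    return value


serial = ("2014-06-18T02:25:16.879Z,TSEAET01\\NEW - Internet receive connector TSEAET01,"
          "08D1456B7AFF9BDF,23,147.81.121.139:25,147.81.122.24:61707,,4C1B9021,"
          "Certificate serial number")
m = LINE_RE.match(serial)
print(m.group("data"), "/", m.group("context"))  # 4C1B9021 / Certificate serial number
```

Trying the quoted alternative first matters: if it came second, a quoted value would be cut at its first comma and the rest of the line would then fail to match.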

The timestamp, however, appears to be processed correctly.


Thuan
Explorer

Sorry, I copied the wrong regex in my previous post.
The correct, working regex in props.conf is listed below:

[xchange_smtp]
NO_BINARY_CHECK = 1
pulldown_type = 1
BREAK_ONLY_BEFORE_DATE = false
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%N%Z
EXTRACT-xchange_smtp =,(?[^,]+),(?[^,]+),(?[^,]+),(?[^,]+),(?[^,]+),(?[^,]),(?".+"|[^,]|,),(?.*)
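The timestamp reportedly parses correctly already. As a quick cross-check of what TIME_FORMAT has to describe (date, a literal T, time, fractional seconds, a trailing Z for UTC), the same string can be parsed with Python's strptime. This is a sketch only: Python's directives are not Splunk's, e.g. Python uses %f for fractional seconds where Splunk uses its own subsecond directive (%N).

```python
from datetime import datetime, timedelta

# Sample timestamp from the question's log lines.
stamp = "2014-06-18T02:25:16.879Z"

# Python strptime directives (not Splunk's): %f reads the 3 fractional digits
# as microseconds (879 -> 879000), and %z accepts the trailing "Z" as UTC
# (supported since Python 3.7).
dt = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f%z")

print(dt.isoformat())                   # 2014-06-18T02:25:16.879000+00:00
print(dt.utcoffset() == timedelta(0))   # True
```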

Thuan
Explorer

The working regex in props.conf is listed below:

[xchange_agent]
NO_BINARY_CHECK = 1
pulldown_type = 1
BREAK_ONLY_BEFORE_DATE = false
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%N%Z
EXTRACT-xchange_agent =,(?P[^,]),(?P[^,]+),(?P[^,]+),(?P[^,]+),(?P,|[^,]),(?P[^,]),(?P<21FromAddresses>[^,]),(?P[^,]),(?P[^,]),(?P[^,]),(?P[^,]),(?P[^,]),(?P[^,]),(?P[^,]),(?P[^,]),(?P.*)


somesoni2
Revered Legend

What values of data is the regex returning? It does seem to work with your sample data for me.


Thuan
Explorer

Thank you for your prompt reply.

A)
The suggested regex does not work. It cannot parse the "data" field in any of the forms listed below:
1) "CN=Entrust Certification Authority - L1C, OU=""(c) 2009 Entrust, Inc."", OU=www.entrust.net/rpa is incorporated by reference, O=""Entrust, Inc."", C=US"
2) 4C1B9021
3) ,

B)

The blog post is about the IIS format, not the MS Exchange SMTP log format. In fact, I have tried to parse a sample file using the existing "IIS" source type. That attempt fails too.

Looking forward to a working solution.

somesoni2
Revered Legend

Try this

^(?<date_time>[^,]+),(?<connector_id>[^,]+),(?<session_id>[^,]+),(?<sequence_number>[^,]+),(?<local_endpoint>[^,]+),(?<remote_endpoint>[^,]+),(?<event>[^,]*),(?<data>.*),(?<context>.*)$
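One way to check the suggested pattern outside Splunk is to run it through Python's re module, with every group written in Python's (?P<name>...) form. The sketch shows how the greedy data group behaves: it backs off only as far as the last comma on the line, so context is whatever follows that comma and data may be empty. The variable names are illustrative.

```python
import re

# The suggested pattern, with all groups in Python's (?P<name>...) syntax.
SUGGESTED = re.compile(
    r"^(?P<date_time>[^,]+),(?P<connector_id>[^,]+),(?P<session_id>[^,]+),"
    r"(?P<sequence_number>[^,]+),(?P<local_endpoint>[^,]+),"
    r"(?P<remote_endpoint>[^,]+),(?P<event>[^,]*),(?P<data>.*),(?P<context>.*)$"
)

received = ("2014-06-18T02:25:16.910Z,TSEAET01\\NEW - Internet receive connector TSEAET01,"
            "08D1456B7AFF9BDF,26,147.81.121.139:25,147.81.122.24:61707,*,,Received certificate")
m = SUGGESTED.match(received)
# Greedy (?P<data>.*) gives back characters until a comma follows it, i.e. up
# to the LAST comma, so data is empty here and context is "Received certificate".
print(repr(m.group("event")), repr(m.group("data")), repr(m.group("context")))
# '*' '' 'Received certificate'
```

Two consequences of this design: the data value keeps its outer quotes and doubled quotes, and a context value that itself contained a comma would be split at the wrong place.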

ahall_splunk
Splunk Employee

Is there some reason you are not using INDEXED_EXTRACTIONS?

Here is a blog post about the feature:
http://blogs.splunk.com/2013/10/18/iis-logs-and-splunk-6/

It deals with IIS logs, but the same principle can be used.
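The principle behind INDEXED_EXTRACTIONS is to take the field names from the log's own header line and split each event as delimited data instead of matching it with a hand-written regex. The Python sketch below only illustrates that idea; it is not how Splunk implements the feature. The helper name parse_log, the treatment of other lines starting with '#' as comments, and the tolerance for a leading '#' on the header are assumptions.

```python
import csv


def parse_log(text):
    """Split a log into records, taking field names from its header line.

    Hypothetical helper. Assumes the header looks like the one quoted in the
    question ("Fields: a,b,c"), optionally with a leading '#'; any other line
    starting with '#' is skipped as a comment; one event per line.
    """
    fields = None
    records = []
    for line in text.splitlines():
        header = line.lstrip("#").strip()
        if header.startswith("Fields:"):
            # Field names come from the log itself, not from a regex.
            fields = [name.strip() for name in header[len("Fields:"):].split(",")]
            continue
        if not line.strip() or line.startswith("#"):
            continue
        if fields is None:
            raise ValueError("data line found before a Fields: header")
        # csv.reader handles quoted values with embedded commas and "" quotes.
        row = next(csv.reader([line]))
        records.append(dict(zip(fields, row)))
    return records


log = "\n".join([
    "Fields: date-time,connector-id,session-id,sequence-number,"
    "local-endpoint,remote-endpoint,event,data,context",
    "2014-06-18T02:25:16.879Z,TSEAET01\\NEW - Internet receive connector TSEAET01,"
    "08D1456B7AFF9BDF,24,147.81.121.139:25,147.81.122.24:61707,,"
    "27A7B6AAACBE39610C3A148D60EF4F5F2BE60FB0,Certificate thumbprint",
])
for rec in parse_log(log):
    print(rec["sequence-number"], rec["data"], rec["context"])
```

Because the names are read from the header, the same code would keep working if a later log version added or reordered columns, which is the main advantage over a fixed regex.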
