Splunk Search

How to build a regex for a field extraction that will match a portion of the domain for the field?

New Member

I'm looking at sendmail logs and I'm trying to pull out a portion of the domain name based on the relay.

I've been testing with rex and have arrived at the following command.

index=mail stat relay | rex "relay=([a-zA-Z0-9-]+\.)*(?<test123>([a-zA-Z0-9-]+\.){1}((ab|bc|mb|nb|nf|nl|ns|nt|nu|on|pe|qc|sk|yk).)?([a-zA-Z0-9-]+))(s)?" | table test123

With log lines that look like this:
Nov 12 22:24:37 some.mail.host Nov 12 22:24:37 sendmail[9056]: sAD5OZKS011800: to=********@gov.ab.ca, delay=00:00:02, xdelay=00:00:01, mailer=smtp, pri=66484, relay=something.gov.ab.ca. [XXX.XXX.XXX.XXX], dsn=2.0.0, stat=Sent (ok: Message 54730621 accepted)

Nov 13 09:34:13 some.mail.host Nov 13 09:34:13 sendmail[30002]: sADGYCM5028904: to=something@example.com, ctladdr=somethingelse@example.com (999/25), delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=37906, relay=aspmx.l.google.com. [XXX.XXX.XXX.XXX], dsn=2.0.0, stat=Sent (OK 1415896453 63si40410316iol.79 - gsmtp)

Two sample relays are:
something.gov.ab.ca
something.blah.google.com

In testing I end up with ab.ca and google.com.
What I'm trying to get is gov.ab.ca and google.com.

I've tried a number of online regex tools, and they seem to match gov.ab.ca greedily. In Splunk, the ? after ((ab|bc|mb|nb|nf|nl|ns|nt|nu|on|pe|qc|sk|yk).) seems to act more like +?, based on the regular expression documentation I've come across online.

Is there something I can do to get the behavior I'm looking for?
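
One way to see where the match goes is to reproduce it outside Splunk. The following is a minimal sketch in Python, assuming its `re` module backtracks the same way as Splunk's regex engine for this pattern; the only change to the pattern is the named group syntax (`(?P<test123>...)` instead of `(?<test123>...)`). The helper `extract` and the shortened sample lines are hypothetical. It suggests that the leading `([a-zA-Z0-9-]+\.)*` is greedy and consumes `something.gov.` before the named group starts, so the optional province group never gets a chance to match.

```python
import re

# The pattern from the rex command above, unchanged except for the
# Python-style named group (?P<test123>...).
PATTERN = re.compile(
    r"relay=([a-zA-Z0-9-]+\.)*"
    r"(?P<test123>([a-zA-Z0-9-]+\.){1}"
    r"((ab|bc|mb|nb|nf|nl|ns|nt|nu|on|pe|qc|sk|yk).)?"
    r"([a-zA-Z0-9-]+))(s)?"
)


def extract(line):
    """Return (text eaten by the leading greedy group, test123), or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Everything between "relay=" and the start of the named group was
    # consumed by the leading ([a-zA-Z0-9-]+\.)* repetition.
    prefix = line[m.start() + len("relay="):m.start("test123")]
    return prefix, m.group("test123")


# Shortened versions of the two sample log lines.
print(extract("pri=66484, relay=something.gov.ab.ca. [XXX.XXX.XXX.XXX], dsn=2.0.0"))
# ('something.gov.', 'ab.ca')
print(extract("pri=37906, relay=aspmx.l.google.com. [XXX.XXX.XXX.XXX], dsn=2.0.0"))
# ('aspmx.l.', 'google.com')
```

In this sketch the `?` behaves as an ordinary greedy optional; it is skipped only because the text left for it starts at `ca`, which is not in the province list.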

New Member

I've added another example. I can see where you are going with your regex, but it's not quite working the way I want. You are simply pulling off the first piece of the domain, which can vary. For example, if the following relays showed up in the log:
1.2.3.4.google.com
1.2.3.google.com
1.2.google.com
1.google.com
something.gov.ab.ca

I want google.com for the first four and, because ab appears right before .ca in the final one, I want gov.ab.ca rather than just ab.ca.

Is that a bit clearer?
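
The behaviour described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a tested Splunk extraction: the leading repetition is made lazy (`*?`), the province branch requires a literal `.ca` (my reading of "ab appears right before .ca"), and the relay is assumed to end with an optional trailing dot followed by whitespace, a comma, or end of line. The names `PROVINCES`, `LAZY` and `base_domain` are hypothetical. In a rex command the named group would be written `(?<test123>...)` instead of `(?P<test123>...)`.

```python
import re

# Province codes taken from the original rex pattern.
PROVINCES = "ab|bc|mb|nb|nf|nl|ns|nt|nu|on|pe|qc|sk|yk"

LAZY = re.compile(
    # Lazy prefix: consume as few leading labels as possible.
    r"relay=(?:[a-zA-Z0-9-]+\.)*?"
    # Either label.province.ca or plain label.label.
    r"(?P<test123>[a-zA-Z0-9-]+\.(?:(?:" + PROVINCES + r")\.ca|[a-zA-Z0-9-]+))"
    # Assumed terminator: optional trailing dot, then whitespace, comma or end.
    r"\.?(?=[\s,]|$)"
)


def base_domain(line):
    """Return the extracted domain portion, or None if no relay is found."""
    m = LAZY.search(line)
    return m.group("test123") if m else None


for relay in ("1.2.3.4.google.com", "1.2.3.google.com", "1.2.google.com",
              "1.google.com", "something.gov.ab.ca"):
    print(relay, "->", base_domain("relay=" + relay + ". [XXX.XXX.XXX.XXX]"))
# 1.2.3.4.google.com -> google.com
# 1.2.3.google.com -> google.com
# 1.2.google.com -> google.com
# 1.google.com -> google.com
# something.gov.ab.ca -> gov.ab.ca
```

The terminator is what forces the lazy prefix to keep growing until the named group reaches the end of the hostname; without it, a lazy prefix would stop at the first two labels.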

New Member

I can't seem to comment on the provided answer. I've fixed the query in my question and clarified where I believe the problem is.

Motivator

Not sure if I understood correctly. Let's say relay can hold
relay = something.gov.ab.ca as well as
relay = somewhere.google.com? If so, you can use

base search|rex field=_raw "relay=\w+\.(?<Domain>.*)\s"

Pardon me if I'm going off on a tangent. Can you post a sample for relay=xxx.google.com as well? I assumed the pattern was "something.google.com".
Hope this helps,
Thanks,
Raghav

Motivator

The address you are trying to extract is part of a key-value pair.
Try this:

.....|rex field=_raw "relay=\S+(?.*)\s"

Motivator

Please add a backslash before S+ and s. It disappeared from my post.

SplunkTrust

If you mark some text and click the 101010 symbol, it will show up in the post... magic all over the place 🙂