Splunk Search

How to write the regex to extract the domains from URLs?

ccsfdave
Builder

I have been through the field extractor, answers.splunk.com, and the interwebs looking for help on this one. So our Palo Alto will give us the URLs of sites visited - here is a sample:

crl.microsoft.com/pki/crl/products/MicRooCerAut2011_2011_03_22.crl
safebrowsing-cache.google.com/
p4-a2lp5grl52xoy-qpo2s4ky6vs36rpb-794312-s1-v6exp3-v4.metric.gstatic.com/
de.tynt.com/deb/v2?id=dZxfWCGner46jsacwqm_6l&r=lyricstranslate.com/en/l039amour-c039est-pour-rien-love-nothing.html
a248.e.akamai.net/

I would like to be able to extract the domains e.g.

microsoft or microsoft.com
google or google.com
gstatic or gstatic.com
tynt or tynt.com
akamai or akamai.net

I would think that the way to go about it is to look for the FIRST .com, .net, .org etc and then work back to the previous . to grab the domain but that is beyond me.

Can anyone help?

1 Solution

somesoni2
Revered Legend

Try this run anywhere sample

| gentimes start=-1 | eval URL="crl.microsoft.com/pki/crl/products/MicRooCerAut2011_2011_03_22.crl safebrowsing-cache.google.com/ p4-a2lp5grl52xoy-qpo2s4ky6vs36rpb-794312-s1-v6exp3-v4.metric.gstatic.com/ de.tynt.com/deb/v2?id=dZxfWCGner46jsacwqm_6l&r=lyricstranslate.com/en/l039amour-c039est-pour-rien-love-nothing.html a248.e.akamai.net/" | table _raw  | makemv URL| mvexpand URL| rex field=URL "(?<domain>\w+\.\w+)\/"

View solution in original post

somesoni2
Revered Legend

Try this run anywhere sample

| gentimes start=-1 | eval URL="crl.microsoft.com/pki/crl/products/MicRooCerAut2011_2011_03_22.crl safebrowsing-cache.google.com/ p4-a2lp5grl52xoy-qpo2s4ky6vs36rpb-794312-s1-v6exp3-v4.metric.gstatic.com/ de.tynt.com/deb/v2?id=dZxfWCGner46jsacwqm_6l&r=lyricstranslate.com/en/l039amour-c039est-pour-rien-love-nothing.html a248.e.akamai.net/" | table _raw  | makemv URL| mvexpand URL| rex field=URL "(?<domain>\w+\.\w+)\/"

ccsfdave
Builder

@somesoni2

You have it, but help me understand it so that I may apply it to my search. As @Rhin0Crash stated the Palo Altos see the field as "url" so my base search is: index=pan_logs sourcetype=pan* src_ip=x.x.x.x url=*

0 Karma

Rhin0Crash
Path Finder

@ccsfdave :

index=pan_logs sourcetype=pan* src_ip=x.x.x.x url=* | rex field=URL "(?\w+.\w+)\/" | table domain _raw

0 Karma

ccsfdave
Builder

Yup you got it!

| rex field=url "(?<domain>\w+\.\w+)\/"
0 Karma

Rhin0Crash
Path Finder
 search | rex field=_raw "(?<domain>\w+)\.(com|net|gov|edu|co)"

I think

You can replace the field with what field the PA gives you for URL. That might be URL, or misc, or uri.
0 Karma
Get Updates on the Splunk Community!

What You Read The Most: Splunk Lantern’s Most Popular Articles!

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

See your relevant APM services, dashboards, and alerts in one place with the updated ...

As a Splunk Observability user, you have a lot of data you have to manage, prioritize, and troubleshoot on a ...

Index This | What goes away as soon as you talk about it?

May 2025 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this month’s ...