Getting Data In

Top-Level Domain Extraction (from URLs)

dsmeerkat
Explorer

I've searched and searched and can't find a regex that quite fits what I want to do. I'd like to extract just the ".com", ".net", ".org", etc. from a URL.
My "domain" field contains values such as "http://cdn.springserve.com" or "https://www.allpennystocks.org".

I also sometimes get "www.familylifeins.com/Resources/Shared/scripts/widgets.js" in the domain field, and of course I want to drop everything but the ".com".

What I need is just the top-level domain (".com", ".net", ".org", etc.). I've tried several different regexes I found here, but they don't quite work the way I need them to.

Basically, I want to create a list of all the TLDs my company uses in a 90-day period.


richgalloway
SplunkTrust

Try this regex string.

(?<TLD>\.\w+?)(?:$|\/)
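To see how this pattern behaves on the sample values from the question, here is a minimal Python sketch. It is only an illustration outside Splunk, not part of the original answer: the helper name `extract_tld` and the tally step are hypothetical, and Python's `re` module spells the named group `(?P<TLD>...)` rather than `(?<TLD>...)`.

```python
import re
from collections import Counter

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
# It captures the first ".label" that is immediately followed by "/" or
# by the end of the string.
TLD_PATTERN = re.compile(r"(?P<TLD>\.\w+?)(?:$|/)")


def extract_tld(domain):
    """Hypothetical helper: return the captured TLD, or None if no match."""
    match = TLD_PATTERN.search(domain)
    return match.group("TLD") if match else None


# Sample values taken from the question.
samples = [
    "http://cdn.springserve.com",
    "https://www.allpennystocks.org",
    "www.familylifeins.com/Resources/Shared/scripts/widgets.js",
]

# Tally the distinct TLDs, roughly what a "list of TLDs in use" would need.
tld_counts = Counter(extract_tld(value) for value in samples)

for value in samples:
    print(value, "->", extract_tld(value))
print(dict(tld_counts))
```

Two limits of the pattern are worth knowing: a host followed by a port (for example "example.com:8080") is not matched, because ".com" is followed by ":" rather than "/" or the end of the string, and a multi-part suffix such as ".co.uk" yields only ".uk".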
---
If this reply helps you, Karma would be appreciated.


dsmeerkat
Explorer

Thank you everyone...VERY much!


woodcock
Esteemed Legend

Don't forget to click Accept on the best answer (for you) and upvote anything else that was helpful.


rjthibod
Champion

Great minds think alike 😉
