topic Re: Extracting domain name out of a url in Splunk Search

Extracting domain name out of a url

imarks004 — Sun, 17 Oct 2010 04:12:59 GMT

I am trying to get a field extraction working for just the domains accessed on my Ironport WSAs, but I am having trouble extracting only the domain piece out of a URL.
For example, if I run a search with top s_hostname, I get the following:

0.4.channel.facebook.com
0.52.channel.facebook.com
0.57.channel.facebook.com
0.chstatic.cvcdn.com
0.gvt0.com
0.media.dorkly.cvcdn.com
0.media.todaysbigthing.cvcdn.com
0.r.msn.com
0.tqn.com
0.track.ning.com

I am trying to get a regex working that strips everything to the left of the domain name, so that I would see only facebook.com and not 0.4.channel.facebook.com. I am not having any luck coming up with a regex to handle this.

Re: Extracting domain name out of a url

southeringtonp — Sun, 17 Oct 2010 04:23:14 GMT

Assuming you always want only two levels:

| rex field=s_hostname "\.(?<s_domainname>\S+\.\S+)$"

Re: Extracting domain name out of a url

gkanapathy — Sun, 17 Oct 2010 22:31:31 GMT

Probably a more efficient regex is: (?<s_domainname>[^\.\s]+\.[^\.\s]+)$ instead.
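
The two patterns above can be compared outside Splunk with a small Python check. This is only a sketch under a few assumptions: Python's `re` module needs `(?P<name>...)` where rex accepts `(?<name>...)`, the helper name `extract` is made up for illustration, and the sample hostnames are taken from the question.

```python
import re

# Pattern from the first reply: \S+ may itself contain dots.
TWO_LEVEL_GREEDY = re.compile(r"\.(?P<s_domainname>\S+\.\S+)$")
# Pattern from the second reply: each label excludes dots and whitespace.
TWO_LEVEL_STRICT = re.compile(r"(?P<s_domainname>[^\.\s]+\.[^\.\s]+)$")


def extract(pattern, hostname):
    """Return the captured s_domainname, or None when the pattern does not match."""
    match = pattern.search(hostname)
    return match.group("s_domainname") if match else None


# Print both results side by side for a few hostnames.
for host in ["0.4.channel.facebook.com", "0.tqn.com", "facebook.com"]:
    print(host, extract(TWO_LEVEL_GREEDY, host), extract(TWO_LEVEL_STRICT, host))
```

Because `\S` also matches dots, the first pattern can capture more than two labels on deeper hostnames, and it needs at least one dot in front of the captured part. The second pattern always ends up with the last two labels, which is not enough for hosts under suffixes such as co.uk.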

Re: Extracting domain name out of a url

jrodriguezap — Wed, 05 Mar 2014 20:03:07 GMT

This can be even more effective if you also have two-part suffixes such as com.br, com.pe, or com.jo:

(?<_hostname>(\d{1,3}.\d{1,3}?|[^\.\s]+?)\.([^\.\s]{1,3}|[^\.\s]{1,3}\.[^\.\s]{1,3}))$
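
To see how this pattern behaves, it can be tried in Python. This is a sketch, not a statement about how Splunk evaluates it: the pattern is copied unchanged apart from Python's `(?P<name>...)` group syntax, and the helper name `extract_hostname` is hypothetical.

```python
import re

# The pattern from the post, unchanged apart from Python's (?P<name>...) syntax.
SUFFIX_AWARE = re.compile(
    r"(?P<_hostname>(\d{1,3}.\d{1,3}?|[^\.\s]+?)\.([^\.\s]{1,3}|[^\.\s]{1,3}\.[^\.\s]{1,3}))$"
)


def extract_hostname(hostname):
    """Return the captured _hostname, or None when nothing matches."""
    match = SUFFIX_AWARE.search(hostname)
    return match.group("_hostname") if match else None


for host in ["support.expedia.co.uk", "0.4.channel.facebook.com", "conductor.io.com"]:
    print(host, "=>", extract_hostname(host))
```

The pattern treats any trailing label of three characters or fewer as part of the suffix, so a host like conductor.io.com comes back whole, and top-level domains longer than three characters do not match at all.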

Re: Extracting domain name out of a url

stanleyglover — Fri, 13 Feb 2015 13:58:07 GMT

Extraction can easily be done by following the simple steps given at http://www.perlmonks.org/?node_id=670802. That page also shows several regex patterns for extracting the domain name from a URL, with examples. If anything is still unclear after reading it, feel free to ask.

Re: Extracting domain name out of a url

tpflicke — Fri, 13 Feb 2015 18:58:11 GMT

To deal with all the various examples in this thread and all other possible cases such as new domains like .london, I think it will need something more than a reasonably short regex line.

I would probably go down the route of calling a Python script, which could handle the cases to my satisfaction and lay out the logic in a maintainable way. Maybe there is a Splunk app or add-on that provides such functionality; if not, it could make a nice exercise to create one.

A few test cases:

conductor.io.com => io.com
support.expedia.co.uk => expedia.co.uk
0.52.channel.facebook.com => facebook.com
0.52.channel.facebook.london => facebook.london
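
A minimal sketch of what such a Python script might look like, covering the test cases above. The function name `registered_domain` and the small `MULTI_LABEL_SUFFIXES` set are assumptions made for illustration; a real script would more likely load the full Public Suffix List than rely on a hand-picked set.

```python
# Hypothetical, hand-picked suffix set for illustration only; a real script
# would load the full Public Suffix List instead.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.br", "com.pe", "com.jo"}


def registered_domain(hostname: str) -> str:
    """Return the last two labels, or the last three under a known two-part suffix."""
    labels = hostname.strip().lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


for host in [
    "conductor.io.com",
    "support.expedia.co.uk",
    "0.52.channel.facebook.com",
    "0.52.channel.facebook.london",
]:
    print(host, "=>", registered_domain(host))
```

Keeping the suffix knowledge in data rather than in the regex is what makes new top-level domains such as .london a non-issue here.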

Re: Extracting domain name out of a url

mIliofotou_splu — Mon, 29 Aug 2016 01:31:43 GMT

I don't think this works any more ...

Re: Extracting domain name out of a url

GeekMikeGrace — Fri, 12 May 2017 16:47:21 GMT

I ended up going with

\/\/(?:[^@\/\n]+@)?(?:www\.)?(?<refdomain>[^:\/\n]+)

Used in context, it looks like this:

method=GET | rex field=referer "\/\/(?:[^@\/\n]+@)?(?:www\.)?(?<refdomain>[^:\/\n]+)" | stats values(refdomain)

See the extraction in action https://regex101.com/r/iVrIlL/1
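
The same pattern can also be exercised in Python as a rough check. This is a sketch with assumptions: the group is written as `(?P<refdomain>...)` because Python's `re` requires that form, the helper name `referer_host` is made up, and the example URLs are hypothetical.

```python
import re

# Skip optional "user:password@" and a leading "www.", then capture up to the
# next colon, slash, or newline.
REFERER_HOST = re.compile(r"//(?:[^@/\n]+@)?(?:www\.)?(?P<refdomain>[^:/\n]+)")


def referer_host(url):
    """Return the captured refdomain, or None when the URL has no '//' part."""
    match = REFERER_HOST.search(url)
    return match.group("refdomain") if match else None


for url in [
    "https://www.example.com:8080/path?x=1",
    "http://user:pw@sub.example.org/index.html",
]:
    print(url, "=>", referer_host(url))
```

Note that this captures the whole hostname minus a leading www., so subdomains such as sub.example.org are kept rather than reduced to example.org.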

Re: Extracting domain name out of a url

fwijnholds_splu — Fri, 12 May 2017 19:48:01 GMT

There's an App for that! The URL toolbox is my absolute fav but maybe URL Parse already does the trick?

Your SPL would look like this:

method=GET | `ut_parse(referer)`

Make sure you wrap the macro call in backticks so Splunk knows you are calling a macro.

Re: Extracting domain name out of a url

dariusdamalakas — Thu, 25 May 2017 10:58:04 GMT

I downvoted this post because it does not work anymore.

Re: Extracting domain name out of a url

dariusdamalakas — Thu, 25 May 2017 10:58:54 GMT

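This takes everything up to the third slash (the capture group name did not survive in the post, so uri_prefix below is a placeholder):

rex field=Uri "^(?<uri_prefix>[^/]*/[^/]*/[^/]*)"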
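
The "everything up to the third slash" idea can be sketched in Python as follows. The group name `prefix`, the helper `scheme_and_host`, and the example URL are assumptions for illustration. The standard library's `urlsplit` is shown alongside as an alternative when only the hostname is wanted.

```python
import re
from urllib.parse import urlsplit

# Three slash-free runs separated by two slashes: scheme, empty part, host.
UP_TO_THIRD_SLASH = re.compile(r"^(?P<prefix>[^/]*/[^/]*/[^/]*)")


def scheme_and_host(url):
    """Return the URL up to (not including) its third slash, or None if it has fewer than two slashes."""
    match = UP_TO_THIRD_SLASH.match(url)
    return match.group("prefix") if match else None


url = "https://www.example.com/a/b?x=1"
print(scheme_and_host(url))    # scheme plus host
print(urlsplit(url).hostname)  # host only
```

The regex keeps the scheme in the result, so a further step is still needed if only the domain is wanted.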

Re: Extracting domain name out of a url

mstephenson716 — Mon, 01 Jul 2019 15:52:40 GMT

This worked for me.