Shorten a URL to its Primary Domain Name from Bluecoat Logs

john5916 — Tue, 17 Oct 2017 16:11:47 GMT

I'd like to shorten a URL collected from Bluecoat logs so that it lists only the primary domain name.

For example:

abcvod.abcnews.com to just abcnews.com

anything.google.com to just google.com

I've searched the previous questions and I've not found any working options.

esix_splunk — Tue, 17 Oct 2017 18:37:25 GMT

I'm assuming you mean extracting this at search time, as opposed to changing it as it's indexed via transforms.

There are also a few links in there to some apps on Splunkbase that could assist with further domain analysis.

john5916 — Tue, 17 Oct 2017 18:42:11 GMT

That's basically what I need. I'm not up to speed on regex though, and I need to take it one level (one dot) further up the FQDN.

Instead of tracking just the .com part as suggested, I want facebook.com, etc.

john5916 — Tue, 17 Oct 2017 19:00:41 GMT

This link goes the opposite way, and comes closer to what I need.

This is close to what I need:

eval hostname=replace(hostname,"^([^.]+).+","\1")

But it returns only the very first part of the FQDN. So I can get the start, or the end. What I need, though, is facebook.com, cnn.com, etc.

s2_splunk — Tue, 17 Oct 2017 20:40:58 GMT

Here's a crude, non-regex way to do it:

| makeresults | eval domain="e.f.com" | eval parts=split(domain,"."), c=mvcount(parts) | eval last2=mvindex(parts, c-2).".".mvindex(parts, c-1)
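
To sanity-check the split-and-join logic outside Splunk, here is a minimal Python sketch of the same idea. The function name `last_two_labels` is made up for illustration, and the sketch assumes the domain you want is always the last two labels, which does not hold for suffixes such as co.uk.

```python
def last_two_labels(domain):
    """Return the last two dot-separated labels of a hostname."""
    # Drop a trailing dot (fully qualified form) before splitting.
    parts = domain.strip(".").split(".")
    # A bare name such as "localhost" has fewer than two labels; return it unchanged.
    if len(parts) < 2:
        return domain
    # Mirrors mvindex(parts, c-2) . "." . mvindex(parts, c-1) in the SPL above.
    return ".".join(parts[-2:])


for host in ("abcvod.abcnews.com", "anything.google.com", "e.f.com"):
    print(host, "->", last_two_labels(host))
```

The guard for short names is an addition; the SPL one-liner above does not handle a value with no dot in it.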

In RegEx, you can simply anchor to the end of the full domain name string, no? Like so:

| makeresults | eval domain="e.f.com" | rex field=domain "(?<last2>\w+\.\w+)$"

Probably needs some work to cover cases where there are non-word characters in the domain name, but the principle should apply.
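
One way to handle the non-word-character case is to allow hyphens in each label. The Python sketch below tries that pattern; the character class `[\w-]` and the helper name `last2` are assumptions for illustration, not something tested against Bluecoat data. Like the original, it still keeps only the last two labels, so a host such as www.bbc.co.uk would yield co.uk.

```python
import re

# Same end-anchored idea as the rex above, but each label may also contain hyphens.
LAST2 = re.compile(r"([\w-]+\.[\w-]+)$")


def last2(domain):
    """Return the last two labels, or None if the string contains no dot."""
    match = LAST2.search(domain)
    return match.group(1) if match else None


for host in ("cdn.my-site.com", "e.f.com", "www.bbc.co.uk"):
    print(host, "->", last2(host))
```

The same character class should carry over to the rex pattern, though that is an untested assumption here.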