Getting Data In

Shorten a URL to its Primary Domain Name from Bluecoat Logs

Engager

I'd like to shorten a URL collected from Bluecoat logs so that it lists only the primary domain name.

For example:

abcvod.abcnews.com to just abcnews.com

or

anything.google.com to just google.com

I've searched the previous questions and I've not found any working options.

Splunk Employee

Here's a crude, non-regex way to do it:

| makeresults | eval domain="e.f.com" | eval parts=split(domain,"."), c=mvcount(parts) | eval last2=mvindex(parts, c-2).".".mvindex(parts, c-1)
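As a sketch, the same split can be written more compactly. mvindex accepts negative indexes that count from the end of the multivalue field, so the count is not needed, and mvjoin glues the last two parts back together. The field names domain and last2 follow the example above, and the sample value comes from the question:

```spl
| makeresults
| eval domain="abcvod.abcnews.com"
| eval last2=mvjoin(mvindex(split(domain, "."), -2, -1), ".")
```

For that sample value, last2 should come out as abcnews.com.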

In RegEx, you can simply anchor to the end of the full domain name string, no? Like so:

| makeresults | eval domain="e.f.com" | rex field=domain "(?<last2>\w+\.\w+)$"

This probably needs some work to cover domain names that contain non-word characters such as hyphens (\w does not match "-"), but the principle should apply.
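One possible way to handle that, as a sketch rather than a tested solution: match "anything that is not a dot" instead of \w, so hyphens and other characters inside a label are kept. The sample domain below is made up for illustration:

```spl
| makeresults
| eval domain="my-site.example-news.com"
| rex field=domain "(?<last2>[^.]+\.[^.]+)$"
```

This should set last2 to example-news.com. Note that any approach that keeps only the last two labels will return co.uk for something like bbc.co.uk, so multi-part suffixes need a lookup-based approach rather than a plain regex.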

Splunk Employee

I'm assuming you mean extracting this at search time, as opposed to changing it as it's indexed via transforms.

Have you checked out this Answers post: https://answers.splunk.com/answers/542835/top-level-domain-extraction-from-urls.html

There are also a few links in there to some apps on Splunkbase that could assist with further domain analysis.

Engager

That's basically what I need. I'm not up to speed on regex, though, and I need to take it one level further up the FQDN.

Instead of tracking the .com part as suggested, I want facebook.com, etc.

Engager

This link goes the opposite way and comes closer to what I need.

https://answers.splunk.com/answers/523064/eval-regex-for-host-name-from-fqdn.html

This nearly does what I need:

eval hostname=replace(hostname,"^([^.]+).+","\1")

But it returns only the very first part of the FQDN. So I can get the start or the end. What I need, though, is facebook.com, cnn.com, etc.
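For what it's worth, the same replace() idea can be pointed at the other end of the string. This is only a sketch: the hostname field name is carried over from the eval above, the new field name domain is an assumption, and the real Bluecoat field holding the host will depend on your extractions. The pattern skips everything up to the last two dot-separated labels and keeps those:

```spl
| makeresults
| eval hostname="abcvod.abcnews.com"
| eval domain=replace(hostname, "^.*?([^.]+\.[^.]+)$", "\1")
```

With that sample value, domain should be abcnews.com, and a value that is already two labels long, such as google.com, should pass through unchanged. Writing to a separate field leaves the original hostname intact for comparison.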
