I'd like to shorten a URL collected from bluecoat logs so that it only lists the primary domain name.
For example:
abcvod.abcnews.com to just abcnews.com
or
anything.google.com to just google.com
I've searched the previous questions and I've not found any working options.
Here's a crude, non-regex way to do it:
| makeresults | eval domain="e.f.com" | eval parts=split(domain,"."), c=mvcount(parts) | eval last2=mvindex(parts, c-2).".".mvindex(parts, c-1)
In RegEx, you can simply anchor to the end of the full domain name string, no? Like so:
| makeresults | eval domain="e.f.com" | rex field=domain "(?<last2>\w+\.\w+)$"
It probably needs some work to cover domain names that contain non-word characters such as hyphens (which \w does not match), but the principle should apply.
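One way to cover the non-word-character case is to match "anything that is not a dot" instead of \w. This is only a minimal sketch: the hyphenated test value abcvod.abc-news.com is made up for illustration, and the field names domain and last2 are carried over from the example above.

```
| makeresults
| eval domain="abcvod.abc-news.com"
| rex field=domain "(?<last2>[^.]+\.[^.]+)$"
| table domain last2
```

With that input, last2 should come out as abc-news.com. Note that this still keeps only the last two labels, so a name under a two-part suffix such as example.co.uk would be reduced to co.uk.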
I'm assuming you mean extracting this at search time, as opposed to changing it at index time via transforms.
Have you checked out this Answers post: https://answers.splunk.com/answers/542835/top-level-domain-extraction-from-urls.html
There are also a few links in there to apps on Splunkbase that could assist with further domain analysis.
That's basically what I need. I'm not up to speed on regex though, and I need to take it one dot further up the FQDN.
Instead of extracting just the .com as suggested, I want facebook.com, etc.
This link goes the opposite way, and comes closer to what I need.
https://answers.splunk.com/answers/523064/eval-regex-for-host-name-from-fqdn.html
This does what I need -
eval hostname=replace(hostname,"^([^.]+).+","\1")
But that returns only the very first part of the FQDN. So I can get the start, or the end. What I need, though, is facebook.com, cnn.com, etc.
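For what it's worth, the same replace() approach can be pointed at the other end of the name. A minimal sketch, assuming the field is called hostname as in the snippet above (the sample value www.facebook.com and the output field domain are just for illustration):

```
| makeresults
| eval hostname="www.facebook.com"
| eval domain=replace(hostname, "^.*[.]([^.]+[.][^.]+)$", "\1")
| table hostname domain
```

The greedy ^.*[.] consumes everything up to the dot before the last two labels, and the capture group keeps those two labels, so domain should be facebook.com here. A value that is already just two labels, such as cnn.com, does not match the pattern and is left unchanged. As with any "last two labels" approach, suffixes like co.uk are not handled.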