<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Shorten a URL to it's Primary Domain Name from Bluecoat Logs in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Shorten-a-URL-to-it-s-Primary-Domain-Name-from-Bluecoat-Logs/m-p/315590#M59062</link>
    <description>&lt;P&gt;Here's a crude, non-regex way to do it:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults | eval domain="e.f.com" | eval parts=split(domain,"."), c=mvcount(parts) | eval last2=mvindex(parts, c-2).".".mvindex(parts, c-1)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In RegEx, you can simply anchor to the end of the full domain name string, no? Like so:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults | eval domain="e.f.com" | rex field=domain "(?&amp;lt;last2&amp;gt;\w+\.\w+)$"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The \w pattern probably needs some work to cover non-word characters in the domain name (hyphens, for example), and neither approach handles multi-part suffixes such as co.uk, but the principle should apply.&lt;/P&gt;</description>
    <pubDate>Tue, 17 Oct 2017 20:40:58 GMT</pubDate>
    <dc:creator>s2_splunk</dc:creator>
    <dc:date>2017-10-17T20:40:58Z</dc:date>
    <item>
      <title>Shorten a URL to it's Primary Domain Name from Bluecoat Logs</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Shorten-a-URL-to-it-s-Primary-Domain-Name-from-Bluecoat-Logs/m-p/315586#M59058</link>
      <description>&lt;P&gt;I'd like to shorten a URL collected from Bluecoat logs so that it lists only the primary domain name.&lt;/P&gt;

&lt;P&gt;For example:&lt;/P&gt;

&lt;P&gt;abcvod.abcnews.com to just abcnews.com&lt;/P&gt;

&lt;P&gt;or &lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;anything.&lt;/STRONG&gt;google.com to just google.com&lt;/P&gt;

&lt;P&gt;I've searched the previous questions and I've not found any working options.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Oct 2017 16:11:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Shorten-a-URL-to-it-s-Primary-Domain-Name-from-Bluecoat-Logs/m-p/315586#M59058</guid>
      <dc:creator>john5916</dc:creator>
      <dc:date>2017-10-17T16:11:47Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten a URL to it's Primary Domain Name from Bluecoat Logs</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Shorten-a-URL-to-it-s-Primary-Domain-Name-from-Bluecoat-Logs/m-p/315587#M59059</link>
      <description>&lt;P&gt;I'm assuming you mean extracting this at search time, as opposed to changing it as it's indexed via transforms.&lt;/P&gt;

&lt;P&gt;Have you checked out this Answers post: &lt;A href="https://answers.splunk.com/answers/542835/top-level-domain-extraction-from-urls.html"&gt;https://answers.splunk.com/answers/542835/top-level-domain-extraction-from-urls.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;There are also a few links in there to apps on Splunkbase that could assist with further domain analysis.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Oct 2017 18:37:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Shorten-a-URL-to-it-s-Primary-Domain-Name-from-Bluecoat-Logs/m-p/315587#M59059</guid>
      <dc:creator>esix_splunk</dc:creator>
      <dc:date>2017-10-17T18:37:25Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten a URL to it's Primary Domain Name from Bluecoat Logs</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Shorten-a-URL-to-it-s-Primary-Domain-Name-from-Bluecoat-Logs/m-p/315588#M59060</link>
      <description>&lt;P&gt;That's basically what I need. I'm not up to speed on regex, though, and I need to take it one level further up the FQDN.&lt;/P&gt;

&lt;P&gt;Instead of tracking just the .com part as suggested, I want facebook.com, etc.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Oct 2017 18:42:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Shorten-a-URL-to-it-s-Primary-Domain-Name-from-Bluecoat-Logs/m-p/315588#M59060</guid>
      <dc:creator>john5916</dc:creator>
      <dc:date>2017-10-17T18:42:11Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten a URL to it's Primary Domain Name from Bluecoat Logs</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Shorten-a-URL-to-it-s-Primary-Domain-Name-from-Bluecoat-Logs/m-p/315589#M59061</link>
      <description>&lt;P&gt;This link goes the opposite way and comes closer to what I need.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://answers.splunk.com/answers/523064/eval-regex-for-host-name-from-fqdn.html"&gt;https://answers.splunk.com/answers/523064/eval-regex-for-host-name-from-fqdn.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;This is close to what I need:&lt;/P&gt;

&lt;P&gt;eval hostname=replace(hostname,"^([^.]+).+","\1") &lt;/P&gt;

&lt;P&gt;But it only returns the very first part of the FQDN. So I can get the start or the end. What I need, though, is facebook.com, cnn.com, etc.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Oct 2017 19:00:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Shorten-a-URL-to-it-s-Primary-Domain-Name-from-Bluecoat-Logs/m-p/315589#M59061</guid>
      <dc:creator>john5916</dc:creator>
      <dc:date>2017-10-17T19:00:41Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten a URL to it's Primary Domain Name from Bluecoat Logs</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Shorten-a-URL-to-it-s-Primary-Domain-Name-from-Bluecoat-Logs/m-p/315590#M59062</link>
      <description>&lt;P&gt;Here's a crude, non-regex way to do it:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults | eval domain="e.f.com" | eval parts=split(domain,"."), c=mvcount(parts) | eval last2=mvindex(parts, c-2).".".mvindex(parts, c-1)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In RegEx, you can simply anchor to the end of the full domain name string, no? Like so:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults | eval domain="e.f.com" | rex field=domain "(?&amp;lt;last2&amp;gt;\w+\.\w+)$"
&lt;/CODE&gt;&lt;/PRE&gt;
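
&lt;P&gt;If your domains can contain hyphens, one possible variant of the regex above is to match on "anything but a dot" instead of \w, so a label like abc-news is kept whole. This is only a sketch: the sample value abc-vod.abc-news.com is made up for illustration, and I'd expect it to come back as abc-news.com, where the \w version would stop at the hyphen.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults | eval domain="abc-vod.abc-news.com" | rex field=domain "(?&amp;lt;last2&amp;gt;[^.]+\.[^.]+)$"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Events whose domain has no dot at all won't match, so they simply won't get a last2 field.&lt;/P&gt;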

&lt;P&gt;The \w pattern probably needs some work to cover non-word characters in the domain name (hyphens, for example), and neither approach handles multi-part suffixes such as co.uk, but the principle should apply.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Oct 2017 20:40:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Shorten-a-URL-to-it-s-Primary-Domain-Name-from-Bluecoat-Logs/m-p/315590#M59062</guid>
      <dc:creator>s2_splunk</dc:creator>
      <dc:date>2017-10-17T20:40:58Z</dc:date>
    </item>
  </channel>
</rss>

