<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Extract domain from full url in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660263#M227913</link>
    <description>&lt;LI-CODE lang="markup"&gt;| eval my_url=mvjoin(mvindex(split(url,"."),-2,-1),".")&lt;/LI-CODE&gt;</description>
    <pubDate>Tue, 10 Oct 2023 17:13:44 GMT</pubDate>
    <dc:creator>ITWhisperer</dc:creator>
    <dc:date>2023-10-10T17:13:44Z</dc:date>
    <item>
      <title>Extract domain from full url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660260#M227911</link>
      <description>&lt;P&gt;I'm working with data from this search&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;index=my_index sourcetype=my_sourcetype (rule=policy_1 OR rule=policy_2 OR rule=policy_3) 
    [ | inputlookup my_list_of_urls.csv ]
| rename url AS my_url
| stats count by my_url
| table my_url&lt;/LI-CODE&gt;&lt;P&gt;The events look like this&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;02ef65dc96524dabba54a950da7cb0d8.fp.measure.office.com/
0434c399ca884247875a286a10c969f4.fp.measure.office.com/
14f8c4d0e9b7be86933be5d3c9fb91d7.fp.measure.office.com/
3d8e055913534ff7b3c23101fd1f3ca6.fp.measure.office.com/
4334ede7832f44c5badcfd5a6459a1a2.fp.measure.office.com/
5d44dec60c9b4788fb26426c1e151f46.fp.measure.office.com/
5f021e1b8d3646398fab8ce59f8a6bbd.fp.measure.office.com/
6f6c23c1671f72c36d6179fdeabd1f56.fp.measure.office.com/
7106ea87c1e2ed0aebc9baca86f9af34.fp.measure.office.com/
88c88084fe454cbc8629332c6422e8a4.fp.measure.office.com/
982db5012df7494a88c242d426e07be6.fp.measure.office.com/
a478076af2deaf28abcbe5ceb8bdb648.fp.measure.office.com/
aad.cs.dds.microsoft.com/&lt;/LI-CODE&gt;&lt;P&gt;In the my_list_of_urls.csv there are these entries&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;*.microsoft.com/
microsoft.com/
*.office.com/
office.com/&lt;/LI-CODE&gt;&lt;P&gt;What I'm trying to do is get the microsoft.com and office.com from the results instead of the full url.&amp;nbsp; I'm stumped on how to do it.&amp;nbsp; Any help is appreciated.&lt;BR /&gt;&lt;BR /&gt;TIA,&lt;/P&gt;&lt;P&gt;Joe&lt;/P&gt;</description>
      <pubDate>Tue, 10 Oct 2023 16:51:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660260#M227911</guid>
      <dc:creator>jwhughes58</dc:creator>
      <dc:date>2023-10-10T16:51:00Z</dc:date>
    </item>
    <item>
      <title>Re: Extract domain from full url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660263#M227913</link>
      <description>&lt;LI-CODE lang="markup"&gt;| eval my_url=mvjoin(mvindex(split(url,"."),-2,-1),".")&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 10 Oct 2023 17:13:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660263#M227913</guid>
      <dc:creator>ITWhisperer</dc:creator>
      <dc:date>2023-10-10T17:13:44Z</dc:date>
    </item>
    <item>
      <title>Re: Extract domain from full url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660264#M227914</link>
      <description>&lt;P&gt;See if this helps.&amp;nbsp; It replaces the my_url field with a string depending on which regular expression matches the data.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;index=my_index sourcetype=my_sourcetype (rule=policy_1 OR rule=policy_2 OR rule=policy_3) 
    [ | inputlookup my_list_of_urls.csv ]
| rename url AS my_url
| eval my_url=case(match(my_url,".*/microsoft\.com$"), "microsoft.com",
                   match(my_url,".*/office\.com$"), "office.com",
                   1==1, my_url)
| stats count by my_url
| table my_url&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 10 Oct 2023 17:15:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660264#M227914</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2023-10-10T17:15:46Z</dc:date>
    </item>
    <item>
      <title>Re: Extract domain from full url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660271#M227919</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/213957"&gt;@richgalloway&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Unfortunately the csv file has 1156 entries so the use case would be huge.&amp;nbsp; The other issue is that the csv file gets updated daily and urls are added and removed.&amp;nbsp; I was hoping to use the csv file to say get only this part out of the result urls.&amp;nbsp; Alternatively some way of joining the results url with the csv file urls.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Joe&lt;/P&gt;</description>
      <pubDate>Tue, 10 Oct 2023 17:40:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660271#M227919</guid>
      <dc:creator>jwhughes58</dc:creator>
      <dc:date>2023-10-10T17:40:00Z</dc:date>
    </item>
    <item>
      <title>Re: Extract domain from full url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660272#M227920</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/225168"&gt;@ITWhisperer&lt;/a&gt; ,&lt;/P&gt;&lt;P&gt;If the urls are consistent, that is a great idea.&amp;nbsp; Unfortunately, the urls have between 1 and 8 parts between . and I don't know where to start.&amp;nbsp; Alternatively, if there is a way of adding what is in the csv to the results, that would work.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Joe&lt;/P&gt;</description>
      <pubDate>Tue, 10 Oct 2023 17:38:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660272#M227920</guid>
      <dc:creator>jwhughes58</dc:creator>
      <dc:date>2023-10-10T17:38:46Z</dc:date>
    </item>
    <item>
      <title>Re: Extract domain from full url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660322#M227934</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;If the urls are consistent, that is a great idea.&amp;nbsp; Unfortunately, the urls have between 1 and 8 parts&amp;nbsp;&lt;/BLOCKQUOTE&gt;&lt;P&gt;This is rather confusing. &amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/225168"&gt;@ITWhisperer&lt;/a&gt;'s command concatenates from the end, and it shouldn't matter whether there is 1 part in between or 8. &amp;nbsp;Have you tested in full search?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;index=my_index sourcetype=my_sourcetype (rule=policy_1 OR rule=policy_2 OR rule=policy_3) 
    [ | inputlookup my_list_of_urls.csv ]
| eval my_url = mvjoin(mvindex(split(url, "."), -2,-1), ".")
| stats count by my_url&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;HR /&gt;between . and I don't know where to start.&amp;nbsp; Alternatively, if there is a way of adding what is in the csv to the results, that would work.&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Another piece of confusing comes from the original search:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;index=my_index sourcetype=my_sourcetype (rule=policy_1 OR rule=policy_2 OR rule=policy_3) 
    [ | inputlookup my_list_of_urls.csv ]
| rename url AS my_url
| stats count by my_url
| table my_url&lt;/LI-CODE&gt;&lt;P&gt;If you table my_url in the end, the result is no different from using inputlookup alone. &amp;nbsp;Why bother with index search? &amp;nbsp;If you only want mircrosoft.com and office.com from the CSV, you could do&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| inputlookup my_list_of_urls.csv
| eval my_url = mvjoin(mvindex(split(url, "."), -2,-1), ".")
| stats values(my_url) as my_url
| mvexpand my_url&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Maybe you can describe more variety of data (anonymize as needed) and print output from proposed searches, then illustrate what desired outcome is and explain why the actual output is not desired (in case it is not obvious enough)?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Oct 2023 22:33:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660322#M227934</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2023-10-10T22:33:39Z</dc:date>
    </item>
    <item>
      <title>Re: Extract domain from full url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660406#M227980</link>
      <description>&lt;P&gt;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/33901"&gt;@yuanliu&lt;/a&gt;Yeah, it is a pain of a search.&amp;nbsp; Here is the issue.&amp;nbsp; A firewall device generates an event with URL when certain policies are triggered by contractors.&amp;nbsp; That is the initial search.&amp;nbsp; The firewall team has a list of the URLs the contractors have access to which is the csv file.&amp;nbsp; The firewall team wants to remove any URLs that aren't used in a period of time.&amp;nbsp; Thus, I have to compare the firewall URLs to the csv URLs and output any csv URLs that aren't used in the time frame.&amp;nbsp; The search finds the firewall events.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;index=my_index sourcetype=my_sourcetype (rule=policy_1 OR rule=policy_2 OR rule=policy_3) 
    [ | inputlookup my_list_of_urls.csv ]&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My issue is the firewall events use the long URL and not the short one.&amp;nbsp; From the firewall&lt;/P&gt;&lt;PRE&gt;a478076af2deaf28abcbe5ceb8bdb648.fp.measure.office.com/
aad.cs.dds.microsoft.com/&lt;/PRE&gt;&lt;P&gt;From the csv file&lt;/P&gt;&lt;PRE&gt;*.microsoft.com/
microsoft.com/
*.office.com/
office.com/&lt;/PRE&gt;&lt;P&gt;The two events from the firewall mean that the two listed in the csv file are still good and don't need to be removed.&amp;nbsp; I try to think of this as two sets, one the firewall results and the other the csv file, but I can't figure out how to search the firewall results with what is in the csv file.&lt;/P&gt;&lt;P&gt;This make sense?&lt;/P&gt;&lt;P&gt;TIA,&lt;/P&gt;&lt;P&gt;Joe&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2023 15:18:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660406#M227980</guid>
      <dc:creator>jwhughes58</dc:creator>
      <dc:date>2023-10-11T15:18:59Z</dc:date>
    </item>
    <item>
      <title>Re: Extract domain from full url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660435#M227992</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;list of the URLs the contractors have access to which is the csv file.&amp;nbsp; The firewall team wants to remove any URLs that aren't used in a period of time.&amp;nbsp; Thus, I have to compare the firewall URLs to the csv&amp;nbsp;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;So, the firewall team wants to update that CSV file so it will not contain entries that haven't had matching events for a given time period. &amp;nbsp;Is this correct? &amp;nbsp;This seems to be the opposite of what the Splunk search is doing.&lt;/P&gt;&lt;P&gt;Some more points you need to clarify.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;What are field name(s) the index search and the lookup file use to indicate URLs? &amp;nbsp;Based on your code snippet, I assume that they both use &lt;U&gt;url&lt;/U&gt;.&lt;/LI&gt;&lt;LI&gt;Does the CSV file contain additional fields? &amp;nbsp;Based on your code snippet, I will assume none.&lt;/LI&gt;&lt;LI&gt;Is there some significance of trailing slash (/)? &amp;nbsp;Do all url values end with one trailing slash? &amp;nbsp;This may not be relevant, but some SPL manipulations may ruin your convention. &amp;nbsp;So, I'd like to be cautious.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;A more important question is the use of asterisk (*). &amp;nbsp;Are the last two domains (root and second level) the only parts of interest? &amp;nbsp;Given all the illustrations, I have to assume yes. &amp;nbsp;In other words, no differentiation is needed between&amp;nbsp;*.microsoft.com/ and&amp;nbsp;microsoft.com/. &amp;nbsp;Additionally, I will assume that every url in the CSV needs to be paired with a wildcard entry.&lt;/P&gt;&lt;P&gt;Using the above assumptions, the following can show you second level domains that have not been used.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;index=my_index sourcetype=my_sourcetype (rule=policy_1 OR rule=policy_2 OR rule=policy_3) 
| eval url = mvjoin(mvindex(split(url, "."), -2,-1), ".")
| dedup url
| inputlookup append=true my_list_of_urls.csv
| fillnull sourcetype value=CSV
| stats values(sourcetype) as sourcetype by url
| where mvcount(sourcetype) == 1 AND sourcetype == "CSV"
| eval url = mvappend(url, "*." . url)
| mvexpand url&lt;/LI-CODE&gt;&lt;P&gt;The output contains a list of second level domains affixed with a trailing slash, and these same strings prefixed with "*.". &amp;nbsp;These would be the ones to be removed.&lt;/P&gt;&lt;P&gt;If you have lots of events with URLs that have no match in the CSV, you can also use the subsearch as a filter to improve efficiency. &amp;nbsp;Like&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;index=my_index sourcetype=my_sourcetype (rule=policy_1 OR rule=policy_2 OR rule=policy_3) 
    [ | inputlookup my_list_of_urls.csv ]
| eval url = mvjoin(mvindex(split(url, "."), -2,-1), ".")
| dedup url
| inputlookup append=true my_list_of_urls.csv
| fillnull sourcetype value=CSV
| stats values(sourcetype) as sourcetype by url
| where mvcount(sourcetype) == 1 AND sourcetype == "CSV"
| eval url = mvappend(url, "*." . url)
| mvexpand url&lt;/LI-CODE&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2023 17:57:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660435#M227992</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2023-10-11T17:57:11Z</dc:date>
    </item>
    <item>
      <title>Re: Extract domain from full url</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660445#M227998</link>
      <description>&lt;P&gt;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/33901"&gt;@yuanliu&lt;/a&gt;Thanks.&amp;nbsp; I would have never figured out the mvjoin(mvindex.&amp;nbsp; That is something I don't use.&amp;nbsp; You gave me enough help that I was able to work out something I can give to another team.&amp;nbsp; Karma point awarded.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2023 19:14:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extract-domain-from-full-url/m-p/660445#M227998</guid>
      <dc:creator>jwhughes58</dc:creator>
      <dc:date>2023-10-11T19:14:49Z</dc:date>
    </item>
  </channel>
</rss>

