<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Extracting fields from an existing Field in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125433#M33901</link>
    <description>&lt;P&gt;A rather theoretical comment on that - if you truly want to capture every imaginable URI scheme, using \w+ isn't going to catch them all. There are more or less obscure schemes with dots and dashes in them.&lt;/P&gt;</description>
    <pubDate>Mon, 27 Jan 2014 20:32:39 GMT</pubDate>
    <dc:creator>martin_mueller</dc:creator>
    <dc:date>2014-01-27T20:32:39Z</dc:date>
    <item>
      <title>Extracting fields from an existing Field</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125428#M33896</link>
      <description>&lt;P&gt;I am working on some http_referer analysis of my proxy logs, which seems like an interesting thing to do. I want to add a search-time field extraction that splits the http_referer field into its parts, to provide more search functionality from the data.&lt;/P&gt;

&lt;P&gt;Can I do something like:&lt;/P&gt;

&lt;P&gt;transforms.conf:&lt;BR /&gt;
REGEX = field=http_referrer ^(?&lt;HTTP_REFERER_SCHEME&gt;\w+)://&lt;/P&gt;

&lt;P&gt;*Yes, I realize my field name isn't the same as the RFC... haha, official misspelling &lt;span class="lia-unicode-emoji" title=":confused_face:"&gt;😕&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;I can build the whole thing out with a single line, and I am sure the hardware can handle the overhead without issue (I hope), but I'd rather have a field anchor of some sort to go off of.&lt;/P&gt;

&lt;P&gt;Am I missing something on this?&lt;/P&gt;

&lt;P&gt;Afterthought: I can do a content match on the :// since nothing else in the logs should contain that combination of characters in ASCII; any colons in the URI will be in hex or something else.&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 15:43:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125428#M33896</guid>
      <dc:creator>psheck117</dc:creator>
      <dc:date>2020-09-28T15:43:11Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting fields from an existing Field</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125429#M33897</link>
      <description>&lt;P&gt;I believe you're looking for the SOURCE_KEY setting in transforms.conf, see &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Admin/transformsconf"&gt;http://docs.splunk.com/Documentation/Splunk/latest/Admin/transformsconf&lt;/A&gt; for details.&lt;/P&gt;
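
&lt;P&gt;A minimal sketch of how that could look, assuming the referer is already available at search time in a field named http_referer and the proxy events use a sourcetype called proxy_logs. The stanza name, sourcetype, and field name are hypothetical placeholders, so adjust them to your environment:&lt;/P&gt;

&lt;PRE&gt;# transforms.conf
[http_referer_scheme]
# Apply the regex to the existing field instead of the raw event
SOURCE_KEY = http_referer
# Anchored to the start of the field value
REGEX = ^(?&lt;HTTP_REFERER_SCHEME&gt;\w+)://

# props.conf
[proxy_logs]
# Search-time extraction that points at the transforms.conf stanza above
REPORT-http_referer_scheme = http_referer_scheme&lt;/PRE&gt;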

&lt;P&gt;As for building a regex to match on "something ending with ://", that will work but not be a pinnacle of efficiency. The automaton working to match the regex will constantly try to start, walk along, and then fail repeatedly - much like running a Splunk search using key=*value. It's much faster to have quick failures by anchoring the start to something.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Jan 2014 23:46:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125429#M33897</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2014-01-24T23:46:30Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting fields from an existing Field</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125430#M33898</link>
      <description>&lt;P&gt;Thanks Martin! I will check out &amp;amp; use SOURCE_KEY, I knew I was missing something.&lt;/P&gt;

&lt;P&gt;As for my regex, it is definitely not going to end on ://. There is only one place in the event where that sequence will appear, though: the http:// or https:// in the referrer field, if it exists at all. I didn't want to put my whole regex into the question, so I left it at the first extracted field.&lt;/P&gt;

&lt;P&gt;Thanks again!&lt;/P&gt;</description>
      <pubDate>Fri, 24 Jan 2014 23:57:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125430#M33898</guid>
      <dc:creator>psheck117</dc:creator>
      <dc:date>2014-01-24T23:57:10Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting fields from an existing Field</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125431#M33899</link>
      <description>&lt;P&gt;If you know you're only going to encounter http and https, consider using https? as your regex... it'll at least help someone read it later.&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jan 2014 00:02:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125431#M33899</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2014-01-25T00:02:47Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting fields from an existing Field</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125432#M33900</link>
      <description>&lt;P&gt;Here is the full regex for my http_referer extraction. If you do something like this you may be surprised with what shows up as a referrer scheme.&lt;/P&gt;

&lt;P&gt;REGEX = (?&lt;HTTP_REFERER_SCHEME&gt;\w+)://(?&lt;HTTP_REFERER_DEST_HOST&gt;\S[^/]+)((?&lt;HTTP_REFERER_URI_PATH&gt;/.[^?]+))?((?&lt;HTTP_REFERER_URI_QUERY&gt;\?.*))?&lt;/P&gt;

&lt;P&gt;I could probably get into the depth of http_referer_uri_extension, but that is hit or miss, and right now I am not sure I need the detail. Though, thinking about it, I could slip it in there.&lt;/P&gt;

&lt;P&gt;My first inclination was to break it out into multiple extractions too.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 15:43:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125432#M33900</guid>
      <dc:creator>psheck117</dc:creator>
      <dc:date>2020-09-28T15:43:46Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting fields from an existing Field</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125433#M33901</link>
      <description>&lt;P&gt;A rather theoretical comment on that - if you truly want to capture every imaginable URI scheme, using \w+ isn't going to catch them all. There are more or less obscure schemes with dots and dashes in them.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Jan 2014 20:32:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125433#M33901</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2014-01-27T20:32:39Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting fields from an existing Field</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125434#M33902</link>
      <description>&lt;P&gt;Yeah, I realized that after I committed my transform... reading RFC 1945 has been enlightening, to say the least. Here is a crack at a proper REGEX for the scheme; I will comment and add the http_referer_uri_extension after testing.&lt;/P&gt;

&lt;P&gt;REGEX = (?&lt;HTTP_REFERER_SCHEME&gt;[a-zA-Z+.-]+)://(?&lt;HTTP_REFERER_DEST_HOST&gt;\S[^/]+)((?&lt;HTTP_REFERER_URI_PATH&gt;/.[^?]+))?((?&lt;HTTP_REFERER_URI_QUERY&gt;\?.*))?&lt;/P&gt;
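
&lt;P&gt;One possible tightening, offered only as an untested sketch and not a full URI parser: the scheme grammar in RFC 1945 also permits digits, so the character class below adds 0-9. The leading ^ assumes the regex runs against the referer field alone (for example via SOURCE_KEY) rather than the whole event. The host and path classes are assumptions chosen so that a bare "/" path and a one-character host still match:&lt;/P&gt;

&lt;PRE&gt;# Scheme with digits allowed, then host, optional path, optional query
REGEX = ^(?&lt;HTTP_REFERER_SCHEME&gt;[a-zA-Z0-9+.-]+)://(?&lt;HTTP_REFERER_DEST_HOST&gt;[^/?]+)(?&lt;HTTP_REFERER_URI_PATH&gt;/[^?]*)?(?&lt;HTTP_REFERER_URI_QUERY&gt;\?.*)?&lt;/PRE&gt;

&lt;P&gt;Against a value such as http://example.com/a/b?x=1, this should yield http, example.com, /a/b, and ?x=1 for the four fields, with the path and query left unset when they are absent.&lt;/P&gt;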

&lt;P&gt;Ha! Looking at my regex makes me question if I can tighten it a little better too.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 15:44:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-fields-from-an-existing-Field/m-p/125434#M33902</guid>
      <dc:creator>psheck117</dc:creator>
      <dc:date>2020-09-28T15:44:04Z</dc:date>
    </item>
  </channel>
</rss>

