<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Splunk doesn't parse this URL fully.... in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-doesn-t-parse-this-URL-fully/m-p/148027#M30118</link>
    <description>&lt;P&gt;Hello, dear Splunkers!&lt;/P&gt;

&lt;P&gt;I am trying to get syslog data into Splunk. The events contain a request="url" field like the one below:&lt;/P&gt;

&lt;P&gt;request=&lt;A href="http://www.terracotta.org/kit/reflector?kitID=ehcache.default&amp;amp;pageID=update.properties&amp;amp;id=2130706433&amp;amp;os-name=Linux&amp;amp;jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&amp;amp;jvm-version=1.7.0_55&amp;amp;platform=amd64&amp;amp;tc-version=2.6.2&amp;amp;tc-product=Ehcache+Core+2.6.2&amp;amp;source=Ehcache+Core&amp;amp;uptime-secs=1&amp;amp;patch=UNKNOWN"&gt;http://www.terracotta.org/kit/reflector?kitID=ehcache.default&amp;amp;pageID=update.properties&amp;amp;id=2130706433&amp;amp;os-name=Linux&amp;amp;jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&amp;amp;jvm-version=1.7.0_55&amp;amp;platform=amd64&amp;amp;tc-version=2.6.2&amp;amp;tc-product=Ehcache+Core+2.6.2&amp;amp;source=Ehcache+Core&amp;amp;uptime-secs=1&amp;amp;patch=UNKNOWN&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;But Splunk parses it like this:&lt;BR /&gt;
request=&lt;A href="http://www.terracotta.org/kit/reflector?kitID=ehcache.default"&gt;http://www.terracotta.org/kit/reflector?kitID=ehcache.default&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Can someone help me with this, please?&lt;BR /&gt;
How can I get the full URL extracted correctly?&lt;BR /&gt;
And where in Splunk can I tweak this field, given that my data has already been parsed?&lt;/P&gt;

&lt;P&gt;I appreciate the help!&lt;BR /&gt;
Thanks&lt;BR /&gt;
Sunita&lt;/P&gt;</description>
    <pubDate>Fri, 20 Feb 2015 18:06:14 GMT</pubDate>
    <dc:creator>sunitachan</dc:creator>
    <dc:date>2015-02-20T18:06:14Z</dc:date>
    <item>
      <title>Splunk doesn't parse this URL fully....</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-doesn-t-parse-this-URL-fully/m-p/148027#M30118</link>
      <description>&lt;P&gt;Hello, dear Splunkers!&lt;/P&gt;

&lt;P&gt;I am trying to get syslog data into Splunk. The events contain a request="url" field like the one below:&lt;/P&gt;

&lt;P&gt;request=&lt;A href="http://www.terracotta.org/kit/reflector?kitID=ehcache.default&amp;amp;pageID=update.properties&amp;amp;id=2130706433&amp;amp;os-name=Linux&amp;amp;jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&amp;amp;jvm-version=1.7.0_55&amp;amp;platform=amd64&amp;amp;tc-version=2.6.2&amp;amp;tc-product=Ehcache+Core+2.6.2&amp;amp;source=Ehcache+Core&amp;amp;uptime-secs=1&amp;amp;patch=UNKNOWN"&gt;http://www.terracotta.org/kit/reflector?kitID=ehcache.default&amp;amp;pageID=update.properties&amp;amp;id=2130706433&amp;amp;os-name=Linux&amp;amp;jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&amp;amp;jvm-version=1.7.0_55&amp;amp;platform=amd64&amp;amp;tc-version=2.6.2&amp;amp;tc-product=Ehcache+Core+2.6.2&amp;amp;source=Ehcache+Core&amp;amp;uptime-secs=1&amp;amp;patch=UNKNOWN&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;But Splunk parses it like this:&lt;BR /&gt;
request=&lt;A href="http://www.terracotta.org/kit/reflector?kitID=ehcache.default"&gt;http://www.terracotta.org/kit/reflector?kitID=ehcache.default&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Can someone help me with this, please?&lt;BR /&gt;
How can I get the full URL extracted correctly?&lt;BR /&gt;
And where in Splunk can I tweak this field, given that my data has already been parsed?&lt;/P&gt;

&lt;P&gt;I appreciate the help!&lt;BR /&gt;
Thanks&lt;BR /&gt;
Sunita&lt;/P&gt;</description>
      <pubDate>Fri, 20 Feb 2015 18:06:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-doesn-t-parse-this-URL-fully/m-p/148027#M30118</guid>
      <dc:creator>sunitachan</dc:creator>
      <dc:date>2015-02-20T18:06:14Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk doesn't parse this URL fully....</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-doesn-t-parse-this-URL-fully/m-p/148028#M30119</link>
      <description>&lt;P&gt;Hello all,&lt;BR /&gt;
I used the built-in field extraction tool to extract this particular field. Now I see that the extraction also applies to all the other URLs, which are not this long, so I have two fields:&lt;BR /&gt;
URL&lt;BR /&gt;
URL2&lt;/P&gt;

&lt;P&gt;I want this field extraction to apply only to URL2.&lt;/P&gt;

&lt;P&gt;Any suggestions, please?&lt;BR /&gt;
Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 20 Feb 2015 18:34:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-doesn-t-parse-this-URL-fully/m-p/148028#M30119</guid>
      <dc:creator>sunitachan</dc:creator>
      <dc:date>2015-02-20T18:34:14Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk doesn't parse this URL fully....</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-doesn-t-parse-this-URL-fully/m-p/148029#M30120</link>
      <description>&lt;P&gt;Could you provide some full sample events, and also the definition of your URL2 field extraction?&lt;/P&gt;</description>
      <pubDate>Fri, 20 Feb 2015 20:37:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-doesn-t-parse-this-URL-fully/m-p/148029#M30120</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2015-02-20T20:37:36Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk doesn't parse this URL fully....</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-doesn-t-parse-this-URL-fully/m-p/148030#M30121</link>
      <description>&lt;P&gt;Hi there,&lt;BR /&gt;
here are a few samples:&lt;/P&gt;

&lt;P&gt;Feb 20 09:25:27 |1.0.3|0|passed|0|src=x.x.x.x&lt;BR /&gt;
spt=40960&lt;BR /&gt;
dst=34.23.12.3&lt;BR /&gt;
dpt=80&lt;BR /&gt;
deviceDirection=1&lt;BR /&gt;
request=&lt;A href="http://www.unikin.cd/" target="_blank"&gt;http://www.unikin.cd/&lt;/A&gt;&lt;BR /&gt;
act=passed&lt;BR /&gt;
cn1Label=Risk_Score&lt;BR /&gt;
cn1=0&lt;BR /&gt;
cs5=-&lt;BR /&gt;
cs5Label=Malware_Type&lt;BR /&gt;
cs1=-&lt;BR /&gt;
cs1Label=Category&lt;BR /&gt;
cs2=-&lt;BR /&gt;
cs2Label=Protocol&lt;/P&gt;

&lt;P&gt;Feb 20 09:25:27|1.0.3|0|passed|0|src=x.x.x.x&lt;BR /&gt;
spt=60657 &lt;BR /&gt;
dst=291.98.1.1&lt;BR /&gt;
dpt=80 &lt;BR /&gt;
deviceDirection=1 &lt;BR /&gt;
request=&lt;A href="http://mobile.orange.fr/" target="_blank"&gt;http://mobile.orange.fr/&lt;/A&gt;&lt;BR /&gt;
act=passed &lt;BR /&gt;
cn1Label=Risk_Score&lt;BR /&gt;
cn1=0 &lt;BR /&gt;
cs5=- cs5Label=Malware_Type&lt;BR /&gt;
cs1=- &lt;BR /&gt;
cs1Label=Category &lt;BR /&gt;
cs2=- cs2Label=Protocol&lt;/P&gt;

&lt;P&gt;Feb 16 08:46:11|1.0.3|0|passed|0|src=x.x.x.x&lt;BR /&gt;
spt=55845 &lt;BR /&gt;
dst=199.11.1.1&lt;BR /&gt;
dpt=80 &lt;BR /&gt;
deviceDirection=1 &lt;BR /&gt;
request=&lt;A href="http://www.terracotta.org/kit/reflector?kitID=ehcache.default&amp;amp;pageID=update.properties&amp;amp;id=2130706433&amp;amp;os-name=Linux&amp;amp;jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&amp;amp;jvm-version=1.7.0_55&amp;amp;platform=amd64&amp;amp;tc-version=2.6.2&amp;amp;tc-product=Ehcache+Core+2.6.2&amp;amp;source=Ehcache+Core&amp;amp;uptime-secs=1&amp;amp;patch=UNKNOWN" target="_blank"&gt;http://www.terracotta.org/kit/reflector?kitID=ehcache.default&amp;amp;pageID=update.properties&amp;amp;id=2130706433&amp;amp;os-name=Linux&amp;amp;jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&amp;amp;jvm-version=1.7.0_55&amp;amp;platform=amd64&amp;amp;tc-version=2.6.2&amp;amp;tc-product=Ehcache+Core+2.6.2&amp;amp;source=Ehcache+Core&amp;amp;uptime-secs=1&amp;amp;patch=UNKNOWN&lt;/A&gt;&lt;BR /&gt;
act=passed &lt;BR /&gt;
cn1Label=Risk_Score&lt;BR /&gt;
cn1=0 &lt;BR /&gt;
cs5=- cs5Label=Malware_Type&lt;BR /&gt;
cs1=- &lt;BR /&gt;
cs1Label=Category &lt;BR /&gt;
cs2=- cs2Label=Protocol&lt;/P&gt;

&lt;P&gt;Here, URL is the request field, and URL2 is the request field when it holds a long URL, as in the third sample above.&lt;/P&gt;

&lt;P&gt;Can I have just one field that includes both types of URL?&lt;BR /&gt;
The URL2 regex is ^(?:[^=\n]*=){6}(?P&amp;lt;URL2&amp;gt;[^ ]+)&lt;/P&gt;
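&lt;P&gt;The following is a minimal Python sketch of what this positional regex does. It assumes that each event is a single line with space-separated fields, abbreviated from the samples above, and it uses a plain capturing group in place of the (?P...) group. The regex counts = signs instead of looking for request=, so it returns the right value only while request is the sixth key. The second event adds a hypothetical proto=tcp key, and the regex then captures the deviceDirection value instead. Anchoring on request= avoids this dependence on field order.&lt;/P&gt;

```python
import re

# The URL2 regex with a plain capturing group: skip the first six
# "key=" prefixes, then capture everything up to the next space.
POSITIONAL = re.compile(r"^(?:[^=\n]*=){6}([^ ]+)")

# Assumption: single-line events with space-separated fields,
# abbreviated from the second sample.
short_event = (
    "Feb 20 09:25:27|1.0.3|0|passed|0|src=x.x.x.x spt=60657 "
    "dst=291.98.1.1 dpt=80 deviceDirection=1 "
    "request=http://mobile.orange.fr/ act=passed"
)

# Hypothetical variant with one extra key (proto=tcp) before request=.
shifted_event = (
    "Feb 20 09:25:27|1.0.3|0|passed|0|src=x.x.x.x spt=60657 "
    "dst=291.98.1.1 dpt=80 proto=tcp deviceDirection=1 "
    "request=http://mobile.orange.fr/ act=passed"
)


def positional_url(event):
    """Return what the positional regex captures, or None if it does not match."""
    match = POSITIONAL.search(event)
    return match.group(1) if match else None


print(positional_url(short_event))    # http://mobile.orange.fr/
print(positional_url(shifted_event))  # 1 (the deviceDirection value)
```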

&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 18:59:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-doesn-t-parse-this-URL-fully/m-p/148030#M30121</guid>
      <dc:creator>sunitachan</dc:creator>
      <dc:date>2020-09-28T18:59:24Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk doesn't parse this URL fully....</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-doesn-t-parse-this-URL-fully/m-p/148031#M30122</link>
      <description>&lt;P&gt;I'm making some assumptions here...&lt;/P&gt;

&lt;P&gt;It looks like you are relying on Splunk's automatic key/value extraction, which in your case ends the request value at the first &amp;amp; in the URL. You probably want to use the rex command or define a field extraction for your data instead. Since your URLs contain no spaces, you should be able to use the following regex to extract the request URL:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;request=(?P&amp;lt;url&amp;gt;[^ ]+)
&lt;/CODE&gt;&lt;/PRE&gt;
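&lt;P&gt;The following minimal Python sketch illustrates the difference under two assumptions. First, the raw event's fields are separated by single spaces (abbreviated from the third sample). Second, the automatic extraction behaves like a splitter that also breaks pairs at each ampersand in the URL. naive_kv is a hypothetical stand-in chosen to reproduce the truncated value in the question, not Splunk's actual implementation. The regex above is used with a plain capturing group.&lt;/P&gt;

```python
import re

AMP = "\x26"  # ampersand, written as a hex escape

# The long request URL from the question, rebuilt from its query parameters.
PARAMS = [
    "kitID=ehcache.default",
    "pageID=update.properties",
    "id=2130706433",
    "os-name=Linux",
    "jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM",
    "jvm-version=1.7.0_55",
    "platform=amd64",
    "tc-version=2.6.2",
    "tc-product=Ehcache+Core+2.6.2",
    "source=Ehcache+Core",
    "uptime-secs=1",
    "patch=UNKNOWN",
]
URL = "http://www.terracotta.org/kit/reflector?" + AMP.join(PARAMS)

# Assumption: fields in the raw event are separated by single spaces.
EVENT = "dpt=80 deviceDirection=1 request=" + URL + " act=passed cn1=0"


def naive_kv(event):
    """Hypothetical key/value splitter that also breaks pairs at ampersands."""
    fields = {}
    for token in re.split("[ " + AMP + "]", event):
        key, sep, value = token.partition("=")
        if sep and key not in fields:  # keep the first value seen for each key
            fields[key] = value
    return fields


# The pattern from this answer, with a plain capturing group.
REQUEST = re.compile(r"request=([^ ]+)")

print(naive_kv(EVENT)["request"])  # http://www.terracotta.org/kit/reflector?kitID=ehcache.default
print(REQUEST.search(EVENT).group(1) == URL)  # True
```

&lt;P&gt;In Splunk itself, the same pattern would go into a search-time field extraction or a rex command, for example: ... | rex field=_raw "request=(?&amp;lt;url&amp;gt;[^ ]+)"&lt;/P&gt;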

&lt;P&gt;From the samples, I'm assuming that the fields in each event are actually separated by spaces.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Feb 2015 03:47:09 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-doesn-t-parse-this-URL-fully/m-p/148031#M30122</guid>
      <dc:creator>cpetterborg</dc:creator>
      <dc:date>2015-02-23T03:47:09Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk doesn't parse this URL fully....</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-doesn-t-parse-this-URL-fully/m-p/148032#M30123</link>
      <description>&lt;P&gt;As cpetterborg mentions, it depends on what the event looks like: is it space-delimited or newline-delimited? I would use something like:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;request=(?&amp;lt;url&amp;gt;\S+)
&lt;/CODE&gt;&lt;/PRE&gt;
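&lt;P&gt;The following minimal Python sketch shows why the delimiter matters. It assumes two hypothetical layouts of the same fields, abbreviated from the second sample: one space-delimited and one with a field per line. A [^ ]+ capture stops only at a space, so in the newline-delimited layout it runs on into the next field. \S+ stops at any whitespace character, including line breaks, which is what the pattern above is meant to do.&lt;/P&gt;

```python
import re

# Hypothetical layouts of the same fields (abbreviated from the second sample).
SPACE_EVENT = "deviceDirection=1 request=http://mobile.orange.fr/ act=passed"
NEWLINE_EVENT = "deviceDirection=1\nrequest=http://mobile.orange.fr/\nact=passed"

SPACE_ONLY = re.compile(r"request=([^ ]+)")    # stops only at a space
ANY_WHITESPACE = re.compile(r"request=(\S+)")  # stops at space, tab, CR, LF, etc.


def extract(pattern, event):
    """Return the captured request value, or None if there is no match."""
    match = pattern.search(event)
    return match.group(1) if match else None


for name, event in [("space", SPACE_EVENT), ("newline", NEWLINE_EVENT)]:
    print(name, repr(extract(SPACE_ONLY, event)), repr(extract(ANY_WHITESPACE, event)))
# space 'http://mobile.orange.fr/' 'http://mobile.orange.fr/'
# newline 'http://mobile.orange.fr/\nact=passed' 'http://mobile.orange.fr/'
```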

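&lt;P&gt;The pattern I suggested stops at the first whitespace character, so it handles both a space and a line break (a Unix-style \n or a Windows-style \r\n), though it might need adjusting depending on the sourcetype. One potential issue with any whitespace delimiter is a URL that contains a literal, unencoded space: the capture would be cut off at that space. Encoded spaces such as %20 or + are not whitespace, so they are captured normally.&lt;/P&gt;</description>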
      <pubDate>Mon, 23 Feb 2015 04:37:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-doesn-t-parse-this-URL-fully/m-p/148032#M30123</guid>
      <dc:creator>esix_splunk</dc:creator>
      <dc:date>2015-02-23T04:37:02Z</dc:date>
    </item>
  </channel>
</rss>

