<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Bluecoat log with domain-based sorting possible? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100265#M20973</link>
    <description>&lt;P&gt;Had same problem - this worked for me...&lt;/P&gt;

&lt;P&gt;Created field extract named bcoat_proxysg: EXTRACT-cs_uri_authority with regex:&lt;/P&gt;

&lt;P&gt;(?)..*?.(?P&amp;lt;cs_uri_authority&amp;gt;[a-z]+.[a-z]+)(?=/)&lt;/P&gt;

&lt;P&gt;then changed the search/view.&lt;/P&gt;</description>
    <pubDate>Mon, 28 Sep 2020 11:36:30 GMT</pubDate>
    <dc:creator>MikeyG</dc:creator>
    <dc:date>2020-09-28T11:36:30Z</dc:date>
    <item>
      <title>Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100251#M20959</link>
      <description>&lt;P&gt;For example, I would like to group all the following URLs under google:&lt;BR /&gt;
docs.google.com,&lt;BR /&gt;
maps.google.com,&lt;BR /&gt;
&lt;A href="http://www.google.com"&gt;www.google.com&lt;/A&gt;,&lt;BR /&gt;
...&lt;BR /&gt;
(maybe it is *google*)&lt;/P&gt;

&lt;P&gt;Is there a way to do it such that it will show results with pre-defined domains?&lt;BR /&gt;
I would much appreciate it if such pre-defined rules already exist somewhere.&lt;BR /&gt;
Thank you.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Mar 2012 04:10:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100251#M20959</guid>
      <dc:creator>supergtom</dc:creator>
      <dc:date>2012-03-23T04:10:10Z</dc:date>
    </item>
    <item>
      <title>Re: Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100252#M20960</link>
      <description>&lt;P&gt;Sorry, are you talking about configuration of BlueCoat or Splunk? Not sure exactly what you want to do, though.&lt;/P&gt;

&lt;P&gt;/k&lt;/P&gt;</description>
      <pubDate>Sun, 25 Mar 2012 19:23:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100252#M20960</guid>
      <dc:creator>kristian_kolb</dc:creator>
      <dc:date>2012-03-25T19:23:17Z</dc:date>
    </item>
    <item>
      <title>Re: Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100253#M20961</link>
      <description>&lt;P&gt;I have downloaded the log from the Bluecoat server and would like to import it into Splunk for log analysis. Normally, Splunk will treat each line of the Bluecoat log as an event. Each event contains several fields, one of which is URL-related. I would like to group events with a similar URL characteristic (i.e. under the same domain; google in the example above). The log may be huge, so such grouping would reduce the size and make the result (or the report) look simpler.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2012 01:20:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100253#M20961</guid>
      <dc:creator>supergtom</dc:creator>
      <dc:date>2012-03-26T01:20:31Z</dc:date>
    </item>
    <item>
      <title>Re: Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100254#M20962</link>
      <description>&lt;P&gt;I am not sure if the term "grouping" is appropriate.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2012 01:22:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100254#M20962</guid>
      <dc:creator>supergtom</dc:creator>
      <dc:date>2012-03-26T01:22:16Z</dc:date>
    </item>
    <item>
      <title>Re: Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100255#M20963</link>
      <description>&lt;P&gt;Well, I assume that you have an extracted field for the URL (or URI), correct?&lt;/P&gt;

&lt;P&gt;That field would contain just a little too much information for your sorting/grouping purposes, right, e.g.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&lt;A href="http://www.google.com/search?q=blah" target="test_blank"&gt;http://www.google.com/search?q=blah&lt;/A&gt;
&lt;A href="https://secure.bank.co.uk/login" target="test_blank"&gt;https://secure.bank.co.uk/login&lt;/A&gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;From that field you can extract the domain part (google, bank) as a new field with a regex, either inline in the search, or more 'permanent' by editing a config file (or using the IFX).&lt;/P&gt;

&lt;P&gt;Inline, you could have a search that looks something like:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=your_bluecoat_sourcetype | rex field=URL "https?://[^\.]+\.(?&amp;lt;domain&amp;gt;[^\.]+)\." | stats c by domain
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The named capture group, written as the word "domain" enclosed in angle brackets (no quotes), puts whatever it matches into a new field called 'domain'.&lt;/P&gt;

&lt;P&gt;The final part after the | creates a table counting events by the newly extracted 'domain' field.&lt;/P&gt;

&lt;P&gt;Hope this helps,&lt;/P&gt;

&lt;P&gt;Kristian&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2012 07:20:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100255#M20963</guid>
      <dc:creator>kristian_kolb</dc:creator>
      <dc:date>2012-03-26T07:20:55Z</dc:date>
    </item>
    <item>
      <title>Re: Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100256#M20964</link>
      <description>&lt;P&gt;Thank you very much. That is exactly what I would like to achieve.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2012 07:24:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100256#M20964</guid>
      <dc:creator>supergtom</dc:creator>
      <dc:date>2012-03-26T07:24:03Z</dc:date>
    </item>
    <item>
      <title>Re: Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100257#M20965</link>
      <description>&lt;P&gt;Btw, I have used the IFX and it does not seem good at making a custom regex for URLs (I am not good at regex either).&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2012 07:26:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100257#M20965</guid>
      <dc:creator>supergtom</dc:creator>
      <dc:date>2012-03-26T07:26:07Z</dc:date>
    </item>
    <item>
      <title>Re: Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100258#M20966</link>
      <description>&lt;P&gt;In case anyone would like a quick way to test a regex against a URL: &lt;A href="http://gskinner.com/RegExr/"&gt;http://gskinner.com/RegExr/&lt;/A&gt; (I suppose you need some basic knowledge of regex).&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2012 07:29:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100258#M20966</guid>
      <dc:creator>supergtom</dc:creator>
      <dc:date>2012-03-26T07:29:10Z</dc:date>
    </item>
    <item>
      <title>Re: Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100259#M20967</link>
      <description>&lt;P&gt;Yeah, well, the IFX may have a hard time trying to find the correct regex. It isn't perfect, but you often get an idea on how to craft your own.&lt;/P&gt;

&lt;P&gt;If this answered your question, please mark as "answered" a/o upvote. Thanks, K.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2012 07:34:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100259#M20967</guid>
      <dc:creator>kristian_kolb</dc:creator>
      <dc:date>2012-03-26T07:34:04Z</dc:date>
    </item>
    <item>
      <title>Re: Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100260#M20968</link>
      <description>&lt;P&gt;While I am still working on the regex, I actually have a second question.&lt;/P&gt;

&lt;P&gt;For example, suppose there are two events:&lt;BR /&gt;
maps.google.com bytes_a duration_a&lt;BR /&gt;
docs.google.com bytes_b duration_b&lt;/P&gt;

&lt;P&gt;Will they be combined as follows?&lt;BR /&gt;
google bytes_a+b duration_a+b&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 11:34:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100260#M20968</guid>
      <dc:creator>supergtom</dc:creator>
      <dc:date>2020-09-28T11:34:37Z</dc:date>
    </item>
    <item>
      <title>Re: Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100261#M20969</link>
      <description>&lt;P&gt;Thanks to Kristian, the question is solved.&lt;BR /&gt;
The regex I used is rex field=Url "[http|https|ftp|tcp]?\:\/\/[^\.]+\.(?&amp;lt;domain&amp;gt;[^\.]+).[^\.]+\/"&lt;BR /&gt;
The regex is meant to match the format ://xxx.domain.xxx/ (I don't know if there are any errors).&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2012 07:58:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100261#M20969</guid>
      <dc:creator>supergtom</dc:creator>
      <dc:date>2012-03-26T07:58:04Z</dc:date>
    </item>
    <item>
      <title>Re: Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100262#M20970</link>
      <description>&lt;P&gt;Well, if you have extracted the fields 'bytes' and 'duration', I believe your stats command at the end of the line should read:&lt;/P&gt;

&lt;P&gt;...| stats c sum(bytes) sum(duration) by domain&lt;/P&gt;
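
&lt;P&gt;As a rough sketch only, the whole search could then look something like the line below. It assumes the field names used in this thread (URL, bytes, duration) and a placeholder sourcetype, so substitute whatever your own extractions are called.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=your_bluecoat_sourcetype | rex field=URL "https?://[^\.]+\.(?&amp;lt;domain&amp;gt;[^\.]+)\." | stats count sum(bytes) sum(duration) by domain
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;That should give one row per domain, with the event count and the summed bytes and duration.&lt;/P&gt;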

&lt;P&gt;/k&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2012 15:31:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100262#M20970</guid>
      <dc:creator>kristian_kolb</dc:creator>
      <dc:date>2012-03-26T15:31:52Z</dc:date>
    </item>
    <item>
      <title>Re: Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100263#M20971</link>
      <description>&lt;P&gt;Also, have you checked how your regex would handle subdomains/ports? I believe that it might fail to handle some cases.&lt;/P&gt;

&lt;P&gt;Not saying that the one I provided is perfect, but it will at least pick something out of it, since it does not expect a slash after three groups of characters.&lt;/P&gt;

&lt;P&gt;I don't really know what your format looks like, but there are a couple of possible patterns, where ABC is what you want to capture:&lt;/P&gt;

&lt;P&gt;&lt;A href="http://www.ABC.com"&gt;http://www.ABC.com&lt;/A&gt;&lt;BR /&gt;
&lt;A href="http://ABC.com"&gt;http://ABC.com&lt;/A&gt;&lt;BR /&gt;
&lt;A href="http://www.ABC.co.uk"&gt;http://www.ABC.co.uk&lt;/A&gt;&lt;BR /&gt;
&lt;A href="https://ABC.co.uk"&gt;https://ABC.co.uk&lt;/A&gt;&lt;BR /&gt;
&lt;A href="ftp://ABC.com:21"&gt;ftp://ABC.com:21&lt;/A&gt;&lt;BR /&gt;
&lt;A href="http://all.work.and.no.play.ABC.com"&gt;http://all.work.and.no.play.ABC.com&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;..then you also might have trailing slashes....&lt;/P&gt;
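
&lt;P&gt;A sketch of one pattern that might cope with the examples above is shown below. It assumes the field is called URL and that the only suffixes you care about are .com and .co.uk. That suffix list is hard-coded purely for illustration and would need extending for real traffic.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... | rex field=URL "^\w+://(?:[^/:]+\.)?(?&amp;lt;domain&amp;gt;[^./:]+)\.(?:co\.uk|com)(?::\d+)?(?:/|$)" | stats count by domain
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The optional (?:[^/:]+\.)? group soaks up any number of leading subdomains, and the (?::\d+)? and (?:/|$) parts allow for a port and a trailing slash, so each of the six examples should give ABC.&lt;/P&gt;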

&lt;P&gt;/k&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2012 15:51:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100263#M20971</guid>
      <dc:creator>kristian_kolb</dc:creator>
      <dc:date>2012-03-26T15:51:51Z</dc:date>
    </item>
    <item>
      <title>Re: Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100264#M20972</link>
      <description>&lt;P&gt;Thanks for the reminder.&lt;BR /&gt;
I doubt that regex (in Splunk) can do if-then-else; without it, a single regex cannot handle URLs with many levels of sub-domains or other variations.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Mar 2012 04:13:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100264#M20972</guid>
      <dc:creator>supergtom</dc:creator>
      <dc:date>2012-03-27T04:13:56Z</dc:date>
    </item>
    <item>
      <title>Re: Bluecoat log with domain-based sorting possible?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100265#M20973</link>
      <description>&lt;P&gt;Had same problem - this worked for me...&lt;/P&gt;

&lt;P&gt;Created field extract named bcoat_proxysg: EXTRACT-cs_uri_authority with regex:&lt;/P&gt;

&lt;P&gt;(?)..*?.(?P&amp;lt;cs_uri_authority&amp;gt;[a-z]+.[a-z]+)(?=/)&lt;/P&gt;

&lt;P&gt;then changed the search/view.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 11:36:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bluecoat-log-with-domain-based-sorting-possible/m-p/100265#M20973</guid>
      <dc:creator>MikeyG</dc:creator>
      <dc:date>2020-09-28T11:36:30Z</dc:date>
    </item>
  </channel>
</rss>

