<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What's the best search method to remove web crawlers or bots from download logs? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259254#M77714</link>
    <description>&lt;P&gt;I don't use eval statements to figure out whether something is a bot. I have a collection of 74 transforms applied against the useragent field. The regex patterns are highly tuned to match in the fewest steps. The reason I have so many is that our SEO team uses this data; however, this does not account for any bot doing a good job of impersonating a browser. We also do a CIDR match against the cip field and assume any address coming from AWS, Google Cloud, DigitalOcean, or Azure address blocks is a bot.&lt;/P&gt;

&lt;P&gt;Here is a link to a gist I created - &lt;A href="https://gist.github.com/httpstergeek/5fd08b9bc750e2d1954de78b063a092a"&gt;https://gist.github.com/httpstergeek/5fd08b9bc750e2d1954de78b063a092a&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Hope this helps, and if it does, don't forget to accept and vote up. Cheers.&lt;/P&gt;</description>
    <pubDate>Wed, 25 Jan 2017 18:53:23 GMT</pubDate>
    <dc:creator>bmacias84</dc:creator>
    <dc:date>2017-01-25T18:53:23Z</dc:date>
    <item>
      <title>What's the best search method to remove web crawlers or bots from download logs?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259253#M77713</link>
      <description>&lt;P&gt;A few years ago, I was given a search string to filter web crawlers/bots out of our download reports. I'm curious what other people use to make sure bots are not counted in their downloads. Are there better methods?&lt;/P&gt;

&lt;P&gt;This is the string I inherited:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;eval agentType=if(match(http_user_agent,"(?i).*(bot|crawler|spider).*"),"Bot",if(match(http_user_agent,"^.*Mozilla/.*"),"Browser","Unknown")) | search agentType!="Bot"|search agentType!="Unknown"|
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Does anyone know of a more exact or better method to filter out crawlers?&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2017 22:59:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259253#M77713</guid>
      <dc:creator>mistydennis</dc:creator>
      <dc:date>2017-01-24T22:59:45Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best search method to remove web crawlers or bots from download logs?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259254#M77714</link>
      <description>&lt;P&gt;I don't use eval statements to figure out whether something is a bot. I have a collection of 74 transforms applied against the useragent field. The regex patterns are highly tuned to match in the fewest steps. The reason I have so many is that our SEO team uses this data; however, this does not account for any bot doing a good job of impersonating a browser. We also do a CIDR match against the cip field and assume any address coming from AWS, Google Cloud, DigitalOcean, or Azure address blocks is a bot.&lt;/P&gt;

&lt;P&gt;Here is a link to a gist I created - &lt;A href="https://gist.github.com/httpstergeek/5fd08b9bc750e2d1954de78b063a092a"&gt;https://gist.github.com/httpstergeek/5fd08b9bc750e2d1954de78b063a092a&lt;/A&gt;&lt;/P&gt;
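
&lt;P&gt;For anyone wiring this up, here is a minimal sketch of how a search-time transform is usually hooked in. It is not copied from the gist: the stanza names, the sourcetype access_combined, the field name bot_name, the short pattern, and the 203.0.113.0/24 range (a documentation-only block) are all placeholders. It also assumes the useragent field is already extracted by the time the transform runs. A stanza in transforms.conf does nothing on its own; props.conf has to reference it, and search-time settings like these belong on the search head.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# transforms.conf (hypothetical stanza; the real patterns are in the gist)
[bot_generic]
SOURCE_KEY = useragent
REGEX = (?i)(googlebot|bingbot|crawler|spider)
FORMAT = bot_name::$1

# props.conf (the sourcetype name is an assumption)
[access_combined]
REPORT-bots = bot_generic
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;With that in place, a search along these lines should keep only events that matched no bot pattern and whose cip falls outside the placeholder range:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=access_combined | where isnull(bot_name) AND NOT cidrmatch("203.0.113.0/24", cip)
&lt;/CODE&gt;&lt;/PRE&gt;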

&lt;P&gt;Hope this helps, and if it does, don't forget to accept and vote up. Cheers.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jan 2017 18:53:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259254#M77714</guid>
      <dc:creator>bmacias84</dc:creator>
      <dc:date>2017-01-25T18:53:23Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best search method to remove web crawlers or bots from download logs?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259255#M77715</link>
      <description>&lt;P&gt;Wow, so that's a totally different method &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;  Is it safe to say there isn't a definitive way to 100% accurately define bots?&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jan 2017 19:11:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259255#M77715</guid>
      <dc:creator>mistydennis</dc:creator>
      <dc:date>2017-01-25T19:11:56Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best search method to remove web crawlers or bots from download logs?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259256#M77716</link>
      <description>&lt;P&gt;You can get fairly close, but definitely not 100%. We also use Google Analytics, and our numbers match up fairly closely. Our SEO team uses Splunk for quick analysis and granularity, since GA, I think, reports hourly.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jan 2017 19:54:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259256#M77716</guid>
      <dc:creator>bmacias84</dc:creator>
      <dc:date>2017-01-25T19:54:58Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best search method to remove web crawlers or bots from download logs?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259257#M77717</link>
      <description>&lt;P&gt;Are you able to share any hints on how you created your set of 74 transforms? I can't find anything anywhere on making sure what I'm using is giving accurate results. When I compare Splunk and GA, the numbers vary greatly, and I'm trying to figure out whether my eval is the problem or GA is misbehaving somehow.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jan 2017 20:02:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259257#M77717</guid>
      <dc:creator>mistydennis</dc:creator>
      <dc:date>2017-01-25T20:02:38Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best search method to remove web crawlers or bots from download logs?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259258#M77718</link>
      <description>&lt;P&gt;I've converted my post to an answer with a link to my transforms as a gist.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jan 2017 20:14:09 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259258#M77718</guid>
      <dc:creator>bmacias84</dc:creator>
      <dc:date>2017-01-25T20:14:09Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best search method to remove web crawlers or bots from download logs?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259259#M77719</link>
      <description>&lt;P&gt;You are AMAZING! Thank you so much!  &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jan 2017 20:24:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259259#M77719</guid>
      <dc:creator>mistydennis</dc:creator>
      <dc:date>2017-01-25T20:24:41Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best search method to remove web crawlers or bots from download logs?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259260#M77720</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;

&lt;P&gt;Could you explain how to correctly implement this configuration in Splunk? I've copied transforms.conf, but nothing has changed.&lt;BR /&gt;
I also want to exclude all bots from my analysis.&lt;/P&gt;

&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Mon, 20 Mar 2017 10:32:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/What-s-the-best-search-method-to-remove-web-crawlers-or-bots/m-p/259260#M77720</guid>
      <dc:creator>rwesolowski</dc:creator>
      <dc:date>2017-03-20T10:32:36Z</dc:date>
    </item>
  </channel>
</rss>

