<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: access_combined hide certain useragents in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/access-combined-hide-certain-useragents/m-p/103734#M21829</link>
    <description>&lt;P&gt;I may or may not understand your question...&lt;/P&gt;

&lt;P&gt;You are looking at a web server access log and you want to report stats, but need to filter out bots.  You think that the useragent string is the way to tell the bots from the people.  Unfortunately, useragent strings are wild creatures and very hard to process consistently; keeping up with them takes too much manual intervention.&lt;/P&gt;

&lt;P&gt;Could you start by gathering everything that requests robots.txt?  Perhaps get a list of IPs or useragents that requested robots.txt and then use those as your filter.  It won't catch the virus probes and other black hat hits, but it should bring your stats closer to reality than counting up everything.&lt;/P&gt;</description>
    <pubDate>Sat, 18 Dec 2010 00:43:33 GMT</pubDate>
    <dc:creator>rotten</dc:creator>
    <dc:date>2010-12-18T00:43:33Z</dc:date>
    <item>
      <title>access_combined hide certain useragents</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/access-combined-hide-certain-useragents/m-p/103730#M21825</link>
      <description>&lt;P&gt;Hello,
I'm looking to do some stats on the traffic to my company's web server (Apache). I'm using Splunk as a light forwarder and monitoring it with the Unix app. I want to hide the hits to the server from different kinds of bots and fetchers, such as Google's. I have done it manually with useragent!=, but there always seem to be new kinds of useragents to block. How do you write a search for special words in the useragent (like bot, feed and spider)?&lt;/P&gt;</description>
      <pubDate>Sat, 11 Dec 2010 19:35:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/access-combined-hide-certain-useragents/m-p/103730#M21825</guid>
      <dc:creator>fisk12</dc:creator>
      <dc:date>2010-12-11T19:35:03Z</dc:date>
    </item>
    <item>
      <title>Re: access_combined hide certain useragents</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/access-combined-hide-certain-useragents/m-p/103731#M21826</link>
      <description>&lt;P&gt;You could create an &lt;A href="http://www.splunk.com/base/Documentation/latest/AppManagement/Groupsimilareventsusingeventtypes" rel="nofollow"&gt;eventtype&lt;/A&gt; to group useragent values and filter against that eventtype in several searches.  Then you could maintain the one eventtype centrally.&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;HR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;You could also achieve this by creating a &lt;A href="http://www.splunk.com/base/Documentation/latest/User/Fieldlookupstutorial" rel="nofollow"&gt;lookup&lt;/A&gt; for useragent&lt;BR /&gt;
where useragent.csv is:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;useragent,boolean_exclude
bot,true
feed,true
spider,true
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;and your search is:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... | lookup useragent.csv useragent OUTPUT boolean_exclude | where isnull(boolean_exclude)
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 11 Dec 2010 19:50:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/access-combined-hide-certain-useragents/m-p/103731#M21826</guid>
      <dc:creator>bwooden</dc:creator>
      <dc:date>2010-12-11T19:50:15Z</dc:date>
    </item>
    <item>
      <title>Re: access_combined hide certain useragents</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/access-combined-hide-certain-useragents/m-p/103732#M21827</link>
      <description>&lt;P&gt;Sounds like a good idea, but I run into a problem trying this. I created a text file named useragent.csv and pasted in what you wrote. Then I added it under Lookup table files in Manager-&amp;gt;Lookups, and I get this error:&lt;/P&gt;

&lt;P&gt;"Error in 'lookup' command: Could not find all of the specified destination fields in the lookup table." &lt;/P&gt;

&lt;P&gt;This happens when I run this command: index="os" source="/var/log/httpd/access_log" | lookup useragent.csv useragent OUTPUT boolean_include | where isnull(boolean_exclude)&lt;/P&gt;</description>
      <pubDate>Mon, 13 Dec 2010 18:37:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/access-combined-hide-certain-useragents/m-p/103732#M21827</guid>
      <dc:creator>fisk12</dc:creator>
      <dc:date>2010-12-13T18:37:38Z</dc:date>
    </item>
    <item>
      <title>Re: access_combined hide certain useragents</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/access-combined-hide-certain-useragents/m-p/103733#M21828</link>
      <description>&lt;P&gt;Anyone have any idea?&lt;/P&gt;</description>
      <pubDate>Fri, 17 Dec 2010 22:24:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/access-combined-hide-certain-useragents/m-p/103733#M21828</guid>
      <dc:creator>fisk12</dc:creator>
      <dc:date>2010-12-17T22:24:54Z</dc:date>
    </item>
    <item>
      <title>Re: access_combined hide certain useragents</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/access-combined-hide-certain-useragents/m-p/103734#M21829</link>
      <description>&lt;P&gt;I may or may not understand your question...&lt;/P&gt;

&lt;P&gt;You are looking at a web server access log and you want to report stats, but need to filter out bots.  You think that the useragent string is the way to tell the bots from the people.  Unfortunately, useragent strings are wild creatures and very hard to process consistently; keeping up with them takes too much manual intervention.&lt;/P&gt;

&lt;P&gt;Could you start by gathering everything that requests robots.txt?  Perhaps get a list of IPs or useragents that requested robots.txt and then use those as your filter.  It won't catch the virus probes and other black hat hits, but it should bring your stats closer to reality than counting up everything.&lt;/P&gt;</description>
      <pubDate>Sat, 18 Dec 2010 00:43:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/access-combined-hide-certain-useragents/m-p/103734#M21829</guid>
      <dc:creator>rotten</dc:creator>
      <dc:date>2010-12-18T00:43:33Z</dc:date>
    </item>
    <item>
      <title>Re: access_combined hide certain useragents</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/access-combined-hide-certain-useragents/m-p/103735#M21830</link>
      <description>&lt;P&gt;I created an eventtype called BOTS that matches the bots I know of (e.g. http_user_agent="*crawler*" OR ... etc.). When I want to filter out events created by bots, I add it to my search query:&lt;/P&gt;

&lt;P&gt;something=something NOT eventtype=BOTS etc&lt;/P&gt;

&lt;P&gt;This works very well for me. Periodically, I see a new bot show up and add it to my list.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2011 07:29:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/access-combined-hide-certain-useragents/m-p/103735#M21830</guid>
      <dc:creator>stjack99</dc:creator>
      <dc:date>2011-03-02T07:29:46Z</dc:date>
    </item>
  </channel>
</rss>

