<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Search IP in access_log for multiple user agents in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Search-IP-in-access-log-for-multiple-user-agents/m-p/23222#M177459</link>
    <description>&lt;P&gt;A &lt;EM&gt;much&lt;/EM&gt; better search would avoid the use of &lt;CODE&gt;transaction&lt;/CODE&gt; and instead do:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=access_combined earliest=-5m | stats distinct_count(user_agent) as ip_agent_count by clientip | where ip_agent_count &amp;gt;= 20
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Your first query is much better written as:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=access_combined earliest=-5m | stats count by clientip | where count &amp;gt; 500
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In general, the &lt;CODE&gt;stats&lt;/CODE&gt; searches will scale about linearly with the number of indexers in your indexing cluster, while &lt;CODE&gt;transaction&lt;/CODE&gt; does not map-reduce as well and so will bottleneck on the search head.&lt;/P&gt;</description>
    <pubDate>Sun, 09 Jan 2011 08:02:09 GMT</pubDate>
    <dc:creator>gkanapathy</dc:creator>
    <dc:date>2011-01-09T08:02:09Z</dc:date>
    <item>
      <title>Search IP in access_log for multiple user agents</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Search-IP-in-access-log-for-multiple-user-agents/m-p/23221#M177458</link>
      <description>&lt;P&gt;I'm looking for spiders, which I can identify by abusive rates using transactions.  For example:
SPLUNK_SEARCH='sourcetype="access_combined"  startminutesago=5 | transaction fields=clientip maxspan=6m maxpause=1m | search linecount &amp;gt; 500'&lt;/P&gt;

&lt;P&gt;This will identify spiders or abusive traffic based on a business rule.  500 could be more or less.&lt;/P&gt;

&lt;P&gt;I would like a search with maybe linecount &amp;gt; 50 to find a list of IPs and then find out which IP has more than 20 or X different useragents.   This would help identify spiders that are trying to fly under the radar with a smaller transaction count and switching their useragent each hit to look more legit.&lt;/P&gt;</description>
      <pubDate>Sun, 09 Jan 2011 07:54:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Search-IP-in-access-log-for-multiple-user-agents/m-p/23221#M177458</guid>
      <dc:creator>slaterok</dc:creator>
      <dc:date>2011-01-09T07:54:42Z</dc:date>
    </item>
    <item>
      <title>Re: Search IP in access_log for multiple user agents</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Search-IP-in-access-log-for-multiple-user-agents/m-p/23222#M177459</link>
      <description>&lt;P&gt;A &lt;EM&gt;much&lt;/EM&gt; better search would avoid the use of &lt;CODE&gt;transaction&lt;/CODE&gt; and instead do:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=access_combined earliest=-5m | stats distinct_count(user_agent) as ip_agent_count by clientip | where ip_agent_count &amp;gt;= 20
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Your first query is much better written as:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=access_combined earliest=-5m | stats count by clientip | where count &amp;gt; 500
&lt;/CODE&gt;&lt;/PRE&gt;
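
&lt;P&gt;To combine both conditions from the question (more than 50 hits and at least 20 distinct user agents per IP), a rough, untested sketch might look like the following. The thresholds come from the question, and &lt;CODE&gt;user_agent&lt;/CODE&gt; is assumed to be the field name your extraction produces, so adjust it if yours differs:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=access_combined earliest=-5m | stats count, distinct_count(user_agent) as ip_agent_count by clientip | where count &amp;gt; 50 AND ip_agent_count &amp;gt;= 20
&lt;/CODE&gt;&lt;/PRE&gt;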

&lt;P&gt;In general, the &lt;CODE&gt;stats&lt;/CODE&gt; searches will scale about linearly with the number of indexers in your indexing cluster, while &lt;CODE&gt;transaction&lt;/CODE&gt; does not map-reduce as well and so will bottleneck on the search head.&lt;/P&gt;</description>
      <pubDate>Sun, 09 Jan 2011 08:02:09 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Search-IP-in-access-log-for-multiple-user-agents/m-p/23222#M177459</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2011-01-09T08:02:09Z</dc:date>
    </item>
  </channel>
</rss>

