Search IP in access_log for multiple user agents

slaterok — Sun, 09 Jan 2011 07:54:42 GMT

I'm looking for spiders, which I identify by abusive request rates using transactions. For example: SPLUNK_SEARCH='sourcetype="access_combined" startminutesago=5 | transaction fields=clientip maxspan=6m maxpause=1m | search linecount > 500'

This identifies spiders or abusive traffic based on a business rule; the threshold of 500 could be higher or lower.

I would like a search, maybe with linecount > 50, that finds a list of IPs and then shows which of them have more than 20 (or X) different useragents. This would help identify spiders that try to fly under the radar with a smaller transaction count while switching their useragent on each hit to look more legitimate.

Re: Search IP in access_log for multiple user agents

gkanapathy — Sun, 09 Jan 2011 08:02:09 GMT

A much better search would avoid the use of transaction and instead do:

sourcetype=access_combined earliest=-5m | stats distinct_count(user_agent) as ip_agent_count by clientip | where ip_agent_count >= 20

Your first query is much better written as:

sourcetype=access_combined earliest=-5m | stats count by clientip | where count > 500

In general, stats searches scale roughly linearly with the number of indexers in your indexing cluster, while transaction does not map-reduce as well and so bottlenecks on the search head.
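
If you want both conditions from the original question at once (a lower hit threshold plus many distinct user agents), the two stats searches can probably be folded into a single pass. This is only a sketch: the thresholds 50 and 20 are the ones from the question, and the field name user_agent is an assumption carried over from the search above. Depending on your field extractions it may be named useragent instead, so check which field your access_combined events actually have.

```spl
sourcetype=access_combined earliest=-5m
| stats count, distinct_count(user_agent) as ip_agent_count by clientip
| where count > 50 AND ip_agent_count >= 20
```

Because this is still a single stats by clientip, it should distribute across the indexers the same way as the searches above.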
