I'm trying to narrow down a list of spiders whose traffic is inundating our network. So far, I've found that more than 35,000 hits in a 24-hour period is a strong indicator that the source is a spider. On top of this, I want a timechart that graphs those connections over time.
Essentially, I want to graph the clientips that exceed a threshold of 35,000 hits per day, but over a longer time period, like 7 days or even a month.
The following search isn't working. I've tried subsearches, limit, and top, but I'm stuck.
earliest=-7d@d latest=now sourcetype="squid" | bucket _time span=1d | stats count by clientip | where count > 35000 | timechart span=30m count by clientip
This search works but takes an insane amount of time. I think if I can just filter out anything with hits fewer than 35,000 per day, the search would run a little faster.
... | bucket _time span=1h | stats count as reqs_per_ip by clientip, _time
That should produce the count of requests per IP per hour. It can then feed a second step: a timechart that sums those counts with a span of 24h and uses a where clause to keep only the series that exceed 35k:
... | timechart span=24h sum(reqs_per_ip) as reqs_per_ip_last24h by clientip where max > 35000
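Putting the two pieces together, the whole thing might look like the sketch below. The base search and time range are borrowed from the question. useother=f is my own addition: it hides the OTHER series that timechart otherwise builds from the clientips that fail the where clause. I haven't run this against your data, so treat it as a starting point.

```spl
earliest=-7d@d latest=now sourcetype="squid"
| bucket _time span=1h
| stats count as reqs_per_ip by clientip, _time
| timechart span=24h useother=f sum(reqs_per_ip) as reqs_per_ip_last24h by clientip where max > 35000
```

The hourly stats step shrinks the data to one row per clientip per hour, so the timechart has far fewer rows to chew through than it would on raw events.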
Like /k said, your search is slow because of the amount of data and because of the timechart after a where after a stats.
Why don't you simplify your search to something like:
earliest=-7d@d latest=now sourcetype="squid" | bucket _time span=1d | stats count by clientip _time | where count > 35000
This should bring back results much faster, and you can still use reporting graphs on the result. If the result is useful, set it up as a saved search with summary indexing enabled. That will speed up future searches, as long as they read from the summary index.
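If you want the chart directly from that search, one option is to add a timechart at the end, roughly as below. The daily_hits name and limit=0 are my own choices (limit=0 is there so timechart doesn't cap the number of series), so adjust as needed.

```spl
earliest=-7d@d latest=now sourcetype="squid"
| bucket _time span=1d
| stats count by clientip _time
| where count > 35000
| timechart span=1d limit=0 sum(count) as daily_hits by clientip
```

Because _time is kept in the stats by clause, the timechart still has a time field to work with. Each clientip only shows up on the days where it went over the threshold.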
Part of my trouble is that I don't know whether I've got the order of the search commands right. Ultimately, I want to end up with a timechart that plots usage over time for any source IPs that generate over 35000 hits within a 24 hour period. I think the bucket command will let me do that, but I've heard there are only rare cases when you should use it with a timechart command.
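On the ordering question: in the original search, stats count by clientip leaves only clientip and count, so _time is gone by the time timechart runs and there is nothing left to plot. timechart also does its own time bucketing through span, which is why a bucket directly in front of it is rarely needed. A bucket in front of stats is a different matter. Here is a rough sketch that keeps 30-minute resolution and plots every clientip that went over 35000 hits on at least one calendar day. The field names hits, day, daily_hits and peak_daily_hits are made up for illustration, and "day" here means calendar day rather than a rolling 24 hours. Untested against real data.

```spl
earliest=-7d@d latest=now sourcetype="squid"
| bucket _time span=30m
| stats count as hits by clientip, _time
| eval day=strftime(_time, "%Y-%m-%d")
| eventstats sum(hits) as daily_hits by clientip, day
| eventstats max(daily_hits) as peak_daily_hits by clientip
| where peak_daily_hits > 35000
| timechart span=30m limit=0 sum(hits) as hits by clientip
```

The two eventstats commands run on the 30-minute rows rather than on raw events, which should keep them reasonably cheap. If you only want the days where the threshold was actually exceeded, filter on daily_hits instead of peak_daily_hits.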