Timechart Count by with Where

wbordeau
Explorer

I'm trying to narrow down a list of spiders whose traffic is inundating our network. So far, I've found that more than 35,000 hits per 24-hour period is a strong indicator that the source connection is a spider. On top of this, I want to display a timechart graphing the connections over time.

Essentially, I want to graph excessive clientips that meet a minimum threshold of 35000 hits per day but I want to graph it over a larger time period, like 7 days or even a month.

The following search isn't working. I've tried subsearches, limit, and top, but I'm stuck.

earliest=-7d@d latest=now sourcetype="squid" | bucket _time span=1d | stats count by clientip | where count > 35000 | timechart span=30m count by clientip

This search works but takes an insane amount of time. I think if I can just filter out anything with hits fewer than 35,000 per day, the search would run a little faster.

earliest=-7d@d latest=now sourcetype="squid" format="AN_SQUID_VIP_HOST_LOG" | timechart span=30m limit=10 useother=f count by clientip


tedwroks
Explorer

You will need to use summary indexing for:

... | bucket _time span=1h | stats count as reqs_per_ip by clientip, _time

That should produce the count of requests per IP per hour. It would then be the basis of another query that uses a timechart to sum those requests with a span of 24h, and uses a where clause to filter the series output so it only includes IPs with more than 35,000 requests:

... | timechart span=24h sum(reqs_per_ip) as reqs_per_ip_last24h by clientip where max > 35000
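
As a rough, untested sketch of how the two pieces might fit together: assume the hourly search above is saved with summary indexing enabled under the hypothetical name squid_hourly_reqs_by_ip and writes to the default summary index. Both the saved search name and the index are assumptions, not something from this thread. The reporting search could then read from the summary instead of the raw events:

```
index=summary source="squid_hourly_reqs_by_ip" earliest=-7d@d latest=now
| timechart span=24h sum(reqs_per_ip) as reqs_per_ip_last24h by clientip where max > 35000
```

Because the summary holds one row per IP per hour rather than one row per request, this second search should have far less data to scan than the original.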

MuS
SplunkTrust

Hi wbordeau

As /k said, your search is slow because of the amount of data and because of the timechart placed after a where after a stats.
Why don't you simplify your search to something like:

earliest=-7d@d latest=now sourcetype="squid" | bucket _time span=1d | stats count by clientip _time | where count > 35000

This should bring back results much faster, and you can still use reporting graphs on the result. If the result is useful, set it up as a saved search with summary indexing enabled. Searching the summary index instead of the raw events will speed up your future searches.
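
If the 30-minute timechart is still the goal, note that `stats count by clientip` in the original search drops _time, so the timechart after it has nothing to plot. Here is a minimal, untested sketch that keeps _time and filters on daily totals. The field names day, daily_hits, and peak_daily_hits are made up for illustration, and "day" here means a calendar day snapped to midnight rather than a rolling 24 hours:

```
earliest=-7d@d latest=now sourcetype="squid"
| bin _time span=30m
| stats count by clientip _time
| eval day=relative_time(_time, "@d")
| eventstats sum(count) as daily_hits by clientip day
| eventstats max(daily_hits) as peak_daily_hits by clientip
| where peak_daily_hits > 35000
| timechart span=30m sum(count) as hits by clientip
```

This keeps every 30-minute bucket for any clientip that exceeded 35,000 hits on at least one day. It still has to read every event in the time range, so on its own it probably won't be faster; it only makes the filter work.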

hope this helps....

cheers, MuS

wbordeau
Explorer

Part of the trouble is that I don't know whether I've got the order of the search commands right. Ultimately, I want to end up with a timechart that plots usage over time for any source IPs that generate over 35,000 hits within a 24-hour period. I think the bucket command will let me do that, but I've heard there are only rare cases where you should use it together with a timechart command.


kristian_kolb
Ultra Champion

By definition this will take a lot of time, since you'll have to retrieve all events from more than a week's time.

Perhaps the user-agent field (if that is being logged) can be used to find spiders.

Also, search acceleration, or summary indexing, could prove useful here.
