Splunk Search

Timechart: p99 requests/min by client

Explorer

I have a dataset of Nginx (a web server) request logs. Each entry contains a client_ip. I want to impose some rate limiting, but I want to see what my current traffic patterns are, so my rate limits don't impede the current regular traffic. There are two rate limit settings available, one expressed as a limit per second, and a limit per minute.

I would like to calculate the requests/second rate of each client_ip for each second, and then aggregate those per-client_ip values into a timechart, playing around with different aggregation functions (avg, median, p90, p99, max, etc.).

Put another way, I would like this timechart to have one data point per minute, each showing the p99 requests/second among all the client_ips for that minute. That would give me, for example, a per-second rate limit that 99% of clients would pass, blocking only the top 1%.

I thought this would do it:

application="my-app" index="my-index" request client_ip="*" user_agent="*" request="*" kube_pod="web-*"
| timechart span=1s count as count_per_sec by client_ip
| timechart span=1s avg(count_per_sec)

But all of the count_per_sec values come out blank under the "statistics" tab.

1 Solution

Ultra Champion
application="my-app" index="my-index" request client_ip="*" user_agent="*" request="*" kube_pod="web-*"
| timechart span=1s count as count_per_sec by client_ip
| untable _time client_ip count_per_sec 
| stats avg(count_per_sec) as count_per_sec  by _time

The result of | timechart span=1s count as count_per_sec by client_ip looks like this:

_time     X.X.X.X  Y.Y.Y.Y  Z.Z.Z.Z  ...
aa:bb:00  1        2        3        ...
aa:bb:01  4        5        6        ...
...

There is no field named count_per_sec; the split-by turns each client_ip into its own column. So | timechart span=1s avg(count_per_sec) can't work.
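Since the goal was to try different aggregation functions, note that only the final stats line needs to change. A sketch of a p99 variant, using Splunk's perc99 stats function (the p99_per_sec field name is just illustrative):

```spl
application="my-app" index="my-index" request client_ip="*" user_agent="*" request="*" kube_pod="web-*"
| timechart span=1s count as count_per_sec by client_ip
| untable _time client_ip count_per_sec
| stats perc99(count_per_sec) as p99_per_sec by _time
```

Swapping in median(count_per_sec), perc90(count_per_sec), or max(count_per_sec) works the same way.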



Explorer

Thanks man, this worked wonderfully! The min/median/p99 values were heavily skewed by the IPs with 0 requests/min (which make up most of the data points), so I fixed it by popping in a | where count_per_sec != 0. This had the nice side effect of drastically reducing memory use. Do you know of any other ways to decrease the memory usage? For time scales above a few hours I still get out-of-memory errors (using something like 30 GB; our limit is 3 GB lol).
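For reference, a sketch of the amended pipeline described above, assuming the filter slots in between untable and stats so the zero-filled rows are dropped before aggregation (perc99 here is just one example aggregation):

```spl
application="my-app" index="my-index" request client_ip="*" user_agent="*" request="*" kube_pod="web-*"
| timechart span=1s count as count_per_sec by client_ip
| untable _time client_ip count_per_sec
| where count_per_sec != 0
| stats perc99(count_per_sec) as p99_per_sec by _time
```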


Ultra Champion
application="my-app" index="my-index" request client_ip="*" user_agent="*" request="*" kube_pod="web-*"
| bin _time span=1s 
| stats count as count_per_sec by _time client_ip
| stats avg(count_per_sec) as count_per_sec  by _time

Try this and check the job inspector.
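One side note on the zero-filtering from earlier: bin + stats only produces rows for _time/client_ip pairs that actually have events, so unlike timechart it never zero-fills, and the | where count_per_sec != 0 step shouldn't be needed here. A p99 sketch of this leaner form:

```spl
application="my-app" index="my-index" request client_ip="*" user_agent="*" request="*" kube_pod="web-*"
| bin _time span=1s
| stats count as count_per_sec by _time client_ip
| stats perc99(count_per_sec) as p99_per_sec by _time
```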


Explorer

Wow, that's a night and day difference! Whereas before I couldn't squeeze out more than a 30 minute window, this let me go back over 7 days! I thought timechart worked like bin and stats together, so I'm surprised there's such a big difference. Is untable the culprit? I don't really know how to interpret the job inspector; the profiler chart shows startup.handoff eating up pretty much all of the time, and there are basically no other big "chunks".


Ultra Champion

stats is a simple aggregation that Splunk can optimize easily, and it carries only the fields it needs. timechart, by contrast, scans the whole period and pivots the results into columns, and untable has to wait until timechart finishes.
So stats is faster.
