I have a dataset of Nginx (a web server) request logs. Each entry contains a client_ip. I want to impose some rate limiting, but first I want to see what my current traffic patterns are, so my rate limits don't impede regular traffic. There are two rate limit settings available: one expressed as a limit per second, and one as a limit per minute.
I would like to calculate the requests/second rate of each client_ip for each second, and then aggregate those per-client_ip values into a timechart, playing around with different aggregation functions (avg, median, p90, p99, max, etc.).
Put another way, I would like the timechart to have one data point per minute, each of which shows the p99 of requests/second among all the client_ips for that minute. For example, that would give me a per-second rate limit that 99% of clients would pass, blocking only the top 1%.
I thought this would do it:
application="my-app" index="my-index" request client_ip="*" user_agent="*" request="*" kube_pod="web-*"
| timechart span=1s count as count_per_sec by client_ip
| timechart span=1s avg(count_per_sec)
But all of the count_per_sec values come out blank under the "Statistics" tab.
application="my-app" index="my-index" request client_ip="*" user_agent="*" request="*" kube_pod="web-*"
| timechart span=1s count as count_per_sec by client_ip
| untable _time client_ip count_per_sec
| stats avg(count_per_sec) as count_per_sec by _time
The result of | timechart span=1s count as count_per_sec by client_ip is the following:
_time     X.X.X.X  Y.Y.Y.Y  Z.Z.Z.Z  ...
aa:bb:00  1        2        3        ...
aa:bb:01  4        5        6        ...
...
There is no count_per_sec field; the counts are spread across one column per client_ip. That is why | timechart span=1s avg(count_per_sec) can't work.
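For reference, | untable _time client_ip count_per_sec reshapes that wide table into one row per (_time, client_ip) pair, so count_per_sec exists as a real field again, roughly:
_time     client_ip  count_per_sec
aa:bb:00  X.X.X.X    1
aa:bb:00  Y.Y.Y.Y    2
aa:bb:00  Z.Z.Z.Z    3
...
After that, | stats avg(count_per_sec) by _time works as expected.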
Thanks man, this worked wonderfully! The min/median/p99 values were heavily skewed by the IPs with 0 requests/sec (which comprise most of the data points), so I fixed it by popping in a | where count_per_sec != 0. This had a nice side effect of drastically reducing the memory use. Do you know of any other ways to decrease the memory usage? For time scales above a few hours I still get OOM errors (using like 30 GB; the limit for us is 3 GB lol).
application="my-app" index="my-index" request client_ip="*" user_agent="*" request="*" kube_pod="web-*"
| bin _time span=1s
| stats count as count_per_sec by _time client_ip
| stats avg(count_per_sec) as count_per_sec by _time
Try this and check the job inspector.
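Putting the thread's pieces together, a sketch of the per-minute p99 version the original question asked for (perc99 is a standard stats function; note that with stats, client_ip/second pairs with zero requests simply never appear, so the | where filter from the untable version isn't needed here):
application="my-app" index="my-index" request client_ip="*" user_agent="*" request="*" kube_pod="web-*"
| bin _time span=1s
| stats count as count_per_sec by _time client_ip
| bin _time span=1m
| stats perc99(count_per_sec) as p99_count_per_sec by _time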
Wow, that's a night and day difference! Whereas before I couldn't squeeze out more than a 30-minute window, this let me go back over 7 days! I thought timechart worked like bin and stats together, so I'm surprised there's such a big difference. Is untable the culprit? I don't really know how to interpret the job inspector. The profiler chart shows startup.handoff eating up pretty much all of the time, and there are basically no other big "chunks".
stats is a simple aggregation that is easy to optimize, and it carries only a few fields. But timechart has to search the whole time range, and untable has to wait until timechart finishes. So stats is faster.
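One further squeeze worth trying (a standard SPL optimization, not something verified against this data): drop every field except the ones the aggregation needs as early as possible, so less data is carried through the pipeline:
application="my-app" index="my-index" request client_ip="*" user_agent="*" request="*" kube_pod="web-*"
| fields _time client_ip
| bin _time span=1s
| stats count as count_per_sec by _time client_ip
| stats avg(count_per_sec) as count_per_sec by _time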