Splunk Search

Timechart: p99 requests/min by client

amomchilov
Explorer

I have a dataset of Nginx (a web server) request logs. Each entry contains a client_ip. I want to impose some rate limiting, but first I want to see what my current traffic patterns are, so that my rate limits don't impede regular traffic. There are two rate limit settings available: one expressed as a limit per second, and one as a limit per minute.

I would like to calculate the requests/second rate of each client_ip for each second, and then aggregate those per-client_ip values into a timechart (playing around with different aggregation functions, like avg, median, p90, p99, max, etc.).

Put another way, I would like this timechart to have one data point per minute, each showing the p99 requests/second across all client_ips for that minute. That would give me a per-second rate limit that lets 99% pass and blocks the top 1%.
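To make the target computation concrete, here is a minimal Python sketch of it outside Splunk. It assumes the events are available as hypothetical (epoch_seconds, client_ip) pairs, and it uses a simple nearest-rank percentile, which is an assumption and not necessarily how Splunk's p99/perc99 functions compute percentiles. It only counts seconds in which a client actually sent requests.

```python
from collections import Counter, defaultdict


def per_client_counts(events, window_s):
    """Count requests per (window start, client_ip), like bin _time + stats count."""
    counts = Counter()
    for ts, ip in events:
        bucket = int(ts) - int(ts) % window_s  # floor the timestamp to its window
        counts[(bucket, ip)] += 1
    return counts


def nearest_rank_percentile(values, pct):
    """Smallest value such that at least pct% (integer 1-100) of values are <= it."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100) in integers
    return ordered[rank - 1]


def percentile_per_minute(counts, pct):
    """For each minute, the pct-th percentile of the per-client, per-window counts."""
    by_minute = defaultdict(list)
    for (bucket, _ip), n in counts.items():
        by_minute[bucket - bucket % 60].append(n)
    return {m: nearest_rank_percentile(v, pct) for m, v in sorted(by_minute.items())}


# Hypothetical events as (epoch_seconds, client_ip) pairs.
events = [(0, "a"), (0, "a"), (0, "b"), (1, "a"), (61, "b"), (61, "b"), (61, "b")]
per_sec = per_client_counts(events, window_s=1)
print(percentile_per_minute(per_sec, 99))  # {0: 2, 60: 3}
```

Calling per_client_counts(events, window_s=60) instead yields per-client counts for each minute, which is the quantity the per-minute limit applies to; percentile_per_minute then gives the p99 of those counts across clients.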

I thought this would do it:

application="my-app" index="my-index" request client_ip="*" user_agent="*" request="*" kube_pod="web-*"
| timechart span=1s count as count_per_sec by client_ip
| timechart span=1s avg(count_per_sec)

But all of the count_per_sec values come out blank under the Statistics tab.


to4kawa
Ultra Champion
application="my-app" index="my-index" request client_ip="*" user_agent="*" request="*" kube_pod="web-*"
| timechart span=1s count as count_per_sec by client_ip
| untable _time client_ip count_per_sec 
| stats avg(count_per_sec) as count_per_sec  by _time

The result of | timechart span=1s count as count_per_sec by client_ip looks like this:

_time     X.X.X.X  Y.Y.Y.Y  Z.Z.Z.Z  ...
aa:bb:00  1        2        3        ...
dd:ee:01  4        5        6        ...
...

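There is no count_per_sec field in that output. With a single aggregation and a by clause, timechart names each column after a client_ip value, so | timechart span=1s avg(count_per_sec) has nothing to average. untable turns the table back into one row per _time and client_ip, with the count stored in a field named count_per_sec, and stats can then aggregate that field by _time. (Note that timechart keeps only the top 10 client_ip values by default and merges the rest into an OTHER column; add limit=0 to keep every client.)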
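The wide-to-long reshaping that untable performs can be modeled in a few lines of Python. This is only an illustration, not Splunk's implementation: the hypothetical wide table stands in for timechart output, with one column per client_ip and a 0 wherever a client sent nothing in that second.

```python
def untable(rows, x_field, y_name, value_name):
    """Turn one row per x_field value with one column per series into one row per (x, series)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == x_field:
                continue  # the x field itself is not a series
            long_rows.append({x_field: row[x_field], y_name: column, value_name: value})
    return long_rows


# Hypothetical timechart-style output: one column per client_ip, zero-filled.
wide = [
    {"_time": 0, "10.0.0.1": 1, "10.0.0.2": 0},
    {"_time": 1, "10.0.0.1": 4, "10.0.0.2": 5},
]

# Each (time, client) cell becomes its own row, zeros included.
for r in untable(wide, "_time", "client_ip", "count_per_sec"):
    print(r)
```

The wide rows have no count_per_sec key at all, which is why aggregating that name directly finds nothing; after reshaping, every row carries it.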

amomchilov
Explorer

Thanks man, this worked wonderfully! The min/median/p99 values were heavily skewed by the IPs with 0 requests (which make up most of the data points), so I fixed it by popping in a | where count_per_sec != 0. This had the nice side effect of drastically reducing memory use. Do you know of any other ways to decrease the memory usage? For time ranges longer than a few hours I still get out-of-memory errors (it uses around 30 GB, and our limit is 3 GB lol).
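A tiny Python illustration of that skew, using made-up numbers for one minute of untabled cells. The nearest-rank percentile here is a simplifying assumption and may not match Splunk's perc99 exactly, but the direction of the effect is the same: zero-filled cells drag min, median, and p99 down.

```python
from statistics import median


def nearest_rank_percentile(values, pct):
    """Smallest value such that at least pct% (integer 1-100) of values are <= it."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100) in integers
    return ordered[rank - 1]


# Hypothetical cells for one minute: 95 zero-filled (second, client) cells for
# clients that sent nothing, plus 5 real per-second observations.
cells = [0] * 95 + [1, 1, 2, 3, 8]
nonzero = [n for n in cells if n != 0]  # same idea as | where count_per_sec != 0

for label, data in (("with zeros", cells), ("zeros removed", nonzero)):
    print(label, min(data), median(data), nearest_rank_percentile(data, 99))
```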

to4kawa
Ultra Champion
application="my-app" index="my-index" request client_ip="*" user_agent="*" request="*" kube_pod="web-*"
| bin _time span=1s 
| stats count as count_per_sec by _time client_ip
| stats avg(count_per_sec) as count_per_sec  by _time

Try this, and check the Job Inspector.

amomchilov
Explorer

Wow, that's a night and day difference! Before, I couldn't squeeze out more than a 30-minute window, and this search let me go back over 7 days! I thought timechart worked like bin and stats together, so I'm surprised there's such a big difference. Is untable the culprit? I don't really know how to interpret the Job Inspector. The profiler chart shows startup.handoff eating up pretty much all of the time, and there are basically no other big "chunks".

to4kawa
Ultra Champion

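stats is a simple aggregation that is easy to optimize, and it works with only the few fields it needs. timechart, on the other hand, builds a bucket for every span across the whole search period, and untable has to wait until timechart has finished. bin plus stats by _time client_ip only creates rows for the _time/client_ip combinations that actually have events.
So, stats is faster and uses less memory.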
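A back-of-the-envelope Python model of the row counts behind that explanation. The traffic numbers are hypothetical (1,000 distinct client_ips, each active in 1% of seconds), and the model assumes every client_ip gets its own timechart column; it ignores timechart's default series limit and Splunk's internal storage, and only counts the (second, client_ip) rows each approach hands to the final stats.

```python
SECONDS_PER_DAY = 86_400


def dense_rows(days, clients):
    """Rows after timechart span=1s ... by client_ip | untable: every second x every client."""
    return days * SECONDS_PER_DAY * clients


def sparse_rows(days, clients, active_pct):
    """Rows after bin _time span=1s | stats count by _time client_ip: observed pairs only."""
    # active_pct: hypothetical integer percentage of seconds in which a client sends anything
    return days * SECONDS_PER_DAY * clients * active_pct // 100


# Hypothetical traffic: 1,000 clients, each sending something in 1% of seconds.
print(dense_rows(7, 1_000))      # 604800000
print(sparse_rows(7, 1_000, 1))  # 6048000
```

Under these assumptions the dense grid is 100 times larger, and most of it is the zero-filled cells that the | where count_per_sec != 0 workaround was throwing away after they had already been built.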
