Hi,
I am trying to write a query that would get me the average TPS and average response time for services in the same table. I tried this -
<search>
| eval <evaluate response time as RT>
| bin _time AS "TIME" span=1s
| eventstats count as TPS by TIME, service
| stats count AS txnCount, avg(TPS) as avgTPS, avg(RT) as avgRT by service
However, the numbers don't seem to match when I am running the TPS query individually like this -
<search>
| bin _time AS "TIME" span=1s
| eventstats count as TPS by TIME, service
| stats count AS txnCount, avg(TPS) as avgTPS by service
Any suggestions what I could be doing wrong here?
Thank you!
The problem with using eventstats to calculate something then stats to do something else on that value is that eventstats will put the calculated value in every event found, so consider this data
12:00:00.001 query=a, rt=100
12:00:00.002 query=a, rt=100
12:00:00.003 query=a, rt=100
12:00:01.001 query=a, rt=100
12:00:01.002 query=a, rt=100
12:00:02.001 query=a, rt=100
After eventstats you will have this data
12:00:00.001 query=a, rt=100 TPS=3
12:00:00.002 query=a, rt=100 TPS=3
12:00:00.003 query=a, rt=100 TPS=3
12:00:01.001 query=a, rt=100 TPS=2
12:00:01.002 query=a, rt=100 TPS=2
12:00:02.001 query=a, rt=100 TPS=1
so then when you calculate avg(TPS) on this data you have
3+3+3+2+2+1=14 / 6 events = 2.3333333
however, the average of the TPS is actually 3+2+1=6 / 3 seconds = 2
which is why you have different values and the eventstats+stats version is wrong.
The challenge you have is in getting both percentiles and TPS. You can only use ONE of the TPS values per second when calculating averages and percentiles and you cannot use the double stats method if you want percentiles.
This is a technique that can be used to calculate the numbers using eventstats. This uses streamstats to count the events per second and then sets all other TPS values to null apart from the first one per second, which then means you can use the avg(TPS) and percentiles as the events that have null TPS are not counted, so in the above data example, you get the correct average TPS value of 2.
| eventstats count as TPS by TIME query
| streamstats time_window=1s c as tmp
| eval TPS=if(tmp>1, null(), TPS)
| stats avg(TPS) as avgTPS perc90(TPS) as p90TPS by query
There may be other ways to do this
@ITWhisperer do you know of a way to calc TPS then percentiles+averages of that TPS without using a subsearch to duplicate the search?
Hello @bowesmana & @ITWhisperer, many thanks for the solution, and for the explanation provided along with! 🙂 I had to take a refresher for evenstats and streamstats, and it all fit well with the explanations by @bowesmana.
Finally went with the modified query suggested by @ITWhisperer, and the results look good, and as expected.
Thank you once again, both!
Also, note that using eventstats will be much slower than using stats twice. As you are doing average, it's each to calculate the total RT and then create the average RT at the end like this.
<search>
| bin _time as "TIME" span=1s
| stats count as TPS sum(RT) as RT by TIME service
| stats sum(TPS) as txnCount, avg(TPS) as avgTPS, sum(RT) as RT by service
| eval avgRT_sum=RT/txnCount
Thanks @bowesmana. Learnt something new about evenstats today. 🙂
I do need the 90th percentile TPS and RT as well. Also, the TPS does not match with what I was getting with eventstats.
<search>
| bin _time AS "TIME" span=1s
| stats count as TPS sum(RT) as RT by TIME query
| stats sum(TPS) as txnCount, avg(TPS) as avgTPS, sum(RT) as RT by query
| eval avgRT_sum=round(RT/txnCount,2)
| eval avgTPS = round (avgTPS,2)
This is the query that I have -
<search> | bin _time AS "TIME" span=1s | eventstats count as TPS by TIME, query | stats count AS txnCount, perc90(TPS) as P90_TPS, perc90(RT) as P90_RT, avg(TPS) as avgTPS, avg(RT) as avgRT by query | eval avgTPS = round (avgTPS,2) | eval avgRT = round (avgRT,2)
Thank you!
The problem with using eventstats to calculate something then stats to do something else on that value is that eventstats will put the calculated value in every event found, so consider this data
12:00:00.001 query=a, rt=100
12:00:00.002 query=a, rt=100
12:00:00.003 query=a, rt=100
12:00:01.001 query=a, rt=100
12:00:01.002 query=a, rt=100
12:00:02.001 query=a, rt=100
After eventstats you will have this data
12:00:00.001 query=a, rt=100 TPS=3
12:00:00.002 query=a, rt=100 TPS=3
12:00:00.003 query=a, rt=100 TPS=3
12:00:01.001 query=a, rt=100 TPS=2
12:00:01.002 query=a, rt=100 TPS=2
12:00:02.001 query=a, rt=100 TPS=1
so then when you calculate avg(TPS) on this data you have
3+3+3+2+2+1=14 / 6 events = 2.3333333
however, the average of the TPS is actually 3+2+1=6 / 3 seconds = 2
which is why you have different values and the eventstats+stats version is wrong.
The challenge you have is in getting both percentiles and TPS. You can only use ONE of the TPS values per second when calculating averages and percentiles and you cannot use the double stats method if you want percentiles.
This is a technique that can be used to calculate the numbers using eventstats. This uses streamstats to count the events per second and then sets all other TPS values to null apart from the first one per second, which then means you can use the avg(TPS) and percentiles as the events that have null TPS are not counted, so in the above data example, you get the correct average TPS value of 2.
| eventstats count as TPS by TIME query
| streamstats time_window=1s c as tmp
| eval TPS=if(tmp>1, null(), TPS)
| stats avg(TPS) as avgTPS perc90(TPS) as p90TPS by query
There may be other ways to do this
@ITWhisperer do you know of a way to calc TPS then percentiles+averages of that TPS without using a subsearch to duplicate the search?
@bowesmana Your approach is almost how I would have done it too. A slight modification is to the streamstats which should take into account the query as well. Here is a runanywhere example (based on your approach)
| makeresults count=1000
| eval _time=_time+(random()%1000)/100
| eval query=mvindex(split("ABC",""),random()%3)
| eval RT=random()%100/10
``` the lines above set up some dummy data ```
| bin _time as TIME span=1s
| eventstats count as TPS by TIME query
| streamstats c as tmp global=f by TIME query
| eval TPS=if(tmp>1, null(), TPS)
| stats avg(TPS) as avgTPS perc90(TPS) as p90TPS by query
Thanks for that pick @ITWhisperer - I copied the wrong streamstats example.
Make sure you are comparing the same time intervals when you check your numbers - if you latest time is 'now' then you will always get different results every time you run the search
I think that's the mistake I was doing when comparing the two. Although I don't recall now 🙂. However, now the initial query seems to be giving me the results I expect.
Final working query -
<search>
| eval RT = endTime - startTime
| bin _time AS "TIME" span=1s
| eventstats count as TPS by TIME, query
| stats count AS txnCount, perc90(TPS) as P90_TPS, perc90(TotalTime) as P90_RT, avg(TPS) as avgTPS, avg(TotalTime) as avgRT by query
| eval avgTPS = round (avgTPS,2)
| eval avgRT = round (avgRT,2)
Before trying to find what is "wrong", maybe first determine if anything is actually wrong. What is the method used to determine that the numbers do not match? I ask because the codes suggest very dense events, so any observed difference could simply be an artifact due to data variants.
Another possible path to examine could be any side effect from "<evaluate response time as RT>" that is not illustrated.
Thank you for responding @yuanliu. I just did a comparison w/ and w/o the RT evaluation - basically comparing the results of the 2 queries that is mentioned in the original post.
As for this -
"<evaluate response time as RT>"
It was a simple evaluation
RT=endTime-startTime
However, it looks like the initial queries that I posted are working as expected. Probably I was using a wrong time window for running both the queries. Final working query -
<search>
| eval RT = endTime - startTime
| bin _time AS "TIME" span=1s
| eventstats count as TPS by TIME, query
| stats count AS txnCount, perc90(TPS) as P90_TPS, perc90(TotalTime) as P90_RT, avg(TPS) as avgTPS, avg(TotalTime) as avgRT by query
| eval avgTPS = round (avgTPS,2)
| eval avgRT = round (avgRT,2)