I have a basic timechart query that graphs the number of Queries per second (QPS) for several hosts. I need to filter the results to only show hosts that had a change in QPS of + or - 50% at any point.
(Show only these two results and drop the others)
index=metrics host=*
| timechart span=5m partial=f limit=0 per_second(Query) as QPS by Site
Based on your search, I believe that index metrics is not a metrics index, but an event index. Is this correct? The fundamental idea to meet your idea is to not use timechart before we can detect change. Here is an alternative if you have to use index search.
index=metrics host=*
| rex field=host "^(?<host>[\w\d-]+)\."
| lookup dns.csv sd_hostname AS host
| bin _time span=1s
| stats rate(Query) as QPS by Site _time
| bin _time span=5m
| stats avg(QPS) as QPS by Site _time
| streamstats window=2 global=false current=true stdev(QPS) as devF by Site
| sort Site, - _time
| streamstats window=2 global=false current=true stdev(QPS) as devB by Site
| where 4*devF > QPS OR 4*devB > QPS
| timechart span=5m values(QPS) by Site
To get to this step, I have to run timebucket twice to get 5-min average of QPS. (When you run per_second in timechart span=5m, I suspect it gives you an average of sorts.)
I ran this against an emulation that you can also run,
index = _audit earliest=-0d@d latest=-0d@d+1h
| rename action as Site
| streamstats count as Query by Site
``` the above emulates
index=metrics host=*
| rex field=host "^(?<host>[\w\d-]+)\."
| lookup dns.csv sd_hostname AS host
```
The caveat is that if a Site has no query in some 5-minute intervals then have a significant change later, you won't get connected points on the graph.
Thanks for the idea. Unfortunately that's not going to work. I have to use a lookup table to get the Site and mstat insists on being the first command in the query.
index=metrics host=*
| rex field=host "^(?<host>[\w\d-]+)\."
| lookup dns.csv sd_hostname AS host
| timechart span=5m partial=f limit=0 per_second(Query) as QPS by Site
I also tried using mstat BY host but that did not return any results.
Based on your search, I believe that index metrics is not a metrics index, but an event index. Is this correct? The fundamental idea to meet your idea is to not use timechart before we can detect change. Here is an alternative if you have to use index search.
index=metrics host=*
| rex field=host "^(?<host>[\w\d-]+)\."
| lookup dns.csv sd_hostname AS host
| bin _time span=1s
| stats rate(Query) as QPS by Site _time
| bin _time span=5m
| stats avg(QPS) as QPS by Site _time
| streamstats window=2 global=false current=true stdev(QPS) as devF by Site
| sort Site, - _time
| streamstats window=2 global=false current=true stdev(QPS) as devB by Site
| where 4*devF > QPS OR 4*devB > QPS
| timechart span=5m values(QPS) by Site
To get to this step, I have to run timebucket twice to get 5-min average of QPS. (When you run per_second in timechart span=5m, I suspect it gives you an average of sorts.)
I ran this against an emulation that you can also run,
index = _audit earliest=-0d@d latest=-0d@d+1h
| rename action as Site
| streamstats count as Query by Site
``` the above emulates
index=metrics host=*
| rex field=host "^(?<host>[\w\d-]+)\."
| lookup dns.csv sd_hostname AS host
```
The caveat is that if a Site has no query in some 5-minute intervals then have a significant change later, you won't get connected points on the graph.
Hello Yuanliu,
Thanks so much for your suggestion. This is getting close. I did have to change the first "span=1s" to something greater than 1m in order to get any results. Most likely because the "Query" total (and other DNS stats) are only logged once every 5 minutes with the totals for the past five minutes.
As you mentioned this does not give the connection points in the graph so I had a thought; what if I use this query to generate a list of sites to use in my original query. Something like this:
index=metrics host=*
| rex field=host "^(?<host>[\w\d-]+)\."
| lookup dns.csv sd_hostname AS host
| search Site IN (*)
| bin _time span=5m
| stats values(Query) as QPS by Site _time
| bin _time span=5m
| stats avg(QPS) as QPS by Site _time
| streamstats window=2 global=false current=true stdev(QPS) as devF by Site
| sort Site, - _time
| streamstats window=2 global=false current=true stdev(QPS) as devB by Site
| where 4*devF > QPS OR devB*4 > QPS
| table Site | dedup Site | mvcombine Site delim=","
| nomv Site
This gives a CSV list of sites to search:
Site |
austx.1,snavtx.1 |
I am using Dashboard Studio and I'm trying to figure out how to chain these results as a variable in my original search
...
| search Site IN ($my-csv-list-from-above$)
...+
but so far I have not figured that out. Let me know if you have suggestions. Thanks again for your help!
Maybe something like
| mstats rate(Query) as QPS where index=metrics host=* by Site span=5m
| streamstats window=2 global=false current=false stdev(QPS) as devF by Site
| sort Site, - _time
| streamstats window=2 global=false current=false stdev(QPS) as devB by Site
| where 4*devF > QPS OR 4*devB > QPS
| timechart span=5m values(QPS) by Site