We currently have a search that shows a timeline graph of daily SVC usage by index. 10 of these indexes are our highest for SVC usage. I would like to create an alert if the SVC usage for any of those indexes goes 25% higher or lower than the normal amount.
Example: index=test normally uses 100 to 140 SVC per day.
The alert should tell us when that index goes 25% over 140 (above 175) or 25% under 100 (below 75).
We want the search to do this for at least our top 10 SVC usage indexes.
Our current timechart search is as follows:
index=svc_summary
| timechart limit=10 span=1d useother=f sum(svc_usage) by Indexes
You could do something like this: compute the daily SVC usage per index over the last 30 days (replace the first few lines with your own SVC search), take the 30-day average per index, and then keep only the results that are more than 25% above or below that average:
| tstats count where index=_internal earliest=-30d@d latest=now by _time, host span=1d
| rename host AS index
| stats sum(count) as svc by index _time
``` 1. Build a baseline for every index - Replace these lines with your original SVC search```
```2. 30‑day avg per index```
| eventstats avg(svc) as avg_svc by index
```3. Keep only the last day (the day you are currently monitoring)```
| where _time >= relative_time(now(), "-1d")
```4. Thresholds – 25% above or below the 30‑day average```
| eval si_high = avg_svc * 1.25
| eval si_low = avg_svc * 0.75
```5. Find any day that is outside the band```
| where svc > si_high OR svc < si_low
```6. Show the top 10 indexes by daily usage (optional)```
| sort 0 -svc
| head 10
| table _time index svc avg_svc si_high si_low
I tried to use what you provided with my data. I think it can work, but I am using a summary index and not the _internal index. Inside that summary index, the field that holds the actual index names is called "Indexes". Below is my attempt to merge your search with mine. Maybe you can help now that you know this.
| tstats count where index=dg_app_summary NOT (Indexes="All*" OR Indexes="Undefined" OR Indexes="_*") earliest=-30d@d latest=now by _time, Indexes span=1d
| stats sum(count) as svc_usage by Indexes _time
``` 1. Build a baseline for every index - Replace these lines with your original SVC search```
| where Indexes=proxy OR Indexes=aws
```2. 30‑day avg per index```
| eventstats avg(svc_usage) as avg_svc by Indexes
```3. Keep only the last day (the day you are currently monitoring)```
| where _time >= relative_time(now(), "-1d")
```4. Thresholds – 25% above or below the 30‑day average```
| eval si_high = avg_svc * 1.25
| eval si_low = avg_svc * 0.75
```5. Find any day that is outside the band```
| where svc_usage > si_high OR svc_usage < si_low
```6. Show the top 10 indexes by daily usage (optional)```
| sort 0 -svc_usage
| head 10
| table _time Indexes svc_usage avg_svc si_high si_low
Instead of just posting SPL, the best way to get concrete help is to first show the data (in your case, sample output from svc_summary, or even mocked-up data that illustrates the variations you observe), then show the results you want from that data, and then explain the logic that connects the data to the desired results. Like @livehybrid said, summary index versus _internal is not the problem. The problem is that volunteers here have difficulty understanding your business logic.
My _internal search was really just a proof-of-concept. Just looking through your search, why are you using the "| where Indexes=proxy OR Indexes=aws"? It would be better to include this in the tstats where statement.
If you do everything down to, and including '| where _time >= relative_time(now(), "-1d")' do you get results?
If you get results here, that suggests the search is working and the rows are being filtered out by the later "where" command, which limits results based on the thresholds.
If you're getting no results, could it be because the thresholds aren't currently breached?
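To make the suggestions above concrete, here is a rough sketch that moves the index filter into the base search and sums svc_usage instead of counting events. It assumes that svc_usage and Indexes are ordinary search-time fields in the summary index (as the original timechart search suggests), so it uses a plain search rather than tstats, which only works on indexed fields. The index name dg_app_summary and the values proxy and aws are taken from the attempt above. It also uses latest=@d and "-1d@d", because with latest=now and daily buckets, relative_time(now(), "-1d") is exactly 24 hours ago and appears to keep only today's partial bucket rather than the last full day.

```spl
index=dg_app_summary Indexes IN (proxy, aws) earliest=-30d@d latest=@d
| bin _time span=1d
| stats sum(svc_usage) as svc_usage by _time Indexes
| eventstats avg(svc_usage) as avg_svc by Indexes
| where _time >= relative_time(now(), "-1d@d")
| eval si_high = avg_svc * 1.25, si_low = avg_svc * 0.75
| where svc_usage > si_high OR svc_usage < si_low
| table _time Indexes svc_usage avg_svc si_high si_low
```

Running it only down to the first "where" line should show one row per index for yesterday. If rows appear there but not at the end, the thresholds simply were not breached. Note that the 30-day average here still includes the day being checked.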
If this is supposed to be a statically thresholded alert, you can just add
| where (your_condition_matching_excessive_usage)
And alert if you get any results.
If you would like to have some form of dynamic thresholding based on previous values... That might need some more funky logic and possibly include MLTK
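For the static case, a minimal sketch using the numbers from the original example (normal range 100 to 140, so 25% over is 175 and 25% under is 75) might look like the following. The index name svc_summary and the fields svc_usage and Indexes come from the original timechart search. The value test and the hard-coded limits are only illustrative.

```spl
index=svc_summary Indexes=test earliest=-1d@d latest=@d
| stats sum(svc_usage) as svc_usage by Indexes
| where svc_usage > 175 OR svc_usage < 75
```

Saved as an alert that triggers when the number of results is greater than zero, this fires whenever yesterday's total falls outside the fixed band.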
It's a daily alert. Some days, like Saturday or Sunday, might be lower than Monday and Tuesday.
But let's say last Monday the highest SVC usage was 140, and this Monday it was 200. I want to know that this happened.
The threshold can be percentage-based or statistical.
I tried to use the MLTK command but I kept getting an error.
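One way to handle the weekday pattern without MLTK is to compare each day only against the same weekday in previous weeks. The sketch below is an untested outline under a few assumptions: the index and field names come from the original timechart search, the 8-week lookback is arbitrary, and the 25% and 2-standard-deviation limits are placeholders to tune. The names weekday, stdev_svc and pct_change are made up for illustration.

```spl
index=svc_summary earliest=-8w@d latest=@d
| bin _time span=1d
| stats sum(svc_usage) as svc_usage by _time Indexes
| eval weekday = strftime(_time, "%A")
| eventstats avg(svc_usage) as avg_svc stdev(svc_usage) as stdev_svc by Indexes weekday
| where _time >= relative_time(now(), "-1d@d")
| eval pct_change = round((svc_usage - avg_svc) / avg_svc * 100, 1)
| where abs(pct_change) > 25 OR abs(svc_usage - avg_svc) > 2 * stdev_svc
| table _time Indexes weekday svc_usage avg_svc stdev_svc pct_change
```

With this, a Monday is judged against the average of recent Mondays, so a jump from about 140 to 200 would show up as roughly a 40% change, while a normally quiet weekend would not. Keep only one side of the final "where" if you want a purely percentage-based or purely statistical check.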