I'm trying to optimize my alerts because of a recurring problem. Where I work, it takes a while (1 to 3 days) to resolve the underlying issue after an alert triggers, so the alert keeps firing during that time. I can't use Throttle because my alerts do not depend on a single host or event. For example:
index=os_pci_windowsatom host IN (HostP1 HostP2 HostP3 HostP4) source=cnt_mx_pci_sql_*_status_db
|dedup 1 host state_desc
| streamstats values(state_desc) as State by host
| eval Estado=case(
State!="ONLINE", "Critico",
State="ONLINE", "Safe"
)
| table Estado host State _time
| where Estado="Critico"
When the status of a host changes to critical, the alert triggers. I can't use Throttle because, while the alert is silenced, another host may go critical and that result would be missed entirely.
My idea is to build logic that compares the results of the last triggered alert with the current results: if the host and status are the same, nothing happens; if they differ from the previously triggered results, the alert fires. I thought about using the data from wherever triggered alert results are stored, but I don't know how to search for it. Does anyone have an idea?
Any comment is greatly appreciated.
My solution was to configure a second alert that writes the status from the first alert to a lookup. I then added a rule so that the alert triggers only when the first alert has a new result that differs from what the second alert stored.
| eval Estado=case(
State="Offline", "Critico",
State="EnSplunk", "Safe")
| join type=left host [
| inputlookup lkp_mx_mr_pci_diponibles_results.csv
| eval host1=host
| eval Estado1=Estado
| table host host1 Estado1 Servicio
]
| eval Estado2=Estado
| eval host2=host
| eval case=if(host1=host2 AND Estado1=Estado2, "true", "false")
| table Estado host SO Servicio Fecha host1 host2 Estado1 Estado2 case
| sort Estado
| where Estado="Critico" AND case="false"
| fields - host1 host2 case Estado1 Estado2
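For anyone following along, the state-recording search could look roughly like the sketch below. It assumes the search runs on its own schedule and that the comparison only needs host and Estado; the real lookup also carries Servicio, whose source is not shown in this thread, so that column is left out here.
index=os_pci_windowsatom host IN (HostP1 HostP2 HostP3 HostP4) source=cnt_mx_pci_sql_*_status_db
| stats latest(state_desc) as State by host
| eval Estado=if(State="ONLINE", "Safe", "Critico")
``` assumption: host and Estado are the only columns the comparison needs ```
| table host Estado
| outputlookup lkp_mx_mr_pci_diponibles_results.csv
Because outputlookup overwrites the file on each run, the lookup always holds the most recently recorded state per host, which is what the join in the alert search compares against.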
Let me first try to understand the problem: You want to find servers whose end state is offline, but whose immediate previous reported state is not offline, i.e., those whose state newly becomes offline. Is this correct? In other words, given these mock events
_time | host | state_desc |
2024-12-20 18:00 | host1 | not online |
2024-12-20 16:00 | host2 | not online |
2024-12-20 14:00 | host3 | ONLINE |
2024-12-20 12:00 | host4 | not online |
2024-12-20 10:00 | host0 | not online |
2024-12-20 08:00 | host1 | ONLINE |
2024-12-20 06:00 | host2 | not online |
2024-12-20 04:00 | host3 | not online |
2024-12-20 02:00 | host4 | ONLINE |
2024-12-20 00:00 | host0 | not online |
2024-12-19 22:00 | host1 | not online |
2024-12-19 20:00 | host2 | ONLINE |
2024-12-19 18:00 | host3 | not online |
2024-12-19 16:00 | host4 | not online |
2024-12-19 14:00 | host0 | ONLINE |
2024-12-19 12:00 | host1 | not online |
2024-12-19 10:00 | host2 | not online |
2024-12-19 08:00 | host3 | ONLINE |
2024-12-19 06:00 | host4 | not online |
2024-12-19 04:00 | host0 | not online |
2024-12-19 02:00 | host1 | ONLINE |
2024-12-19 00:00 | host2 | not online |
2024-12-18 22:00 | host3 | not online |
2024-12-18 20:00 | host4 | ONLINE |
2024-12-18 18:00 | host0 | not online |
2024-12-18 16:00 | host1 | not online |
2024-12-18 14:00 | host2 | ONLINE |
2024-12-18 12:00 | host3 | not online |
2024-12-18 10:00 | host4 | not online |
2024-12-18 08:00 | host0 | ONLINE |
2024-12-18 06:00 | host1 | not online |
2024-12-18 04:00 | host2 | not online |
2024-12-18 02:00 | host3 | ONLINE |
2024-12-18 00:00 | host4 | not online |
2024-12-17 22:00 | host0 | not online |
You want to alert on host1 and host4 only.
To do this with streamstats, you would need to sort events back and forth, and I count those sorts as costs. (I am also quite fuzzy on streamstats :-) So I consider this one of the few good uses of transaction. Something like
index=os_pci_windowsatom host IN (HostP1 HostP2 HostP3 HostP4) source=cnt_mx_pci_sql_*_status_db
| transaction host endswith=state_desc=ONLINE keepevicted=true
| search eventcount = 1 state_desc != ONLINE
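For comparison, a streamstats-based version could look roughly like this. It is an untested sketch, and prev_state is a field name made up for illustration. Note one difference: it only flags hosts whose previous event is explicitly ONLINE, whereas the transaction search also flags a host whose only event in the time range is not ONLINE.
index=os_pci_windowsatom host IN (HostP1 HostP2 HostP3 HostP4) source=cnt_mx_pci_sql_*_status_db
``` sort oldest first within each host so the previous row is the previous report ```
| sort 0 host _time
``` prev_state (hypothetical name) holds the state from the preceding event of the same host ```
| streamstats current=f window=1 last(state_desc) as prev_state by host
| stats latest(state_desc) as state_desc latest(prev_state) as prev_state latest(_time) as _time by host
| where state_desc!="ONLINE" AND prev_state="ONLINE"
The explicit sort is the cost mentioned above: events come back newest first, so they have to be reordered before streamstats can see each host's previous state.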
Here is an emulation of the mock data for you to play with and compare with real data.
| makeresults count=35
| streamstats count as state_desc
| eval _time = relative_time(_time - state_desc * 7200, "-0h@h")
| eval host = "host" . state_desc % 5, state_desc = if(state_desc % 3 > 0, "not online", "ONLINE")
``` the above emulates
index=os_pci_windowsatom host IN (HostP1 HostP2 HostP3 HostP4) source=cnt_mx_pci_sql_*_status_db
```
Output from the search is
_time | closed_txn | duration | eventcount | field_match_sum | host | linecount | state_desc |
2024-12-20 18:00 | 0 | 0 | 1 | 1 | host1 | 1 | not online |
2024-12-20 12:00 | 0 | 0 | 1 | 1 | host4 | 1 | not online |
The rest of your search is simply manipulation of the display strings.
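As a sketch of that last step, the emulation and the transaction search can be chained and finished with the display fields from the original alert. Every host that survives the filter is critical by construction, so setting Estado to a constant is an assumption that only holds for this filtered result set.
| makeresults count=35
| streamstats count as state_desc
| eval _time = relative_time(_time - state_desc * 7200, "-0h@h")
| eval host = "host" . state_desc % 5, state_desc = if(state_desc % 3 > 0, "not online", "ONLINE")
| transaction host endswith=state_desc=ONLINE keepevicted=true
| search eventcount = 1 state_desc != ONLINE
``` assumption: anything left at this point is newly critical ```
| eval Estado="Critico"
| table Estado host state_desc _time
Replacing the first four lines with the real index search gives the alert version.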