Hi everyone,
We have an on-premises edge device at a remote location, and it is added to the cloud. I would like to monitor it and set an alert for both the device-offline and the recovery status.
While I can set an alert for the offline status, I'm unsure how to include the recovery status. Can you please help me configure the alert for both scenarios?
Hi @parthiban.
If your search returns events with onlineStatus="online" and/or onlineStatus="offline", you could modify it in this way:
index= "XXXXX" "Genesys system is available"
| spath input=_raw output=new_field path=response_details.response_payload.entities{}
| mvexpand new_field
| fields new_field
| spath input=new_field output=serialNumber path=serialNumber
| spath input=new_field output=onlineStatus path=onlineStatus
| where serialNumber!=""
| lookup Genesys_Monitoring.csv serialNumber
| where Country="Bangladesh"
| stats
count(eval(onlineStatus="offline")) AS offline_count
count(eval(onlineStatus="online")) AS online_count
earliest(eval(if(onlineStatus="offline",_time,""))) AS offline_time
earliest(eval(if(onlineStatus="online",_time,""))) AS online_time
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
offline_count=0 AND online_count>0,"Online",
offline_count>0 AND online_count=0,"Offline",
offline_count>0 AND online_count>0 AND online_time>offline_time, "Offline but newly online",
offline_count>0 AND online_count>0 AND offline_time>online_time, "Offline",
offline_count=0 AND online_count=0, "No data")
| search condition="Offline" OR condition="Offline but newly online"
| table condition
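A possible per-device variant, as a minimal sketch only. It assumes the status values may arrive in upper case (eval string comparisons are case-sensitive), so it lower-cases them first, and it keeps one row per serialNumber so the alert can name the affected device. The field names current_status and last_seen are illustrative, not taken from the data.

```spl
index="XXXXX" "Genesys system is available"
| spath input=_raw output=new_field path=response_details.response_payload.entities{}
| mvexpand new_field
| spath input=new_field output=serialNumber path=serialNumber
| spath input=new_field output=onlineStatus path=onlineStatus
| where serialNumber!=""
| eval onlineStatus=lower(onlineStatus)
| stats latest(onlineStatus) AS current_status max(_time) AS last_seen BY serialNumber
| eval last_seen=strftime(last_seen, "%Y-%m-%d %H:%M:%S")
| where current_status="offline"
| table serialNumber current_status last_seen
```

This lists only the devices whose most recent status in the search window is offline.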
Ciao.
Giuseppe
Hi @gcusello
In the log, we receive the payload model below. In the 'entities' section I've shown only one device status, but in reality there are 11 device statuses in a single log message. I want to create an alert: if a device goes offline, it triggers one alert, and when the device comes back online, it triggers one clear-alarm alert. I want only one alert per state change because we receive logs every 2 minutes from AWS, and I want to avoid repeated alerts for the same device going offline and online. I hope my requirement is clear.
response_details: {
  response_payload: {
    entities: {
      id: "YYYYYYY",
      name: "ABC",
      onlineStatus: "ONLINE",
      serialNumber: "XXXXXXX",
    },
Hi @parthiban,
please confirm: you want an alert if onlineStatus="recovery" or if, for a defined period, you don't receive logs from a device. Is that correct?
In this case, you can use my second search creating a list of devices to monitor in a lookup.
Ciao.
Giuseppe
Hi @gcusello
Yes, I want an alert for onlineStatus="OFFLINE" and for onlineStatus="ONLINE" for the same device.
Hi @parthiban,
OK, but how can the device send a status if it's offline?
If it continues to send logs even when it's offline, you can add this condition to the search; but if, as I suppose, it doesn't send logs when offline, you can use my search.
Ciao.
Giuseppe
Hi @gcusello
This is an on-premises device managed by the cloud. If the device goes offline, the cloud sends the log.
Which condition do I need to add?
Hi @parthiban,
the condition is status="OFFLINE".
Please try this:
index=your_index
| stats count latest(status) AS status BY device
| append [ | inputlookup perimeter.csv | eval count=0 | fields device count ]
| stats sum(count) AS total values(status) AS status BY device
| eval status=if(total=0,"down",status)
| search status="recovery" OR status="offline" OR status="down"
| table device status
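If the goal is also to catch devices that have stopped sending logs for a defined period, a sketch along the following lines might help. It reuses the device field and the perimeter.csv lookup from the search above; the minutes_silent field and the 10-minute threshold are assumptions chosen only for illustration.

```spl
index=your_index
| stats max(_time) AS last_seen BY device
| append [ | inputlookup perimeter.csv | eval last_seen=0 | fields device last_seen ]
| stats max(last_seen) AS last_seen BY device
| eval minutes_silent=round((now() - last_seen) / 60)
| where minutes_silent > 10
| table device minutes_silent
```

Devices that appear only in the lookup keep last_seen=0, so they show a very large minutes_silent value and are always listed.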
Ciao.
Giuseppe
Hi @gcusello
| rename "response_details.response_payload.entities{}.onlineStatus" as status
| stats count BY status
| append [ | makeresults | eval name=xxxx, count=0 | fields name ]
| stats sum(count) AS total BY status
| eval status=if(total=0,"OFFLINE",status)
| search status="ONLINE" OR status="OFFLINE"
| table status
The result I get is "ONLINE".
How will this work in the alert? How can I configure it in the alert? Can you please guide me?
Hi @parthiban,
probably there's a misunderstanding about the condition to check:
I understood that you want to check whether status="recovery" or status="down", and I check for these statuses, but what is your requirement?
With your search you check status=down and status=online; is this the requirement?
Ciao.
Giuseppe
Hi @gcusello
Let me clarify,
We receive device status logs every 2 minutes from AWS Cloud. These logs indicate both online and offline statuses. If a device goes offline, we continuously receive offline logs until it comes back online, at which point we receive online logs for that specific device.
My requirement is to trigger a critical alert for the end user when a particular device goes offline, and then to notify the end user when the device comes back online. I need to create the alert based on this. Is this possible? I have already shared example logs in this conversation.
Moreover, this type of alert already works in our other observability application, and we are now migrating to Splunk.
I hope this clarifies my requirement. Please let me know if anything else is required.
Hi @parthiban ,
notifying when the status is offline isn't a problem, but after the first offline, do you want the alert to keep firing "offline", or do you want a message when the device comes back online?
If you want a message every time you have an offline followed by an online, you could try something like this:
<your_search>
| stats
count(eval(status="offline")) AS offline_count
count(eval(status="online")) AS online_count
earliest(eval(if(status="offline",_time,""))) AS offline
earliest(eval(if(status="online",_time,""))) AS online
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
offline_count=0 AND online_count>0,"Online",
offline_count>0 AND online_count=0,"Offline",
offline_count>0 AND online_count>0 AND online>offline, "Offline but newly online"),
offline_count>0 AND online_count>0 AND online>offline, "Offline"),
offline_count=0 AND online_count=0, "No data")
| table condition
In this way you can choose the conditions that trigger the alert.
Ciao.
Giuseppe
Hi @gcusello
No, I don't want continuous alerts for offline. I want to trigger only on the first offline and the first online message. Thanks for understanding.
Hi @parthiban ,
you only have to set up the conditions for the alert:
<your_search>
| stats
count(eval(status="offline")) AS offline_count
count(eval(status="online")) AS online_count
earliest(eval(if(status="offline",_time,""))) AS offline
earliest(eval(if(status="online",_time,""))) AS online
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
offline_count=0 AND online_count>0,"Online",
offline_count>0 AND online_count=0,"Offline",
offline_count>0 AND online_count>0 AND online>offline, "Offline but newly online"),
offline_count>0 AND online_count>0 AND online>offline, "Offline"),
offline_count=0 AND online_count=0, "No data")
| search condition="Offline" OR condition="Offline but newly online"
| table condition
In this way your alert will trigger on the two conditions.
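An alternative way to fire only on the first offline and the first online event is to compare each device's status with its previous one and keep only the changes. This is a minimal sketch, assuming the payload structure shared earlier in the thread (entities{} with serialNumber and onlineStatus); the field names entity, previous_status, and alert_type are illustrative.

```spl
index="XXXX" "Genesys system is available"
| spath path=response_details.response_payload.entities{} output=entity
| mvexpand entity
| spath input=entity path=serialNumber output=serialNumber
| spath input=entity path=onlineStatus output=status
| eval status=lower(status)
| sort 0 serialNumber _time
| streamstats current=f window=1 last(status) AS previous_status BY serialNumber
| where isnotnull(previous_status) AND status!=previous_status
| eval alert_type=if(status="offline", "Device offline", "Device recovered")
| table _time serialNumber previous_status status alert_type
```

Each output row is one state change for one device. The search window needs to cover at least two consecutive logs (more than the 2-minute interval) for a change to be visible, and if consecutive scheduled runs overlap, alert throttling on serialNumber may be needed to avoid reporting the same change twice.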
Ciao.
Giuseppe
Hi @gcusello
I tried the code you gave, but it is not working; it throws this error:
"Error in 'EvalCommand': Type checking failed. 'AND' only takes boolean arguments"
index="XXXX"
| rename "response_details.response_payload.entities{}" as status
| where name="YYYY"
| stats
count(eval(status="offline")) AS offline_count
count(eval(status="online")) AS online_count
earliest(eval(if(status="offline",_time,""))) AS offline
earliest(eval(if(status="online",_time,""))) AS online
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
offline_count=0 AND online_count>0,"Online",
offline_count>0 AND online_count=0,"Offline",
offline_count>0 AND online_count>0 AND online>offline, "Offline but newly online"),
offline_count>0 AND online_count>0 AND online>offline, "Offline"),
offline_count=0 AND online_count=0, "No data")
| search condition="Offline" OR condition="Offline but newly online"
| table condition
Hi, sorry, please try this:
index="XXXX"
| rename "response_details.response_payload.entities{}" as status
| where name="YYYY"
| stats
count(eval(status="offline")) AS offline_count
count(eval(status="online")) AS online_count
earliest(eval(if(status="offline",_time,""))) AS offline
earliest(eval(if(status="online",_time,""))) AS online
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
offline_count=0 AND online_count>0,"Online",
offline_count>0 AND online_count=0,"Offline",
offline_count>0 AND online_count>0 AND online>offline, "Offline but newly online",
offline_count>0 AND online_count>0 AND online>offline, "Offline",
offline_count=0 AND online_count=0, "No data")
| search condition="Offline" OR condition="Offline but newly online"
| table condition
Ciao.
Giuseppe
Hi @gcusello
This time it runs without error, but no results are found.
index="XXXX" "Genesys system is available"
| rename "response_details.response_payload.entities{}.onlineStatus" as status
| where name="YYYY"
| stats
count(eval(status="offline")) AS offline_count
count(eval(status="online")) AS online_count
earliest(eval(if(status="offline",_time,""))) AS offline
earliest(eval(if(status="online",_time,""))) AS online
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
offline_count=0 AND online_count>0,"Online",
offline_count>0 AND online_count=0,"Offline",
offline_count>0 AND online_count>0 AND online>offline, "Offline but newly online",
offline_count>0 AND online_count>0 AND online>offline, "Offline",
offline_count=0 AND online_count=0, "No data")
| search condition="Offline" OR condition="Offline but newly online"
| table condition
Hi @parthiban,
I found an error in the eval definition, but it shouldn't be the issue:
index="XXXX" "Genesys system is available"
| rename "response_details.response_payload.entities{}.onlineStatus" as status
| where name="YYYY"
| stats
count(eval(status="offline")) AS offline_count
count(eval(status="online")) AS online_count
earliest(eval(if(status="offline",_time,""))) AS offline
earliest(eval(if(status="online",_time,""))) AS online
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
offline_count=0 AND online_count>0,"Online",
offline_count>0 AND online_count=0,"Offline",
offline_count>0 AND online_count>0 AND online>offline, "Offline but newly online",
offline_count>0 AND online_count>0 AND offline>online, "Offline",
offline_count=0 AND online_count=0, "No data")
| search condition="Offline" OR condition="Offline but newly online"
| table condition
Debug the search to understand whether the search conditions are met: remove the final search statement and see which values you have.
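For that check, a minimal sketch that lists the distinct status values and how often each occurs. If the values come back as "ONLINE" and "OFFLINE", the lower-case comparisons in the stats command will not match, because eval string comparisons are case-sensitive. It is also worth verifying that a field called name actually exists where the where command runs; in this payload it may only be available under the full entities{} path, which is an assumption to confirm against your events.

```spl
index="XXXX" "Genesys system is available"
| rename "response_details.response_payload.entities{}.onlineStatus" AS status
| stats count BY status
```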
Ciao.
Giuseppe
Hi @gcusello
If I remove the search condition below, I get this result.
| search condition="Offline" OR condition="Offline but newly online" | table condition
Hi @parthiban ,
Use the correct field for "status" and check whether the conditions in the stats command are the correct ones.
Ciao.
Giuseppe
Hi @gcusello
I am using the correct field, which is the one below.
| rename "response_details.response_payload.entities{}.onlineStatus" as status
Hi @parthiban,
if you run:
index="XXXX" "Genesys system is available"
| rename "response_details.response_payload.entities{}.onlineStatus" as status
| where name="YYYY"
which values do you have for the status field?
Ciao.
Giuseppe