Alerting

Creating an alert for device offline and recovery status

parthiban
Path Finder

Hi everyone

We have an on-premises edge device at a remote location, and it has been added to the cloud. I would like to monitor it and set an alert for both the device-offline and recovery statuses.

While I can set an alert for the offline status, I'm a bit confused about including the recovery status. Can you please assist me in configuring the alert for both scenarios?

1 Solution

gcusello
SplunkTrust

Hi @parthiban,

if your search results contain onlineStatus="online" and/or onlineStatus="offline", you could modify your search in this way:

index= "XXXXX" "Genesys system is available"
| spath input=_raw output=new_field path=response_details.response_payload.entities{}
| mvexpand new_field
| fields new_field
| spath input=new_field output=serialNumber path=serialNumber
| spath input=new_field output=onlineStatus path=onlineStatus
| where serialNumber!=""
| lookup Genesys_Monitoring.csv serialNumber
| where Country="Bangladesh"
| stats 
   count(eval(onlineStatus="offline")) AS offline_count
   count(eval(onlineStatus="online")) AS online_count
   earliest(eval(if(onlineStatus="offline",_time,""))) AS offline_time
   earliest(eval(if(onlineStatus="online",_time,""))) AS online_time
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
   offline_count=0 AND online_count>0,"Online",
   offline_count>0 AND online_count=0,"Offline",
   offline_count>0 AND online_count>0 AND online_time>offline_time, "Offline but newly online",
   offline_count>0 AND online_count>0 AND offline_time>online_time, "Offline",
   offline_count=0 AND online_count=0, "No data")
| search condition="Offline" OR condition="Offline but newly online"
| table condition

Ciao.

Giuseppe

parthiban
Path Finder

Hi @gcusello 

In the log, we receive the payload model below. In the 'entities' section I have shown only one device status, but in reality there are 11 device statuses in a single log message. I want to create an alert: if a device goes offline, it triggers one alert, and when it comes back online, it triggers a clear-alarm alert. I want only one alert per state change because we receive logs every 2 minutes from AWS, and I want to avoid multiple alerts for the same device going offline and online. I hope my requirement is clear.

response_details: {
  response_payload: {
    entities: {
      id: "YYYYYYY",
      name: "ABC",
      onlineStatus: "ONLINE",
      serialNumber: "XXXXXXX",
    },
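
As a minimal sketch of how a payload like this can be split into one row per device, the search below builds a sample event with makeresults and expands it. It assumes that entities is a JSON array (as the entities{} path used earlier in this thread suggests); the two devices, their names, and the serial numbers S1 and S2 are made up for illustration.

```spl
| makeresults
| eval _raw="{\"response_details\":{\"response_payload\":{\"entities\":[{\"name\":\"ABC\",\"onlineStatus\":\"ONLINE\",\"serialNumber\":\"S1\"},{\"name\":\"DEF\",\"onlineStatus\":\"OFFLINE\",\"serialNumber\":\"S2\"}]}}}"
| spath path=response_details.response_payload.entities{} output=entity
| mvexpand entity
| spath input=entity
| table name serialNumber onlineStatus
```

This should return one row per device, so later commands such as stats or where can work on a single-valued onlineStatus for each serialNumber instead of a multivalue field holding all 11 statuses.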


gcusello
SplunkTrust

Hi @parthiban,

please confirm: do you want an alert if onlineStatus="recovery" or if, for a defined period, you don't receive logs from a device? Is that correct?

In this case, you can use my second search creating a list of devices to monitor in a lookup.

Ciao.

Giuseppe


parthiban
Path Finder

Hi @gcusello  

Yes, I want an alert for online status="OFFLINE" and online status="Online" for the same device.

gcusello
SplunkTrust

Hi @parthiban,

ok, but how can the device send a status if it's offline?

if it continues to send logs even when it's offline, you can add this condition to the search; but if, as I suppose, it doesn't send logs when offline, you can use my search.

Ciao.

Giuseppe


parthiban
Path Finder

Hi @gcusello 

This is an on-premises device managed by the cloud. If the device goes offline, the cloud sends a log.

Which condition do I need to add?

gcusello
SplunkTrust

Hi @parthiban,

status = "OFFLINE"

please try this:

index=your_index 
| stats count BY device status
| append [ | inputlookup perimeter.csv | eval count=0 | fields device count ]
| stats sum(count) AS total BY device status
| eval status=if(total=0,"down",status)
| search status="recovery" OR status="offline" OR status="down"
| table device status

Ciao.

Giuseppe


parthiban
Path Finder

Hi @gcusello 

| rename "response_details.response_payload.entities{}.onlineStatus" as status
| stats count BY status
| append [ | makeresults | eval name=xxxx, count=0 | fields name ]
| stats sum(count) AS total BY status
| eval status=if(total=0,"OFFLINE",status)
| search status="ONLINE" OR status="OFFLINE"
| table status

The result I get is "ONLINE".

How will this work in the alert? How can I set it up in the alert? Can you please guide me?

gcusello
SplunkTrust

Hi @parthiban,

probably there's a misunderstanding about the condition to check:

I understood that you want to check if status="recovery" or status=down, and I check for these statuses, but what's your requirement?

with your search you check status=down and status=online, is this the requirement?

Ciao.

Giuseppe


parthiban
Path Finder

Hi @gcusello 
Let me clarify.
We receive device status logs every 2 minutes from AWS Cloud. These logs indicate both online and offline statuses. If a device goes offline, we continuously receive offline logs until it comes back online, at which point we receive online logs for that specific device.

My requirement is to trigger a critical alert for the end user when a particular device goes offline. Subsequently, I will notify the end user when the device comes back online. I need to create the alert based on this. Is this possible? I have already shared example logs in this conversation.

Moreover, this type of alert already works in our other observability application, and we are now migrating to Splunk.

I hope this clarifies my requirement. Please let me know if anything else is required.
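
One way to alert only on the first offline and the first online event is to compare each device's status with its previous status and keep only the rows where it changed. The search below is a sketch of that idea, not a tested alert: it assumes that entities is a JSON array, that serialNumber identifies a device, and that the status values are ONLINE and OFFLINE; the index name is the placeholder used elsewhere in this thread, and the field names entity, previous_status, and alert are chosen for illustration.

```spl
index="XXXX" "Genesys system is available"
| spath path=response_details.response_payload.entities{} output=entity
| mvexpand entity
| spath input=entity path=serialNumber output=serialNumber
| spath input=entity path=onlineStatus output=status
| eval status=upper(status)
| sort 0 serialNumber _time
| streamstats current=f global=f window=1 last(status) AS previous_status BY serialNumber
| where isnotnull(previous_status) AND status!=previous_status
| eval alert=if(status="OFFLINE", "Device offline", "Device back online")
| table _time serialNumber previous_status status alert
```

Scheduled every 2 minutes over a time range that covers about two polling intervals, this would return a row only when a device changes state. If the time ranges of consecutive runs overlap more than that, the same change could be reported more than once; throttling the alert on serialNumber is one way to suppress the repeats.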

gcusello
SplunkTrust

Hi @parthiban ,

notifying when the status is offline isn't a problem but, after the first offline, do you want the alert to keep firing "offline", or do you want a message when it comes back online?

if you want a message every time you have an offline and the following online, you could try something like this:

<your_search>
| stats 
   count(eval(status="offline")) AS offline_count
   count(eval(status="online")) AS online_count
   earliest(eval(if(status="offline",_time,""))) AS offline
   earliest(eval(if(status="online",_time,""))) AS online
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
   offline_count=0 AND online_count>0,"Online",
   offline_count>0 AND online_count=0,"Offline",
   offline_count>0 AND online_count>0 AND online>offline, "Offline but newly online"),   
   offline_count>0 AND online_count>0 AND online>offline, "Offline"),   
   offline_count=0 AND online_count=0, "No data")
| table condition

in this way you can choose the conditions to trigger the alert.

Ciao.

Giuseppe


parthiban
Path Finder

Hi @gcusello 

No, I don't want continuous alerts for offline. I want to trigger only on the first offline and the first online message. Thanks for understanding.

gcusello
SplunkTrust

Hi @parthiban ,

you only have to set up the conditions for the alert:

<your_search>
| stats 
   count(eval(status="offline")) AS offline_count
   count(eval(status="online")) AS online_count
   earliest(eval(if(status="offline",_time,""))) AS offline
   earliest(eval(if(status="online",_time,""))) AS online
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
   offline_count=0 AND online_count>0,"Online",
   offline_count>0 AND online_count=0,"Offline",
   offline_count>0 AND online_count>0 AND online>offline, "Offline but newly online"),   
   offline_count>0 AND online_count>0 AND online>offline, "Offline"),   
   offline_count=0 AND online_count=0, "No data")
| search condition="Offline" OR condition="Offline but newly online"
| table condition

in this way your alert will trigger on the two conditions.

Ciao.

Giuseppe

parthiban
Path Finder

Hi @gcusello 

I tried the code you gave, but it does not work and throws an error:

"Error in 'EvalCommand': Type checking failed. 'AND' only takes boolean arguments"

index="XXXX" 
| rename "response_details.response_payload.entities{}" as status
| where name="YYYY"
| stats
count(eval(status="offline")) AS offline_count
count(eval(status="online")) AS online_count
earliest(eval(if(status="offline",_time,""))) AS offline
earliest(eval(if(status="online",_time,""))) AS online
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
offline_count=0 AND online_count>0,"Online",
offline_count>0 AND online_count=0,"Offline",
offline_count>0 AND online_count>0 AND online>offline, "Offline but newly online"),
offline_count>0 AND online_count>0 AND online>offline, "Offline"),
offline_count=0 AND online_count=0, "No data")
| search condition="Offline" OR condition="Offline but newly online"
| table condition
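
The error is consistent with the two stray closing parentheses after "Offline but newly online" and "Offline": the first one ends the case() call early, so the remaining lines are no longer condition/value pairs. A small self-contained check with made-up counts and timestamps (all values below are hypothetical) shows the same case() with the parentheses removed and with the fourth branch comparing offline>online:

```spl
| makeresults
| eval offline_count=1, online_count=2, offline=1700000000, online=1700000120
| eval condition=case(
    offline_count=0 AND online_count>0, "Online",
    offline_count>0 AND online_count=0, "Offline",
    offline_count>0 AND online_count>0 AND online>offline, "Offline but newly online",
    offline_count>0 AND online_count>0 AND offline>online, "Offline",
    offline_count=0 AND online_count=0, "No data")
| table offline_count online_count condition
```

With these sample values the third branch matches, so condition is "Offline but newly online".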

gcusello
SplunkTrust

Hi, sorry, please try this:

index="XXXX" 
| rename "response_details.response_payload.entities{}" as status
| where name="YYYY"
| stats
count(eval(status="offline")) AS offline_count
count(eval(status="online")) AS online_count
earliest(eval(if(status="offline",_time,""))) AS offline
earliest(eval(if(status="online",_time,""))) AS online
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
   offline_count=0 AND online_count>0,"Online",
   offline_count>0 AND online_count=0,"Offline",
   offline_count>0 AND online_count>0 AND online>offline, "Offline but newly online",
   offline_count>0 AND online_count>0 AND online>offline, "Offline",
   offline_count=0 AND online_count=0, "No data")
| search condition="Offline" OR condition="Offline but newly online"
| table condition

Ciao.

Giuseppe


parthiban
Path Finder

HI @gcusello 

This time it runs without error, but no results are found.

index="XXXX" "Genesys system is available"
| rename "response_details.response_payload.entities{}.onlineStatus" as status
| where name="YYYY"
| stats
count(eval(status="offline")) AS offline_count
count(eval(status="online")) AS online_count
earliest(eval(if(status="offline",_time,""))) AS offline
earliest(eval(if(status="online",_time,""))) AS online
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
offline_count=0 AND online_count>0,"Online",
offline_count>0 AND online_count=0,"Offline",
offline_count>0 AND online_count>0 AND online>offline, "Offline but newly online",
offline_count>0 AND online_count>0 AND online>offline, "Offline",
offline_count=0 AND online_count=0, "No data")
| search condition="Offline" OR condition="Offline but newly online"
| table condition
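
One possible reason for the empty result: comparisons with = inside eval are case-sensitive, and the sample payload earlier in this thread shows onlineStatus as "ONLINE" in upper case, while the search compares against "online" and "offline". The sketch below uses a made-up event to show the difference; normalizing with lower() is one option, not necessarily what the real data needs.

```spl
| makeresults
| eval status="ONLINE"
| eval exact_match=if(status="online", "yes", "no")
| eval normalized_match=if(lower(status)="online", "yes", "no")
| table status exact_match normalized_match
```

Here exact_match is "no" and normalized_match is "yes". Note also that entities{}.onlineStatus is a multivalue field when one event carries several devices, so expanding the entities into one row per device before the stats may be needed as well.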

gcusello
SplunkTrust

Hi @parthiban,

I found an error in the eval definition, but it shouldn't be the issue:

index="XXXX" "Genesys system is available"
| rename "response_details.response_payload.entities{}.onlineStatus" as status
| where name="YYYY"
| stats
count(eval(status="offline")) AS offline_count
count(eval(status="online")) AS online_count
earliest(eval(if(status="offline",_time,""))) AS offline
earliest(eval(if(status="online",_time,""))) AS online
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
offline_count=0 AND online_count>0,"Online",
offline_count>0 AND online_count=0,"Offline",
offline_count>0 AND online_count>0 AND online>offline, "Offline but newly online",
offline_count>0 AND online_count>0 AND offline>online, "Offline",
offline_count=0 AND online_count=0, "No data")
| search condition="Offline" OR condition="Offline but newly online"
| table condition

To debug the search and see whether the conditions are actually met, remove the final search statement and check which values you get.

Ciao.

Giuseppe


parthiban
Path Finder

Hi @gcusello 
If I remove the below search condition I get this result.

| search condition="Offline" OR condition="Offline but newly online"
| table condition

parthiban_0-1702457118544.png

gcusello
SplunkTrust

Hi @parthiban ,

use the correct field for "status" and check if the conditions in the stats command are the correct ones.

Ciao.

Giuseppe


parthiban
Path Finder

Hi @gcusello 

I am using the correct field, as shown below:

| rename "response_details.response_payload.entities{}.onlineStatus" as status

gcusello
SplunkTrust

Hi @parthiban,

if you run:

index="XXXX" "Genesys system is available"
| rename "response_details.response_payload.entities{}.onlineStatus" as status
| where name="YYYY"

which values do you have for the status field?

Ciao.

Giuseppe
