Hi everyone
We have an on-premises edge device in a remote location, and it has been added to the cloud. I would like to monitor it and set up an alert for both the device offline and recovery statuses.
While I can set an alert for the offline status, I'm a bit confused about including the recovery status. Can you please assist me in configuring the alert for both scenarios?
Hi @parthiban,
Sorry I wasn't clear: when you run the last reduced search, which values (not field names) do you get for the "status" field?
Do you get the expected values ("offline", "online", etc.) that we use in the checks?
Ciao.
Giuseppe
Hi @gcusello
I've shared an example Splunk payload. In it, the 'onlineStatus' field sits under 'response_details', 'response_payload', and 'entities'. First we need to extract 'onlineStatus' and the serial number (to identify the device) before applying the condition for the alert, right?
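To make the extraction step concrete outside Splunk, here is a minimal Python sketch of what pulling serialNumber and onlineStatus out of each entry under response_details.response_payload.entities amounts to. The sample payload and the serial numbers are made up for illustration; only the field path and the two field names come from this thread.

```python
import json

# Hypothetical payload: only the path response_details.response_payload.entities
# and the field names serialNumber / onlineStatus come from the thread.
SAMPLE_EVENT = json.dumps({
    "response_details": {
        "response_payload": {
            "entities": [
                {"serialNumber": "SN-001", "onlineStatus": "online"},
                {"serialNumber": "SN-002", "onlineStatus": "offline"},
                {"serialNumber": "", "onlineStatus": "online"},
            ]
        }
    }
})


def extract_devices(raw_event):
    """Return one (serialNumber, onlineStatus) pair per entity, skipping blank serials."""
    payload = json.loads(raw_event)
    entities = payload["response_details"]["response_payload"]["entities"]
    rows = []
    for entity in entities:
        serial = entity.get("serialNumber", "")
        if serial:  # entries without a serial number cannot be tied to a device
            rows.append((serial, entity.get("onlineStatus")))
    return rows


if __name__ == "__main__":
    for serial, status in extract_devices(SAMPLE_EVENT):
        print(serial, status)
```

The point of the sketch is that one event carries many devices, so the status has to be split into one row per device before any per-device condition can be evaluated.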
Hi @parthiban,
The problem is the source data.
Looking at your data without any transformation, it seems that the field isn't there. If you reduce the search by dropping | where name="YYYY", do you get a status field?
index="XXXX" "Genesys system is available"
| rename "response_details.response_payload.entities{}.onlineStatus" as status
If not, you have to redesign your search, because it isn't consistent with the data.
Ciao.
Giuseppe
Hi @gcusello
As I mentioned previously, we receive 13 device statuses in a single payload, and I am trying to set up an alert for each device. However, the current query, in which I extract a single device serial number, is not working as expected, and the alert condition is not working either. Could you please check?
index= "YYYYYYY" "Genesys system is available" response_details.response_payload.entities{}.onlineStatus="*" response_details.response_payload.entities{}.serialNumber="*"
| rename "response_details.response_payload.entities{}.onlineStatus" as status
| rename "response_details.response_payload.entities{}.serialNumber" as SerialNumber
| where SerialNumber="XXXXXX"
| stats
count(eval(status="offline")) AS offline_count
count(eval(status="online")) AS online_count
earliest(eval(if(status="offline",_time,""))) AS offline
earliest(eval(if(status="online",_time,""))) AS online
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
offline_count=0 AND online_count>0,"Online",
offline_count>0 AND online_count=0,"Offline",
offline_count>0 AND online_count>0 AND online>offline, "Offline but newly online",
offline_count>0 AND online_count>0 AND offline>online, "Offline",
offline_count=0 AND online_count=0, "No data")
| search condition="Offline" OR condition="Offline but newly online"
| table condition
Hi @parthiban,
The logic and syntax of this search are correct. If it doesn't return results, you have to check the data, which may differ from what we assumed when writing the search; I cannot help with that because I don't have your data.
Debug it by removing the rows one by one to identify where the issue is: I suppose it's in the field names or in the field values.
Ciao.
Giuseppe
Hi @gcusello
With the search below I am getting correct results. Could you please add the alert condition to it?
index= "XXXXX" "Genesys system is available"
| spath input=_raw output=new_field path=response_details.response_payload.entities{}
| mvexpand new_field
| fields new_field
| spath input=new_field output=serialNumber path=serialNumber
| spath input=new_field output=onlineStatus path=onlineStatus
| where serialNumber!=""
| lookup Genesys_Monitoring.csv serialNumber
| where Country="Bangladesh"
| stats count by onlineStatus, Country
Hi @parthiban,
if the results of your search contain onlineStatus="online" and/or onlineStatus="offline", you could modify your search in this way:
index= "XXXXX" "Genesys system is available"
| spath input=_raw output=new_field path=response_details.response_payload.entities{}
| mvexpand new_field
| fields new_field
| spath input=new_field output=serialNumber path=serialNumber
| spath input=new_field output=onlineStatus path=onlineStatus
| where serialNumber!=""
| lookup Genesys_Monitoring.csv serialNumber
| where Country="Bangladesh"
| stats
count(eval(onlineStatus="offline")) AS offline_count
count(eval(onlineStatus="online")) AS online_count
earliest(eval(if(onlineStatus="offline",_time,""))) AS offline_time
earliest(eval(if(onlineStatus="online",_time,""))) AS online_time
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
offline_count=0 AND online_count>0,"Online",
offline_count>0 AND online_count=0,"Offline",
offline_count>0 AND online_count>0 AND online_time>offline_time, "Offline but newly online",
offline_count>0 AND online_count>0 AND offline_time>online_time, "Offline",
offline_count=0 AND online_count=0, "No data")
| search condition="Offline" OR condition="Offline but newly online"
| table condition
Ciao.
Giuseppe
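For readers who want to reason about the case() branches without running the search, here is a small Python sketch of the same decision table. The function name classify and its parameters are hypothetical; the condition labels are the ones used in the search. One assumption goes beyond the search: when both timestamps are equal, the sketch returns "Offline", whereas the case() expression has no branch for a tie.

```python
def classify(offline_count, online_count, offline_time=None, online_time=None):
    """Mirror the case() branches: counts decide first, timestamps break the mix."""
    if offline_count == 0 and online_count == 0:
        return "No data"
    if offline_count == 0:
        return "Online"
    if online_count == 0:
        return "Offline"
    # Both statuses were seen in the window: compare the two timestamps.
    if online_time > offline_time:
        return "Offline but newly online"
    # Assumption: a tie is treated as "Offline" (case() itself has no tie branch).
    return "Offline"


if __name__ == "__main__":
    print(classify(0, 3))
    print(classify(2, 0))
    print(classify(1, 1, offline_time=100, online_time=200))
    print(classify(1, 1, offline_time=200, online_time=100))
    print(classify(0, 0))
```

Writing the branches this way makes it easy to see that the result depends on which timestamps are fed in, so the time fields compared in the search must be the same ones that stats produces.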
Hi @gcusello
The given query works in some scenarios only. It works when only the online message is present. However, when both online and offline messages are present, the condition does not work. I have shared screenshots for reference.
FYI: I removed the following part from my query, because it is not working.
| search condition="Offline" OR condition="Offline but newly online" | table condition
[Screenshot: working scenario]
[Screenshot: non-working scenario]
We can remove offline_time and online_time; they are not required.
Hi @parthiban,
OK, adapt my hint to your requirement.
Let me know if I can help you more, or please accept one of the answers for the benefit of other people in the Community.
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated 😉
Hi @gcusello, thanks for your support.
One final question: how can I turn this query into an alert? Please advise.
Hi @parthiban,
Run this search in the search dashboard of the app where you want to store your alert.
Be sure to use the correct time period.
Then save it as an alert, adding the information for alert execution (scheduling) and actions (email or other).
Ciao.
Giuseppe
Hi @gcusello
As I mentioned earlier: if the condition is "Offline", an alert mail needs to be sent (only one alert; all the following ones need to be suppressed). If the condition becomes "Online", an alert needs to be sent (again only one alert; all the following ones need to be suppressed).
This search will run every 5 minutes and search the results of the past 5 minutes.
This is my requirement. Please guide me.
Hi @parthiban ,
this is the procedure:
Ciao.
Giuseppe
Hi @gcusello
If we configure it like this and, for example, the device stays OFFLINE for the next hour, will we receive an alert every 5 minutes? If yes, that does not fulfil my requirement; I only want the notification to be sent once. The same applies to the ONLINE condition.
If it is not possible in a single search, we can split it into two different searches: one for the OFFLINE condition alert and another for the ONLINE condition alert. Is this possible?
Hi @parthiban,
if you don't want a new alert triggered for an hour after a triggered alert, you have to enable "Throttle".
Ciao.
Giuseppe
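As a rough model of what a fixed throttle period does (a simplification for illustration, not Splunk's actual implementation), the Python sketch below suppresses every trigger that falls within a given period after the last alert that was sent. The function name throttled_alerts and the minute-based timestamps are assumptions.

```python
def throttled_alerts(trigger_times, period):
    """Return the trigger times that still produce an alert under a fixed throttle.

    trigger_times: ascending times (minutes) at which the search matched.
    period: suppression window (minutes) that starts at each alert sent.
    """
    sent = []
    last_sent = None
    for t in trigger_times:
        if last_sent is None or t - last_sent >= period:
            sent.append(t)
            last_sent = t
    return sent


if __name__ == "__main__":
    # Device offline for two hours, checked every 5 minutes, throttle of 60 minutes.
    print(throttled_alerts(list(range(0, 120, 5)), 60))
```

Under this model an outage that outlasts the throttle period produces another alert as soon as the period expires, so the length of the period has to be chosen in advance.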
Hi @gcusello
I don't think my point was clear. This pertains to heartbeat monitoring for a specific device. When the device goes offline, we cannot predict when it will come online. In this case, how do we set the throttle time?
Hi @parthiban,
you have two solutions:
The first is the easier solution, and it can also be useful as a reminder, so that the status isn't forgotten.
The second is just a little more complicated.
Ciao.
Giuseppe
Hi @gcusello
I understood your first point. However, that is not what we need in our case. Once we have identified the offline message, we don't want to receive repeated alerts, because we have already created a ticket for that incident. We only need to be notified once when the device comes back online.
I believe throttling won't work for our requirement. We already have a similar alert mechanism in our observability tools, and we aim to implement the same in Splunk. For better understanding, I've provided an example. Please guide me on any way to achieve this requirement.
1st search - OFFLINE - alert will trigger
2nd search - OFFLINE - alert suppressed
3rd search - OFFLINE - alert suppressed
4th search - ONLINE - alert will trigger
5th search - ONLINE - alert suppressed
6th search - OFFLINE - alert will trigger
7th search - OFFLINE - alert suppressed
8th search - OFFLINE - alert suppressed
9th search - ONLINE - alert will trigger
10th search - ONLINE - alert suppressed
.
. etc
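The sequence above describes alerting on a change of state rather than on the state itself. A minimal Python sketch of that rule follows; the function name alerts_on_change is hypothetical, and it assumes the device is considered ONLINE before the first search, so that a first OFFLINE result triggers.

```python
def alerts_on_change(statuses, initial="ONLINE"):
    """Return one boolean per search run: True if an alert should be sent.

    An alert is sent only when the status differs from the previous run.
    initial is the assumed status before the first run.
    """
    previous = initial
    decisions = []
    for status in statuses:
        decisions.append(status != previous)
        previous = status
    return decisions


if __name__ == "__main__":
    runs = ["OFFLINE", "OFFLINE", "OFFLINE", "ONLINE", "ONLINE",
            "OFFLINE", "OFFLINE", "OFFLINE", "ONLINE", "ONLINE"]
    for number, (status, send) in enumerate(zip(runs, alerts_on_change(runs)), start=1):
        print(number, status, "alert will trigger" if send else "alert suppressed")
```

The only state that has to be remembered between runs is the previous status, which is why any implementation needs somewhere to persist it from one scheduled search to the next.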
Hi @parthiban,
OK, you could use this approach:
create an alert, without an email action, that runs this search:
index= "XXXXX" "Genesys system is available"
| spath input=_raw output=new_field path=response_details.response_payload.entities{}
| mvexpand new_field
| fields new_field
| spath input=new_field output=serialNumber path=serialNumber
| spath input=new_field output=onlineStatus path=onlineStatus
| where serialNumber!=""
| lookup Genesys_Monitoring.csv serialNumber
| where Country="Bangladesh"
| stats
count(eval(onlineStatus="offline")) AS offline_count
count(eval(onlineStatus="online")) AS online_count
earliest(eval(if(onlineStatus="offline",_time,""))) AS offline_time
earliest(eval(if(onlineStatus="online",_time,""))) AS online_time
| fillnull value=0 offline_count
| fillnull value=0 online_count
| eval condition=case(
offline_count=0 AND online_count>0,"Online",
offline_count>0 AND online_count=0,"Offline",
offline_count>0 AND online_count>0 AND online_time>offline_time, "Offline but newly online",
offline_count>0 AND online_count>0 AND offline_time>online_time, "Offline",
offline_count=0 AND online_count=0, "No data"),
search="Device went offline and recovery status"
| search condition="Offline" OR condition="Online" OR condition="Offline but newly online"
| table search condition
| collect index=summary
then you can run a search like the following:
index=summary search="Device went offline and recovery status"
| stats
dc(condition) AS condition_count
latest(condition) AS condition_last
values(condition) AS condition
| search
(condition_last="Offline" condition_count=1) OR
(condition_last="Online" condition_count>1)
with an email action to inform you that there's a new offline status, or an online status after an offline one.
Please check the conditions, because I cannot test them, but they should be correct.
Ciao.
Giuseppe