Hi all,
I am using the search below to get the status of any bot that is offline,
and I want to create an alert if a status stays Offline for more than 10 minutes.
How can I modify this search to find any status that has been Offline for more than 10 minutes?
I am using DB Connect to pull the data every 5 minutes, so the data in Splunk is updated every 5 minutes (the default interval).
Below is the data for the last 5 minutes:
index=Testindex sourcetype="Bueprism" source=Botstatus
| table BOT_Name lastupdated BOT_Status _time
| search BOT_Status = Offline
BOT_Name | lastupdated | BOT_Status
HOUVMITBPRSMX20:8001 | 2023-08-23 05:14:12.503 | Offline
HOUVMITBPRSMX14:8001 | 2023-08-23 08:20:11.77 | Offline
HOUVMITBPRSMX13:8001 | 2023-08-23 08:20:12.693 | Offline
Hi @sekhar463,
If it isn't possible to have an Online status between two Offline statuses, you could try something like this:
index=Testindex sourcetype="Bueprism" source=Botstatus BOT_Status="Offline"
| eval lastupdated=strptime(lastupdated,"%Y-%m-%d %H:%M:%S")
| stats
earliest(lastupdated) AS earliest
latest(lastupdated) AS latest
latest(_time) AS _time
BY BOT_Name
| eval latest=if(isnull(latest),_time,latest)
| where latest-earliest>300
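To illustrate what the first search computes, here is a minimal Python sketch (not Splunk code). It assumes each event is a (BOT_Name, lastupdated, BOT_Status) tuple; the helper names `parse` and `offline_spans` and the sample data are hypothetical. Because lastupdated is a text timestamp, it has to be parsed before one value can be subtracted from another. Taking min/max of the parsed times stands in for earliest/latest, assuming lastupdated increases along with _time.

```python
from datetime import datetime

# Hypothetical sample events: (BOT_Name, lastupdated, BOT_Status)
EVENTS = [
    ("BOT1:8001", "2023-08-24 12:51:49.62", "Offline"),
    ("BOT1:8001", "2023-08-24 12:56:50.10", "Offline"),
    ("BOT1:8001", "2023-08-24 13:01:49.90", "Offline"),
    ("BOT2:8001", "2023-08-24 13:00:01.637", "Offline"),
]


def parse(ts: str) -> datetime:
    # %f accepts 1 to 6 fractional digits, so both ".62" and ".637" parse
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")


def offline_spans(events, threshold_s=300):
    """Return {bot: seconds} for bots whose Offline timestamps span more than threshold_s."""
    spans = {}
    for bot, ts, status in events:
        if status != "Offline":  # same filter as BOT_Status="Offline" in the base search
            continue
        t = parse(ts)
        first, last = spans.get(bot, (t, t))
        spans[bot] = (min(first, t), max(last, t))
    result = {}
    for bot, (first, last) in spans.items():
        seconds = (last - first).total_seconds()
        if seconds > threshold_s:  # same test as: where latest-earliest>300
            result[bot] = seconds
    return result


print(offline_spans(EVENTS))
```

BOT2:8001 has a single Offline event, so its span is 0 seconds and it is not reported.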
If instead there could be an intermediate Online status between two Offline statuses, you could try:
index=Testindex sourcetype="Bueprism" source=Botstatus
| eval lastupdated=strptime(lastupdated,"%Y-%m-%d %H:%M:%S")
| stats
earliest(eval(if(BOT_Status="Offline",lastupdated,null()))) AS earliest_offline
latest(eval(if(BOT_Status="Offline",lastupdated,null()))) AS latest_offline
values(eval(if(BOT_Status="Online",lastupdated,null()))) AS lastupdated_online
latest(_time) AS _time
BY BOT_Name
| eval latest_offline =if(isnull(latest_offline),_time,latest_offline)
| mvexpand lastupdated_online
| where latest_offline-earliest_offline>300 AND NOT (lastupdated_online>earliest_offline AND lastupdated_online<latest_offline)
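The intent of this second search can be sketched in Python as a per-bot check: take the first and last Offline times, require them to span more than 5 minutes, and require that no Online time falls strictly between them. This is an illustration of the idea under those assumptions, not a line-by-line translation of the SPL (it checks all Online times for a bot at once instead of row by row after mvexpand); the function name and sample data are hypothetical.

```python
from datetime import datetime


def parse(ts: str) -> datetime:
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")


def offline_without_online(events, threshold_s=300):
    """Bots whose first and last Offline times span more than threshold_s
    with no Online status recorded strictly between them."""
    by_bot = {}
    for bot, ts, status in events:
        by_bot.setdefault(bot, []).append((parse(ts), status))
    flagged = []
    for bot, rows in by_bot.items():
        offline = [t for t, s in rows if s == "Offline"]
        online = [t for t, s in rows if s == "Online"]
        if not offline:
            continue
        first, last = min(offline), max(offline)
        long_enough = (last - first).total_seconds() > threshold_s
        interrupted = any(first < t < last for t in online)
        if long_enough and not interrupted:
            flagged.append(bot)
    return sorted(flagged)


# Hypothetical data: BOT_A comes back Online in between, BOT_B stays Offline
EVENTS = [
    ("BOT_A", "2023-08-24 10:00:00.000", "Offline"),
    ("BOT_A", "2023-08-24 10:05:00.000", "Online"),
    ("BOT_A", "2023-08-24 10:10:00.000", "Offline"),
    ("BOT_B", "2023-08-24 10:00:00.000", "Offline"),
    ("BOT_B", "2023-08-24 10:05:00.000", "Offline"),
    ("BOT_B", "2023-08-24 10:10:00.000", "Offline"),
]
print(offline_without_online(EVENTS))
```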
I'm confident about the first solution; the second one should be tested and adapted if needed.
Ciao.
Giuseppe
Set your search timeframe to the previous 15 minutes.
index=Testindex sourcetype="Bueprism" source=Botstatus
| stats latest(BOT_Status) as BOT_Status latest(lastupdated) as lastupdated by BOT_Name
| where BOT_Status="Offline" AND strptime(lastupdated,"%Y-%m-%d %H:%M:%S") < relative_time(now(), "-10m")
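The logic of this search, sketched in Python under stated assumptions: keep each bot's most recent status, then report it if that status is Offline and its lastupdated is more than 10 minutes before "now". The function name and sample data are hypothetical, "most recent" is decided here by lastupdated (Splunk's latest() uses _time), and the sketch assumes lastupdated and now are in the same timezone. In the sample events later in this thread, lastupdated does not match the displayed event time, so that is worth checking.

```python
from datetime import datetime, timedelta


def offline_longer_than(events, now, minutes=10):
    """Bots whose most recent status is Offline and whose lastupdated is older than now - minutes."""
    latest = {}
    for bot, ts, status in events:
        # Keep only "YYYY-mm-dd HH:MM:SS", like strptime(lastupdated,"%Y-%m-%d %H:%M:%S")
        t = datetime.strptime(ts[:19], "%Y-%m-%d %H:%M:%S")
        if bot not in latest or t > latest[bot][0]:
            latest[bot] = (t, status)
    cutoff = now - timedelta(minutes=minutes)  # plays the role of relative_time(now(), "-10m")
    return sorted(bot for bot, (t, status) in latest.items()
                  if status == "Offline" and t < cutoff)


# Hypothetical data
EVENTS = [
    ("BOT1", "2023-08-24 10:00:00.50", "Offline"),
    ("BOT2", "2023-08-24 10:08:00.10", "Working"),
    ("BOT2", "2023-08-24 10:10:00.10", "Offline"),
    ("BOT3", "2023-08-24 09:55:00.00", "Offline"),
    ("BOT3", "2023-08-24 10:00:00.00", "Working"),
]
print(offline_longer_than(EVENTS, now=datetime(2023, 8, 24, 10, 15)))
```

BOT2 went Offline only 5 minutes before "now", and BOT3's latest status is Working, so only BOT1 is reported.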
Hi, what cron interval should I use for the alert schedule to get the expected results?
Given that your data changes every 5 minutes, why not schedule the cron for every 5 minutes, but offset it so that it runs just after the new data has been loaded?
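As an illustration of the offset idea: assume the data lands on the 5-minute marks (:00, :05, ...) and that one minute is enough lag for the DB Connect input. Under those assumptions, a standard cron expression such as `1-59/5 * * * *` runs one minute after each load (check that your Splunk version accepts range/step syntax). The small helper below is hypothetical and only expands a minute field of that exact form to show the run times; it is not a cron parser.

```python
def expand_minute_field(field: str) -> list[int]:
    """Expand a cron minute field of the form 'start-end/step' (only this form is handled)."""
    span, step = field.split("/")
    start, end = (int(part) for part in span.split("-"))
    return list(range(start, end + 1, int(step)))


# Every 5 minutes, offset by 1 minute from each 5-minute load
print(expand_minute_field("1-59/5"))  # prints [1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56]
```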
BUT YOU SAID TO RUN THIS SEARCH EVERY 15 MINS TO FIND BOTS THAT ARE OFFLINE FOR MORE THAN 10 MINS.
SO WHEN SCHEDULING THE ALERT, WHAT TIME RANGE SHOULD I GIVE IT TO CHECK THE DATA, AND WHAT CRON SCHEDULE?
index=Testindex sourcetype="Bueprism" source=Botstatus
| stats latest(BOT_Status) as BOT_Status latest(lastupdated) as lastupdated by BOT_Name
| where BOT_Status="Offline" AND strptime(lastupdated,"%Y-%m-%d %H:%M:%S") < relative_time(now(), "-10m")
No, I said the timeframe for the search is the previous 15 minutes, i.e. how far back to look for events. The schedule is how often the search runs, which could be every 5 minutes since that is how often the data changes. It is up to you to decide how often it runs, as this determines how quickly you detect the offline time.
For example, if your search runs at 5 past the hour and looks back 15 minutes, you will be looking at events from 10 minutes before the hour through to 5 minutes past the hour. This should give you enough events to detect whether the host was down for 10 minutes (during that 15-minute period).
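The example above can be worked through in a short Python sketch. It assumes a hypothetical run at 10:05, polls exactly on the 5-minute marks, and a search window whose start is inclusive and end exclusive; the function names are hypothetical. Under those assumptions the 15-minute lookback covers 09:50 to 10:05 and contains three polls per bot.

```python
from datetime import datetime, timedelta


def search_window(run_at: datetime, lookback_minutes: int = 15):
    """(earliest, latest) for a search covering the lookback_minutes before run_at."""
    return run_at - timedelta(minutes=lookback_minutes), run_at


def polls_in_window(window, first_poll: datetime, poll_minutes: int = 5):
    """Poll times that fall in [earliest, latest); latest is treated as exclusive."""
    earliest, latest = window
    polls, t = [], first_poll
    while t < latest:
        if t >= earliest:
            polls.append(t)
        t += timedelta(minutes=poll_minutes)
    return polls


window = search_window(datetime(2023, 8, 24, 10, 5))
for poll in polls_in_window(window, first_poll=datetime(2023, 8, 24, 9, 0)):
    print(poll.strftime("%H:%M"))  # 09:50, then 09:55, then 10:00
```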
It is entirely up to you to choose how often your alert runs and what time period it searches over.
(I refrained from using caps in my response as I think it is clear enough!)
Hi, thanks for your reply.
I have tested the alert using this search, but
the data is loaded into Splunk from the source every 5 minutes.
The status was Offline when lastupdated="2023-08-24 12:51:49.62"
and changed to Idle when lastupdated="2023-08-24 13:00:01.637".
How can I modify the search to catch a bot whose status is Offline for 2 consecutive polls when the data is collected?
For example, at event time 8/24/23 6:25:00.873 PM the bot was Offline; if it is also Offline at the next interval (2023-08-24 08:00:02.202), I need to get it.
Below are the events from testing one bot:
8/24/23 6:45:00.920 PM
2023-08-24 08:15:00.920, BOT_Name="HOUVMITBPRSMX21:8001", lastupdated="2023-08-24 13:14:57.803", BOT_Status="Working"
host = TEST source = BP_Botstatus_Newquery sourcetype = testsoucetype

8/24/23 6:40:01.652 PM
2023-08-24 08:10:01.652, BOT_Name="HOUVMITBPRSMX21:8001", lastupdated="2023-08-24 13:10:00.85", BOT_Status="Working"
host = TEST source = testcource sourcetype = testsoucetype

8/24/23 6:35:00.968 PM
2023-08-24 08:05:00.968, BOT_Name="HOUVMITBPRSMX21:8001", lastupdated="2023-08-24 13:04:59.833", BOT_Status="Working"
host = TEST source = BP_Botstatus_Newquery sourcetype = testsoucetype

8/24/23 6:30:02.202 PM
2023-08-24 08:00:02.202, BOT_Name="HOUVMITBPRSMX21:8001", lastupdated="2023-08-24 13:00:01.637", BOT_Status="Idle"
host = TEST source = testcource sourcetype = testsoucetype

8/24/23 6:25:00.873 PM
2023-08-24 07:55:00.873, BOT_Name="HOUVMITBPRSMX21:8001", lastupdated="2023-08-24 12:51:49.62", BOT_Status="Offline"
host = TEST source = testcource sourcetype = testsoucetype
| eval lastoffline=if(BOT_Status="Offline",lastupdated,null())
| stats latest(BOT_Status) as BOT_Status latest(lastupdated) as lastupdated latest(lastoffline) as lastoffline count(lastoffline) as offlinecount by BOT_Name
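What these two lines produce can be mirrored in Python using the sample events posted above for HOUVMITBPRSMX21:8001. This is a sketch under the assumption that "latest" means the event with the greatest _time; the function name `summarize` is hypothetical. It also shows why the result is only a count: offlinecount says how many polls were Offline, not whether they were consecutive.

```python
# Sample events for one bot, taken from the thread: (_time, BOT_Name, lastupdated, BOT_Status)
EVENTS = [
    ("2023-08-24 08:15:00.920", "HOUVMITBPRSMX21:8001", "2023-08-24 13:14:57.803", "Working"),
    ("2023-08-24 08:10:01.652", "HOUVMITBPRSMX21:8001", "2023-08-24 13:10:00.85", "Working"),
    ("2023-08-24 08:05:00.968", "HOUVMITBPRSMX21:8001", "2023-08-24 13:04:59.833", "Working"),
    ("2023-08-24 08:00:02.202", "HOUVMITBPRSMX21:8001", "2023-08-24 13:00:01.637", "Idle"),
    ("2023-08-24 07:55:00.873", "HOUVMITBPRSMX21:8001", "2023-08-24 12:51:49.62", "Offline"),
]


def summarize(events):
    """Per bot: latest status and lastupdated, latest Offline lastupdated, and Offline count."""
    summary = {}
    # Oldest first (the _time strings share one fixed-width format, so string order
    # is time order); each later event overwrites the "latest" values.
    for _time, bot, lastupdated, status in sorted(events):
        row = summary.setdefault(bot, {"BOT_Status": None, "lastupdated": None,
                                       "lastoffline": None, "offlinecount": 0})
        row["BOT_Status"] = status
        row["lastupdated"] = lastupdated
        if status == "Offline":  # lastoffline only has a value on Offline events
            row["lastoffline"] = lastupdated
            row["offlinecount"] += 1
    return summary


print(summarize(EVENTS))
```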
Hi, it gives the offline count, but how does that help to detect whether any bot status stays Offline for 10 minutes or more?
These are the search results; they show only the count of Offline events:
BOT_Name | BOT_Status | lastupdated | lastoffline | offlinecount
BOT1:8001 | Idle | 2023-08-24 15:30:00.85 | 2023-08-24 11:00:13.787 | 3
BOT2:8001 | Idle | 2023-08-24 15:29:56.897 | 2023-08-24 13:20:12.517 | 3
BOT3:8001 | Idle | 2023-08-24 15:29:58.693 | 2023-08-24 09:02:12.413 | 4
BOT4:8001 | Idle | 2023-08-24 15:30:00.537 | 2023-08-24 08:20:13.363 | 2
BOT24:8001 | Idle | 2023-08-24 15:29:59.443 | 2023-08-24 08:20:11.957 | 1
BOT15:8001 | Idle | 2023-08-24 15:29:57.43 | 2023-08-24 06:22:22.057 | 5
If you are looking back 15 minutes and the status is updated every 5 minutes, there should be only 3 events per bot. So if the count is 2 or more, the bot was offline for at least two of those events. If the middle event is not Offline, the bot was offline on two separate occasions; otherwise, it was offline for at least 2 consecutive events.
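The difference between "2 or more Offline events" and "2 consecutive Offline events" can be shown with a small Python sketch, assuming the statuses for one bot are already in time order; the function names are hypothetical. Both sequences below have an Offline count of 2, but only the second has two consecutive Offline polls.

```python
def offline_count(statuses):
    """Number of polls whose status is Offline."""
    return sum(1 for s in statuses if s == "Offline")


def offline_consecutive(statuses, n=2):
    """True if at least n consecutive polls are Offline (statuses in time order)."""
    run = 0
    for status in statuses:
        run = run + 1 if status == "Offline" else 0  # reset the run on any other status
        if run >= n:
            return True
    return False


for polls in (["Offline", "Working", "Offline"], ["Working", "Offline", "Offline"]):
    print(polls, offline_count(polls), offline_consecutive(polls))
```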
Hi, thanks.
Below are the events for one bot, based on the search.
It went offline at 9:00 (lastupdated); if it is still offline at the next run interval, it would be helpful to trigger an alert.
8/25/23 2:50:01.811 PM
2023-08-25 04:20:01.811, BOT_Name="HOUVMITBPRSMX10:8001", lastupdated="2023-08-25 09:19:59.597", BOT_Status="Working"
host = TEST source = testcource sourcetype = testsoucetype

8/25/23 2:45:01.003 PM
2023-08-25 04:15:01.003, BOT_Name="HOUVMITBPRSMX10:8001", lastupdated="2023-08-25 09:14:59.183", BOT_Status="Working"
host = TEST source = testcource sourcetype = testsoucetype

8/25/23 2:40:02.103 PM
2023-08-25 04:10:02.103, BOT_Name="HOUVMITBPRSMX10:8001", lastupdated="2023-08-25 09:09:59.897", BOT_Status="Idle"
host = TEST source = testcource sourcetype = testsoucetype

8/25/23 2:35:00.976 PM
2023-08-25 04:05:00.976, BOT_Name="HOUVMITBPRSMX10:8001", lastupdated="2023-08-25 09:00:13.993", BOT_Status="Offline"
host = TEST source = testcource sourcetype = testsoucetype
BOT_Name | BOT_Status | lastupdated | lastoffline | offlinecount
HOUVMITBPRSMX10:8001 | Working | 2023-08-25 09:14:59.183 | 2023-08-25 09:00:13.993 | 1
So, you are looking back more than 15 minutes and you want to ignore the Idle status, i.e. you want the difference between when the status is "Working" and when it is "Offline"?
| eval lastupdated=strptime(lastupdated,"%F %T.%3N")
| eval offlinetime=if(BOT_Status="Offline",lastupdated,null())
| eval workingtime=if(BOT_Status="Working",lastupdated,null())
| streamstats last(workingtime) as nextworkingtime by BOT_Name
| where nextworkingtime - offlinetime > 600
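The idea behind the streamstats step, sketched in Python with the HOUVMITBPRSMX10:8001 events posted above: walk the events newest first (Splunk's default order), remember the most recent Working time seen per bot, and for each Offline event compare it with that "next Working" time. Tracking per bot (the equivalent of `by BOT_Name`) and the function name are assumptions of this sketch. Idle events are simply skipped, so the Offline at 09:00:13.993 is matched with the Working at 09:14:59.183, a gap of about 885 seconds.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Sample events for one bot, taken from the thread: (BOT_Name, lastupdated, BOT_Status)
EVENTS = [
    ("HOUVMITBPRSMX10:8001", "2023-08-25 09:19:59.597", "Working"),
    ("HOUVMITBPRSMX10:8001", "2023-08-25 09:14:59.183", "Working"),
    ("HOUVMITBPRSMX10:8001", "2023-08-25 09:09:59.897", "Idle"),
    ("HOUVMITBPRSMX10:8001", "2023-08-25 09:00:13.993", "Offline"),
]


def offline_until_working(events, threshold_s=600):
    """(bot, offline lastupdated, seconds) where the next Working status came more than threshold_s later."""
    next_working = {}
    flagged = []
    # Walk newest first; the Working time stored for a bot is then the earliest
    # Working event that is newer than the event currently being processed.
    for bot, ts, status in sorted(events, key=lambda e: datetime.strptime(e[1], FMT), reverse=True):
        t = datetime.strptime(ts, FMT)
        if status == "Working":
            next_working[bot] = t
        elif status == "Offline" and bot in next_working:  # Idle events are ignored
            gap = (next_working[bot] - t).total_seconds()
            if gap > threshold_s:
                flagged.append((bot, ts, gap))
    return flagged


print(offline_until_working(EVENTS))
```

An Offline event with no later Working event is not reported by this logic, so a bot that is still offline needs a different check.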
No. Based on my events, I want any bot that was Offline for 2 consecutive polls (the data is updated every 5 minutes). For example, WHENEVER THE BOT IS OFFLINE IN ONE DATA COLLECTION AND THE SAME BOT IS ALSO OFFLINE WHEN THE DATA IS NEXT UPDATED, I WANT TO GET THOSE.
There is no need for caps! Your sample data does not show this situation. There is only one event with an offline status. Good luck.
How can I detect an Offline status that lasts more than 10 minutes?
The status might change at the next update during data collection, so I only need the results where the bot is still Offline for 2 intervals.
This is working. What cron interval should I use to get the alerts as expected?
Can I run it every 15 minutes?
Also, @gcusello provided a query, but it does not return any results and I'm not sure why:
index=Testindex sourcetype="Bueprism" source=Botstatus BOT_Status="Offline"
| stats earliest(lastupdated) AS earliest latest(lastupdated) AS latest latest(_time) AS _time BY BOT_Name
| eval latest=if(isnull(latest),_time,latest)
| where latest-earliest>300
Hi @sekhar463,
Good for you, see you next time!
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by all the contributors 😉