Splunk Search

How do I detect that a component has been down for some time when no "up" event has appeared in that period?

alexeyglukhov
Path Finder

Hello all!
The task is to alert if a component (pool) is down for more than 10 minutes.

Some details:
There are down and up events for many pools (entry examples are below).
I started by using a transaction to pair those events and compare the duration to the threshold, which works perfectly (the search string is below).

| transaction myvip pool_name_and_port mcpd_code
startswith=("component status down")
endswith=("component status up")
| where duration>600

But the problem is that a transaction only appears in the alert once the second message ("up") has arrived, and that can take hours or days. So this approach cannot generate an alert as soon as a pool has been down for more than 10 minutes.

So, I need your help to detect that a component was down for more than some period of time (10 minutes in my case) and there is no "component status up" event in that period of time.

Thank you in advance for any help!

Entry examples:
Feb 26 21:11:23 log_file_location_hostname_1 notice mcpd[9999]: 01070727:5: Pool /Common/myvip1/appname_pool member /Common/myhost1:10003 component status up. [ /Common/myvip1/appname_https_component: up ] [ was down for 0hr:15mins:39sec ]
Feb 26 21:11:22 log_file_location_hostname_2 notice mcpd[8888]: 01070727:5: Pool /Common/myvip2/appname_pool member /Common/myhost2:10003 component status up. [ /Common/myvip2/appname_https_component: up ] [ was down for 0hr:15mins:35sec ]
Feb 26 21:11:22 log_file_location_hostname_2 notice mcpd[8888]: 01070727:5: Pool /Common/myvip2/appname_pool member /Common/myhost2:10003 component status up. [ /Common/myvip2/appname_https_component: up ] [ was down for 0hr:15mins:35sec ]
Feb 26 21:11:21 log_file_location_hostname_3 notice mcpd[7777]: 01070727:5: Pool /Common/myvip1/appname_pool member /Common/myhost3:10003 component status up. [ /Common/myvip1/appname_https_component: up ] [ was down for 0hr:15mins:34sec ]
Feb 26 21:11:21 log_file_location_hostname_3 notice mcpd[7777]: 01070727:5: Pool /Common/myvip1/appname_pool member /Common/myhost3:10003 component status up. [ /Common/myvip1/appname_https_component: up ] [ was down for 0hr:15mins:34sec ]
Feb 26 21:00:37 log_file_location_hostname_1 notice mcpd[9999]: 01070727:5: Pool /Common/myvip1/appname_pool member /Common/myhost3:10004 component status up. [ /Common/myvip1/appname_https_component: up ] [ was down for 0hr:17mins:45sec ]
Feb 26 21:00:37 log_file_location_hostname_4 notice mcpd[6666]: 01070727:5: Pool /Common/myvip2/appname_pool member /Common/myhost2:10004 component status up. [ /Common/myvip2/appname_https_component: up ] [ was down for 0hr:17mins:45sec ]
Feb 26 21:00:37 log_file_location_hostname_4 notice mcpd[6666]: 01070727:5: Pool /Common/myvip2/appname_pool member /Common/myhost2:10004 component status up. [ /Common/myvip2/appname_https_component: up ] [ was down for 0hr:17mins:45sec ]
Feb 26 21:00:36 log_file_location_hostname_1 notice mcpd[9999]: 01070727:5: Pool /Common/myvip1/appname_pool member /Common/myhost1:10004 component status up. [ /Common/myvip1/appname_https_component: up ] [ was down for 0hr:17mins:40sec ]
Feb 26 21:00:36 log_file_location_hostname_1 notice mcpd[9999]: 01070727:5: Pool /Common/myvip1/appname_pool member /Common/myhost1:10004 component status up. [ /Common/myvip1/appname_https_component: up ] [ was down for 0hr:17mins:40sec ]
Feb 26 21:00:35 log_file_location_hostname_3 notice mcpd[7777]: 01070727:5: Pool /Common/myvip1/appname_pool member /Common/myhost1:10004 component status up. [ /Common/myvip1/appname_https_component: up ] [ was down for 0hr:17mins:39sec ]
Feb 26 21:00:35 log_file_location_hostname_2 notice mcpd[8888]: 01070727:5: Pool /Common/myvip2/appname_pool member /Common/myhost4:10004 component status up. [ /Common/myvip2/appname_https_component: up ] [ was down for 0hr:17mins:39sec ]
Feb 26 21:00:35 log_file_location_hostname_2 notice mcpd[8888]: 01070727:5: Pool /Common/myvip2/appname_pool member /Common/myhost4:10004 component status up. [ /Common/myvip2/appname_https_component: up ] [ was down for 0hr:17mins:39sec ]
Feb 26 21:00:34 log_file_location_hostname_3 notice mcpd[7777]: 01070727:5: Pool /Common/myvip1/appname_pool member /Common/myhost3:10004 component status up. [ /Common/myvip1/appname_https_component: up ] [ was down for 0hr:17mins:39sec ]
Feb 26 21:00:34 log_file_location_hostname_4 notice mcpd[6666]: 01070727:5: Pool /Common/myvip2/appname_pool member /Common/myhost4:10004 component status up. [ /Common/myvip2/appname_https_component: up ] [ was down for 0hr:17mins:40sec ]
Feb 26 21:00:33 log_file_location_hostname_2 notice mcpd[8888]: 01070727:5: Pool /Common/myvip2/appname_pool member /Common/myhost2:10004 component status up. [ /Common/myvip2/appname_https_component: up ] [ was down for 0hr:17mins:39sec ]
Feb 26 21:00:33 log_file_location_hostname_2 notice mcpd[8888]: 01070727:5: Pool /Common/myvip2/appname_pool member /Common/myhost2:10004 component status up. [ /Common/myvip2/appname_https_component: up ] [ was down for 0hr:17mins:39sec ]
Feb 26 20:55:47 log_file_location_hostname_2 notice mcpd[8888]: 01070638:5: Pool /Common/myvip2/appname_pool member /Common/myhost2:10003 component status down. [ /Common/myvip2/appname_https_component: down ] [ was up for 0hr:1min:26sec ]
Feb 26 20:55:47 log_file_location_hostname_3 notice mcpd[7777]: 01070638:5: Pool /Common/myvip1/appname_pool member /Common/myhost3:10003 component status down. [ /Common/myvip1/appname_https_component: down ] [ was up for 0hr:1min:26sec ]
Feb 26 20:55:47 log_file_location_hostname_2 notice mcpd[8888]: 01070638:5: Pool /Common/myvip2/appname_pool member /Common/myhost2:10003 component status down. [ /Common/myvip2/appname_https_component: down ] [ was up for 0hr:1min:26sec ]
Feb 26 20:55:47 log_file_location_hostname_3 notice mcpd[7777]: 01070638:5: Pool /Common/myvip1/appname_pool member /Common/myhost3:10003 component status down. [ /Common/myvip1/appname_https_component: down ] [ was up for 0hr:1min:26sec ]
Feb 26 20:55:46 log_file_location_hostname_4 notice mcpd[6666]: 01070638:5: Pool /Common/myvip2/appname_pool member /Common/myhost2:10003 component status down. [ /Common/myvip2/appname_https_component: down ] [ was up for 0hr:1min:26sec ]
Feb 26 20:55:46 log_file_location_hostname_4 notice mcpd[6666]: 01070638:5: Pool /Common/myvip2/appname_pool member /Common/myhost2:10003 component status down. [ /Common/myvip2/appname_https_component: down ] [ was up for 0hr:1min:26sec ]
Feb 26 20:55:45 log_file_location_hostname_1 notice mcpd[9999]: 01070638:5: Pool /Common/myvip1/appname_pool member /Common/myhost3:10003 component status down. [ /Common/myvip1/appname_https_component: down ] [ was up for 0hr:1min:22sec ]
Feb 26 20:55:45 log_file_location_hostname_4 notice mcpd[6666]: 01070638:5: Pool /Common/myvip2/appname_pool member /Common/myhost4:10003 component status down. [ /Common/myvip2/appname_https_component: down ] [ was up for 0hr:1min:21sec ]
Feb 26 20:55:45 log_file_location_hostname_2 notice mcpd[8888]: 01070638:5: Pool /Common/myvip2/appname_pool member /Common/myhost4:10003 component status down. [ /Common/myvip2/appname_https_component: down ] [ was up for 0hr:1min:21sec ]
Feb 26 20:55:45 log_file_location_hostname_4 notice mcpd[6666]: 01070638:5: Pool /Common/myvip2/appname_pool member /Common/myhost4:10003 component status down. [ /Common/myvip2/appname_https_component: down ] [ was up for 0hr:1min:21sec ]
Feb 26 20:55:45 log_file_location_hostname_2 notice mcpd[8888]: 01070638:5: Pool /Common/myvip2/appname_pool member /Common/myhost4:10003 component status down. [ /Common/myvip2/appname_https_component: down ] [ was up for 0hr:1min:21sec ]
Feb 26 20:55:45 log_file_location_hostname_1 notice mcpd[9999]: 01070638:5: Pool /Common/myvip1/appname_pool member /Common/myhost3:10003 component status down. [ /Common/myvip1/appname_https_component: down ] [ was up for 0hr:1min:22sec ]
Feb 26 20:55:44 log_file_location_hostname_3 notice mcpd[7777]: 01070638:5: Pool /Common/myvip1/appname_pool member /Common/myhost1:10003 component status down. [ /Common/myvip1/appname_https_component: down ] [ was up for 0hr:1min:21sec ]
Feb 26 20:55:44 log_file_location_hostname_1 notice mcpd[9999]: 01070638:5: Pool /Common/myvip1/appname_pool member /Common/myhost1:10003 component status down. [ /Common/myvip1/appname_https_component: down ] [ was up for 0hr:1min:22sec ]
Feb 26 20:55:44 log_file_location_hostname_3 notice mcpd[7777]: 01070638:5: Pool /Common/myvip1/appname_pool member /Common/myhost1:10003 component status down. [ /Common/myvip1/appname_https_component: down ] [ was up for 0hr:1min:21sec ]
Feb 26 20:55:44 log_file_location_hostname_1 notice mcpd[9999]: 01070638:5: Pool /Common/myvip1/appname_pool member /Common/myhost1:10003 component status down. [ /Common/myvip1/appname_https_component: down ] [ was up for 0hr:1min:22sec ]

1 Solution

FrankVl
Ultra Champion

What happens when you simply remove the endswith=("component status up") part from your original query and adjust your where clause to either look for durations >600, or transactions without an "UP" event that are of a certain age?

alexeyglukhov
Path Finder

Hi Frank,

Thank you very much for your suggestion - it works.
I only recently started playing with Splunk and didn't know that "endswith=" is not a mandatory parameter for transaction.

So, I modified the search string as you suggested:

| transaction myvip pool_name_and_port mcpd_code
startswith=("component status down")
| eval is_up_event_found=if(like(_raw, "%component status up%"), "yes", "no")
| where duration>600 OR is_up_event_found="no"
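
Note that the filter above keeps every transaction that lacks an "up" event, including pools that went down only seconds ago. Below is a minimal sketch of the age check Frank mentions. It assumes the same extracted fields as above (myvip, pool_name_and_port, mcpd_code) and relies on a transaction's _time being the timestamp of its earliest event, which here is the "down" event. The field name down_for is hypothetical and only introduced for illustration:

```spl
| transaction myvip pool_name_and_port mcpd_code
startswith=("component status down")
| eval is_up_event_found=if(like(_raw, "%component status up%"), "yes", "no")
| eval down_for=now() - _time
| where is_up_event_found="no" AND down_for>600
```

Run as a scheduled alert over a time range long enough to contain the "down" event, this should return only pools that are still down and have been down for more than 600 seconds. The duration>600 clause is left out here because it also matches outages that have already ended; keep it if those should be reported too.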
