My requirement involves a lookup file containing a list of hosts. It holds the saved results of an alert, so the hosts it lists are the servers that are currently down.
I want to use the lookup in an alert that runs every minute and notifies me when a host in the lookup is back to normal. The problem I'm having is that once a host is back to normal, that host should not be considered any further; only the remaining hosts should be checked.
Lookup file that stores the list of down servers: hostdown.csv
Query to find the list of down servers:
| search index=linux sourcetype=df | where ((PercentUsedSpace >= 80) AND (PercentUsedSpace<=90))
You can change the content of a lookup with the outputlookup command - however, in your case, I am not sure doing this every minute is necessarily a good thing as the lookups have to be sync'd across the servers.
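As a rough sketch of that outputlookup idea, a scheduled search could rewrite hostdown.csv so that it keeps only the hosts that still match the "down" condition. This assumes the lookup has a host column, that a 5-minute window (earliest=-5m) is acceptable, and that the threshold filter from the question is what defines "down"; the in_lookup field is just a helper name made up for this example. Note that any other columns in the lookup would be dropped by the final fields command.

```spl
index=linux sourcetype=df earliest=-5m
| where PercentUsedSpace >= 80 AND PercentUsedSpace <= 90
| eval host=lower(host)
| stats count BY host
| append
    [| inputlookup hostdown.csv
     | eval host=lower(host), count=0, in_lookup=1
     | fields host count in_lookup]
| stats sum(count) AS total max(in_lookup) AS in_lookup BY host
| where in_lookup=1 AND total>0
| fields host
| outputlookup hostdown.csv
```

Hosts that are in the lookup but no longer match the condition end up with total=0 and are left out of the rewritten file, so later runs would only look at the hosts that remain.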
A question you might push back with to your customer is, how soon after the change do they actually need to know, for example, if you only checked every 5 minutes for the past 5 minutes, would that be good enough for your alert? How about every 10 or 20 minutes? Do your users want to be alerted at all times of the day, that the server is now up, even if it is bounced 10 times overnight (for some reason)?
Is it worth considering using a summary index to track when servers change state and only add to the index if it is different from the last state in the summary index for that server or a new day? The new day would mean that your summary index will grow but it also means you don't have to search it for all time to find the last entry. The growth can be managed with retention periods for the summary index. (This is how I might have approached this problem.)
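A minimal sketch of that summary index approach, under several assumptions: the index name host_state_summary is hypothetical and would have to exist already, the state is derived from the same 80 to 90 threshold used in the question, and the highest PercentUsedSpace per host in the last 5 minutes is used as a crude stand-in for the host's current state. The host is stored as server so it does not clash with the host field of the summary events themselves.

```spl
index=linux sourcetype=df earliest=-5m
| eval server=lower(host)
| stats max(PercentUsedSpace) AS pct BY server
| eval state=if(pct>=80 AND pct<=90, "down", "up")
| join type=left server
    [ search index=host_state_summary earliest=@d
      | stats latest(state) AS last_state BY server ]
| where isnull(last_state) OR state!=last_state
| eval _time=now()
| fields _time server state
| collect index=host_state_summary
```

Because the subsearch only looks back to the start of the day (earliest=@d), the first run of each day writes one row per server, and after that a row is written only when the state differs from the last one recorded. An alert could then search the summary index for new rows with state="up".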
Sorry, but I'm having some trouble understanding your need:
you have a list of down servers and you want to check whether they have come back up, is that correct?
You could simply run the alert again and match the results.
So if the search to find down servers is the one you shared, and you scheduled it to save its results in a lookup (hostdown.csv), you could run something like this to get the status of all hosts:
index=linux sourcetype=df ((PercentUsedSpace >= 80) AND (PercentUsedSpace<=90)) | eval host=lower(host) | stats count BY host | append [ | inputlookup hostdown.csv | eval host=lower(host), count=0 | fields host count ] | stats sum(count) AS total BY host | eval status=if(total=0,"Down","Up")
If instead you want to check the status of only the previously down hosts, you could use this:
index=linux sourcetype=df ((PercentUsedSpace >= 80) AND (PercentUsedSpace<=90)) [ | inputlookup hostdown.csv | fields host ] | eval host=lower(host) | stats count BY host | append [ | inputlookup hostdown.csv | eval host=lower(host), count=0 | fields host count ] | stats sum(count) AS total BY host | eval status=if(total=0,"Down","Up")
Hi @gcusello ,
Thanks for your response,
I have a list of down servers and need to be notified once a server is back to normal (up). This is the requirement.
Once a server is up, there is no need to consider that server any further, because it is already up; only the remaining servers need to be checked.
E.g., servers A, B, C, D and E are down, so they will be in the lookup.
Those servers need to be checked every minute, with a notification once one of them is up.
If servers A and B come up after some time, an alert should be triggered for them. From the next run onward, A and B should no longer be considered; only the remaining servers C, D and E should be checked. The alert should then trigger again when C, D and E, or any one of them, comes up.
| lookup Hobbit_threshold_data host mount outputnew l_threshold as lower_value h_threshold as higher_value condition as Condition
| where ((PercentUsedSpace >= lower_value) AND (PercentUsedSpace<higher_value))
| where Condition!="no"
| eval hostname=mvindex(split(host,"."),0) [ | inputlookup Hobbit_Disk_Space_Warning.csv | fields host ]
| stats host=lower(host)
| stats count BY host
| append [ | inputlookup KCI_Hobbit_Disk_Space_Warning.csv | eval host=lower(host), count=0 | fields host count ]
| stats sum(count) AS total BY host
| eval status=if(total=0,"Down","Up")
(The bolded part of the query gives the list of down hosts.)
I just modified the query you gave me, but it does not meet the requirement.
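For the A to E scenario described above, one possible shape for the recovery alert is to start from the lookup and report only the hosts that no longer match the "down" condition. This is a sketch under assumptions: it uses the simpler hostdown.csv lookup and fixed 80 to 90 thresholds from the original question rather than the threshold lookup in the modified query, it assumes the lookup has a host column, and the 5-minute window is arbitrary.

```spl
| inputlookup hostdown.csv
| eval host=lower(host), count=0
| fields host count
| append
    [ search index=linux sourcetype=df earliest=-5m
      | where PercentUsedSpace >= 80 AND PercentUsedSpace <= 90
      | eval host=lower(host)
      | stats count BY host ]
| stats sum(count) AS total BY host
| where total=0
| eval status="Up"
| fields host status
```

The alert would trigger when this returns at least one result. Only lookup hosts can end up with total=0, since a host outside the lookup appears here only if it has matching events. A host that has stopped sending data altogether would also show total=0, so it may be worth checking that separately. To stop A and B from being reported again, the lookup would then have to be rewritten with only the still-failing hosts (for example with outputlookup) after the alert has fired.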