We have multiple aa-dev-server hosts running JBoss, and the query below sends me an alert when the JBoss service is down. The issue is that my Splunk account is limited to only a few real-time alerts. Is there a query where I could use a wildcard "*" so that it picks up all hosts matching host="aa-dev-jboss-*", but sends an email that specifies which host's JBoss server went down, or provides a table showing which server did?
host="aa-dev-jboss-1" source=ps jboss | stats latest(_time) as latest by host
@shakeel253, you need not run a real-time alert; a scheduled alert can be based on the SLA defined at the Enterprise level.
For example, you can run the alert every 5 minutes over the last 15 minutes to check host ping status. Since you seem to base your query on host, you can use the metadata command to write a faster search for the same. Also, with the query above, if a host has no events at all for the time period you are searching, that host will not be reported. So you would need a lookup file in Splunk with all available host names. You can have a static lookup file, or a scheduled Splunk search with the outputlookup command that writes the available hosts to the lookup file. PS: You can also use a Splunk REST service call to get a list of all hosts that are pinging your Splunk server. Assuming your host lookup file is available_jboss_hosts.csv with host as the field name, you can try a query like the following:
| tstats latest(_time) as _time WHERE index=<yourIndexName> BY host | eval "downTime (in Min)"=round((now()-_time)/60,0) | appendpipe [ | inputlookup available_jboss_hosts.csv | fields host | eval "downTime (in Min)"="999" ] | dedup host | where 'downTime (in Min)'>5
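The lookup file that the query reads could be populated by a scheduled search along the following lines. This is only a sketch: the host="aa-dev-jboss-*" pattern is taken from the question, <yourIndexName> is the same placeholder as in the query above, and the time range of the scheduled search (for example the last 7 days) is an assumption you would need to choose yourself.

```
| tstats count WHERE index=<yourIndexName> host="aa-dev-jboss-*" BY host
| fields host
| outputlookup available_jboss_hosts.csv
```

Keep the time range of this search much longer than the alert window. If it only looks back a few minutes, a host that is currently down would drop out of the lookup and would no longer be reported.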
@shakeel253, I based it on the query in your question, which you said works fine for an individual JBoss server:
"below query sends me alert when jboss service is down"
Nevertheless, which OS is the JBoss server running on, Windows or Linux? Usually on Windows, JBoss service start and stop events are logged in Event Viewer; is that so? If not, do you have explicit JBoss logs that can be used instead?
If you can provide the logs or the place you use to identify that the JBoss service is down, the same can be plugged into the alert.
First, let me change my answer to a comment, because the query in your question does not seem to do what you expect.
The OS we are using is Linux (Amazon Linux EC2 instance or Red Hat). We use the JBoss server log to identify whether it is starting or shutting down; the absolute path is /opt/jboss-eap/standalone/log/server.log.
Another way we check whether the JBoss service is running is by checking the JBoss pid:
ps aux | grep jboss
Then instead of WHERE index=<yourIndexName>, use the sourcetype for JBoss if you have kept one. Otherwise use source.
I ran into two issues when I changed the source and ran the query:
1) It is picking up JBoss servers from other environments. For example, I need JBoss for the ABC environment but not for the DEF environment, yet it picks up all the servers from both ABC and DEF.
2) The query should only return results when it detects that the JBoss server is down, but it still shows results.
@shakeel253, have you tried the following:
| tstats latest(_time) as _time WHERE (host="ABC1" OR host="ABC2") AND source="/opt/jboss-eap/standalone/log/server.log" by host | eval "downTime (in Min)"=round((now()-_time)/60,0) | appendpipe [ | inputlookup available_jboss_hosts.csv | fields host | eval "downTime (in Min)"="999" ] | dedup host | where 'downTime (in Min)'>5
@niketnilay, I have my Tomcat server running, and when I ran the query below I got output with the host name, the current date and time, and a downtime of 999. Is that the expected output?
I was under the impression that if the Tomcat/JBoss services are running there would be no output, since there is no downtime for those services. We should only get results when the downtime (JBoss/Tomcat) is more than 5 min.
| tstats latest(_time) as _time WHERE (host="hostsvm") AND source="/opt/tomcat/logs/catalina.out" by host
| eval "downTime (in Min)"=round((now()-_time)/60,0)
| append [
| eval host="hostsvm", "downTime (in Min)"="999"]
| dedup host
| where 'downTime (in Min)'>5
Have you done | dedup host?
The idea is to have one row for each host; if the query does not return any rows for a host, then only the one row with downTime (in Min)=999 will be present for it. By doing a dedup host we retain only one row per host.
The final filter | where 'downTime (in Min)'>5 retains all records whose latest event was received more than 5 minutes ago, and also the hosts that did not have any events for the selected time period (indicated by 999). You can also set it to any other default value if that is confusing, e.g. "5+", indicating hosts that have not had an event for more than the last 5 minutes based on the selected time range.
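To see how the dedup and where steps interact, a run-anywhere sketch with made-up rows may help. The host names hostA, hostB and hostC and their downtime values are purely hypothetical; the first two rows stand in for tstats results and the 999 rows stand in for the lookup defaults. Note that 999 is written here as a number rather than the quoted "999" used above.

```
| makeresults
| eval host="hostA", "downTime (in Min)"=2
| append [| makeresults | eval host="hostB", "downTime (in Min)"=12]
| append [| makeresults | eval host="hostA", "downTime (in Min)"=999]
| append [| makeresults | eval host="hostC", "downTime (in Min)"=999]
| dedup host
| where 'downTime (in Min)'>5
| table host "downTime (in Min)"
```

Because dedup keeps the first row it sees for each host, hostA keeps its real value of 2 and is filtered out, while hostB (12 minutes) and hostC (no events, default 999) should remain. If a running host still shows 999, its real row is probably missing or arriving after the default row.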
Hope this clarifies.
Hey Niketnilay, it looks like the query will do the job, but I am getting the error below. I tried putting a space in search "downTime (in Min)">5, but it's not helping:
Comparator '>' is missing a term on the right hand side
The search job has failed due to an error. You may be able view the job in the Job Inspector.
@shakeel253, my bad, I had used search in place of where. I have also replaced the double quotes with single quotes. Please try again and confirm.