Splunk Search

Splunk query that alerts when the JBoss service on a server goes down

shakeel253
Explorer

We have multiple aa-dev servers running JBoss, and the query below sends me an alert when the JBoss service is down. The issue is that my Splunk account is limited to only a few real-time alerts. Is there a query that uses a wildcard ("*") so that it picks up all hosts matching host="aa-dev-jboss-*", and then sends an email specifying which JBoss host went down, or provides a table showing which servers did?

host="aa-dev-jboss-1" source=ps jboss | stats latest(_time) as latest by host

nickhills
Ultra Champion

Hi, I added the post below. If you find it useful, please upvote the answer, or add your own solution if you found another way!

https://answers.splunk.com/answers/606762/how-do-i-monitor-jbosstomcatapacheetc-and-raise-an.html

niketnilay
Legend

@shakeel253, you do not need a real-time alert; a scheduled alert can do the same job, with its frequency based on the SLA defined at the enterprise level.

For example, you can run the alert every 5 minutes over the last 15 minutes to check host ping status. Since your query is based on host, you can use the tstats or metadata command to write a faster search.

Also, with the query above, a host that has no events at all in the time period you are searching will not be reported. So you need a lookup file in Splunk listing all available host names. It can be a static lookup file, or you can have a scheduled Splunk search write the available hosts to a lookup file with the outputlookup command. PS: You can also use a Splunk REST service call to get a list of all hosts that are pinging your Splunk server.

Assuming your host lookup file is available_jboss_hosts.csv with a field named host, you can try a query like the following:

| tstats latest(_time) as _time WHERE index=<yourIndexName> BY host
| eval "downTime (in Min)"=round((now()-_time)/60,0)
| appendpipe [
    | inputlookup available_jboss_hosts.csv
    | fields host
    | eval "downTime (in Min)"="999"
       ]
| dedup host
| where 'downTime (in Min)'>5
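
The scheduled search that maintains available_jboss_hosts.csv could look something like the sketch below. It is only an illustration: the aa-dev-jboss-* host pattern is taken from the question, <yourIndexName> is the same placeholder as above, and the time range and schedule (for example, once a day over the last 7 days) are assumptions to adjust.

```spl
| tstats count WHERE index=<yourIndexName> host="aa-dev-jboss-*" BY host
| fields host
| outputlookup available_jboss_hosts.csv
```

Note that outputlookup overwrites the file on each run by default, so a host that sent nothing during the whole search window would drop out of the list. A static lookup file avoids that.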
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

fsrodriguez
New Member

But this query is pinging the hosts. I think Shakeel needs a query that checks if the jboss service is down.

shakeel253
Explorer

@niketnilay, exactly. What fsrodriguez mentioned is what I need.

niketnilay
Legend

@shakeel253, I based it on the query in your question, which you said works fine for an individual JBoss server ("below query sends me alert when jboss service is down").

Nevertheless, which OS is the JBoss server running on, Windows or Linux? On Windows, JBoss service start and stop are usually logged in the Event Viewer; is that the case here? If not, do you have explicit JBoss logs that can be used instead?

If you can provide the logs or the location you use to identify that the JBoss service is down, that can be plugged into the alert.

First, let me convert my answer to a comment, since the query in your question does not seem to do what you expect.

shakeel253
Explorer

The OS we are using is Linux (Amazon Linux EC2 instances or Red Hat). We use the JBoss server log to identify whether JBoss is starting or shutting down; its absolute path is /opt/jboss-eap/standalone/log/server.log.

Another way we check whether the JBoss service is running is by checking for the JBoss PID:

ps aux | grep jboss

niketnilay
Legend

Then, instead of WHERE index=<yourIndexName> in tstats, use the sourcetype for JBoss if you have defined one. Otherwise use source, i.e.:

WHERE source="/opt/jboss-eap/standalone/log/server.log"

shakeel253
Explorer

Two issues came up when I changed the source and ran the query:

1) It is picking up JBoss servers from other environments. For example, I need the JBoss servers for the ABC environment but not for the DEF environment, yet it returns servers from both.

2) The query should only return results when it detects that a JBoss server is down, but it is still showing results.

niketnilay
Legend

@shakeel253, have you tried the following:

 | tstats latest(_time) as _time WHERE (host="ABC1" OR host="ABC2") AND source="/opt/jboss-eap/standalone/log/server.log" by host
 | eval "downTime (in Min)"=round((now()-_time)/60,0)
 | appendpipe [
     | inputlookup available_jboss_hosts.csv
     | fields host
     | eval "downTime (in Min)"="999"
        ]
 | dedup host
 | where 'downTime (in Min)'>5
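
If the hosts share a naming pattern, as in the question (aa-dev-jboss-*), a wildcard can stand in for the explicit host list. The variant below is a sketch along the same lines, not tested against your data. It assumes the lookup contains only the hosts of the environment you care about, uses inputlookup with append=true to add the lookup rows to the tstats results, and uses coalesce to give a numeric 999 to hosts that returned no events.

```spl
| tstats latest(_time) as _time WHERE host="aa-dev-jboss-*" AND source="/opt/jboss-eap/standalone/log/server.log" BY host
| eval "downTime (in Min)"=round((now()-_time)/60,0)
| inputlookup append=true available_jboss_hosts.csv
| eval "downTime (in Min)"=coalesce('downTime (in Min)', 999)
| dedup host
| where 'downTime (in Min)'>5
| table host "downTime (in Min)"
```

Because the tstats rows come first, dedup host keeps the real value for any host that did log something and falls back to the 999 row only for hosts that are missing.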

shakeel253
Explorer

@niketnilay, my Tomcat server is running, and when I ran the query below I got output with the host name, the current date and time, and a downtime of 999. Is that the expected output?
I was under the impression that if the Tomcat/JBoss services are running there would be no output, since there is no downtime for those services. We should only get results when the downtime (JBoss/Tomcat) is more than 5 minutes.

| tstats latest(_time) as _time WHERE (host="hostsvm") AND source="/opt/tomcat/logs/catalina.out" by host
| eval "downTime (in Min)"=round((now()-_time)/60,0)
| append [
| makeresults
| eval host="hostsvm", "downTime (in Min)"="999"]
| dedup host
| where 'downTime (in Min)'>5

niketnilay
Legend

Have you done | dedup host?
The idea is to have one row for each host. If the query does not return any rows for a host, only the one row with downTime (in Min)=999 will be present for it. By doing a dedup host we retain only one row per host.

The final filter | where 'downTime (in Min)'>5 retains all hosts whose latest event is older than 5 minutes, as well as hosts that had no events at all in the selected time period (indicated by 999). If 999 is confusing, you can set any other default value, e.g. "5+", to indicate hosts that have had no events for more than 5 minutes within the selected time range.
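
To see the dedup and where steps in isolation, here is a small self-contained sketch built with makeresults. The host names hostA and hostB and their values are made up for illustration: hostA has a real row (2 minutes) followed by its 999 default row, and hostB has only the default row.

```spl
| makeresults
| eval host="hostA", "downTime (in Min)"=2
| append [| makeresults | eval host="hostA", "downTime (in Min)"=999]
| append [| makeresults | eval host="hostB", "downTime (in Min)"=999]
| dedup host
| where 'downTime (in Min)'>5
| table host "downTime (in Min)"
```

dedup host keeps the first row for each host, so hostA keeps its value of 2 and is then dropped by the where filter, while hostB is left with 999 and is the only row returned.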

Hope this clarifies.

shakeel253
Explorer

Hey niketnilay, it looks like the query will do the job, but I am getting the error below. I tried putting a space in search "downTime (in Min)">5, but that did not help.

Comparator '>' is missing a term on the right hand side
The search job has failed due to an error. You may be able view the job in the Job Inspector.

niketnilay
Legend

@shakeel253, my bad, I had used search in place of where. I have also replaced the double quotes with single quotes. Please try again and confirm.
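
For anyone hitting the same error: in where (and eval), single quotes mark a field name and double quotes mark a string literal, which is why a field name containing spaces has to be single-quoted in the filter. A small self-contained check with makeresults (the value 10 is arbitrary):

```spl
| makeresults
| eval "downTime (in Min)"=10
| where 'downTime (in Min)'>5
```

This should return the one generated row. With double quotes in the where clause, the left-hand side is read as a string literal rather than as the field's value.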
