How to monitor a process (rhnsd) on all Linux servers via a pie chart?

Explorer

For example, servers where rhnsd is running should show up in green on the pie chart, and servers where it is stopped should show up in red.


New Member

Hi -

I need to create an alert that fires when a process is not running on a Linux server.

The query below correctly lists the processes running on a server:

index="index-name" source=ps host="hostname" process="processname"
| dedup host process
| join host [search index="index-name" source=ps host="hostname" process="*processname*"
| stats latest(host) latest(_time) by host | eval lastSeen='latest(_time)' | fields host lastSeen]
| eval status=if(lastSeen<(_time - 300), "not running","running")
| table host status process

Example Output :

Host : hostname
Status : running
process : process_name

But I need to send an alert when the status is "not running".

Could anybody help me with this?


Ultra Champion

I am not sure a pie chart is the best visualisation for what you are asking.
Instead, here is a way to do something similar with Single values.

If you have Splunk_TA_nix (the Splunk Add-on for Unix and Linux) deployed, you will want to enable ps.sh.
In your inputs.conf, make sure you have:

[script://./bin/ps.sh]
disabled = false

You will then be collecting events from your forwarders, and can run queries like this:

index=os sourcetype=ps rhnsd
You will get results for every system which has the process running.

Now here is the tricky bit - do you know how many hosts should be running the process?
-or-
Do you just want to see if a host which was previously running it has stopped?

If you know there should be 10 hosts:

index=os sourcetype=ps rhnsd|dedup host|stats count as runningCount|eval rhnsdMissing=(10-runningCount)| table rhnsdMissing

You can then colour-code the Single Value panel as appropriate.
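
The arithmetic in that search can be sketched outside Splunk. The Python below only illustrates the logic and is not something to run in Splunk: the sample events, the `count_missing` helper and the expected count of 10 are assumptions made up for the example.

```python
def count_missing(events, expected_hosts):
    """Return how many expected hosts have no matching process event.

    events: iterable of (host, process) tuples standing in for the ps
    events that matched the search term.
    """
    running_hosts = {host for host, _process in events}  # dedup host
    running_count = len(running_hosts)                   # stats count as runningCount
    return expected_hosts - running_count                # eval rhnsdMissing=(10-runningCount)


# Hypothetical sample: 8 distinct hosts report rhnsd, one of them twice.
sample = [(f"host{i:02d}", "rhnsd") for i in range(8)] + [("host00", "rhnsd")]
print(count_missing(sample, expected_hosts=10))
```

With this sample the result is 2, because the duplicate event for host00 is counted only once.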


Explorer

It's difficult to track the exact number of hosts, as hosts get provisioned and decommissioned continuously.
I want to track whether rhnsd has stopped on any host, because it's expected to run on all hosts.


Ultra Champion

You can try this to start with.

index=os sourcetype=ps|dedup host
|join host [search  index=os sourcetype=ps rhnsd|stats latest(host) latest(_time) by host |eval lastSeen='latest(_time)'|fields host lastSeen]
|eval status=if(lastSeen<(_time - 300), "late","recent")
|table host status

This will produce a table of all of your hosts which are reporting events for ps.
It then runs a join, to look for the last event where rhnsd was running.
If the time delta is more than 300 seconds, then this is considered 'late', otherwise it will report 'recent'
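
To make the comparison concrete, here is a minimal Python sketch of the same late/recent rule. It is an illustration only: the host names and timestamps are hypothetical, and the `status` helper simply mirrors the eval expression in the search.

```python
THRESHOLD_SECONDS = 300  # the 5-minute delta used in the search


def status(latest_ps_time, last_seen, threshold=THRESHOLD_SECONDS):
    """Mirror eval status=if(lastSeen<(_time - 300), "late","recent").

    latest_ps_time: epoch seconds of the host's newest ps event
                    (_time after dedup host).
    last_seen:      epoch seconds of the host's newest ps event that
                    mentions rhnsd.
    """
    return "late" if last_seen < latest_ps_time - threshold else "recent"


# Hypothetical timestamps (epoch seconds) for three hosts.
now = 1_700_000_000
hosts = {
    "web01": (now, now),        # rhnsd present in the newest scan
    "web02": (now, now - 300),  # exactly 300 s old: still "recent"
    "web03": (now, now - 900),  # 15 minutes old: "late"
}
for host, (ps_time, seen) in hosts.items():
    print(host, status(ps_time, seen))
```

Note that the comparison is strict, so a delta of exactly 300 seconds still counts as recent.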

This is by no means perfect, as hosts which have recently retired may show as late, until events fall out of your time window. If this gives you the type of results you expect, then we can make it a bit more visual


Explorer

I tested this by stopping rhnsd on 3 servers, and it looks fine. Now, how can this be visualized properly?


Ultra Champion

If you really want a pie chart, you could simply update the search to:

index=os sourcetype=ps|dedup host
 |join host [search  index=os sourcetype=ps rhnsd|stats latest(host) latest(_time) by host |eval lastSeen='latest(_time)'|fields host lastSeen]
 |eval status=if(lastSeen<(_time - 300), "late","recent")
 |stats count by status

Then set the visualization to a pie chart.

Or, if you want a colour-changing Single Value, the following Simple XML would do the trick:

    <row>
        <panel>
          <single>
            <search>
              <query>index=os sourcetype=ps | dedup host
 | join host [search index=os sourcetype=ps rhnsd | stats latest(host) latest(_time) by host | eval lastSeen='latest(_time)' | fields host lastSeen]
 | eval status=if(lastSeen&lt;(_time - 300), "late","recent") | search status=late
 | stats count</query>
              <earliest>@d</earliest>
              <latest>now</latest>
              <sampleRatio>1</sampleRatio>
            </search>
            <option name="colorBy">value</option>
            <option name="colorMode">block</option>
            <option name="drilldown">none</option>
            <option name="numberPrecision">0</option>
            <option name="rangeColors">["0x65a637","0xd93f3c"]</option>
            <option name="rangeValues">[0]</option>
            <option name="showSparkline">1</option>
            <option name="showTrendIndicator">1</option>
            <option name="trendColorInterpretation">standard</option>
            <option name="trendDisplayMode">absolute</option>
            <option name="underLabel">Hosts Missing rhnsd</option>
            <option name="unitPosition">after</option>
            <option name="useColors">1</option>
            <option name="useThousandSeparators">1</option>
          </single>
        </panel>
      </row>

Explorer

This looks good. Only one thing: if rhnsd wasn't even running in the most recent scan (maybe it was stopped before the very first scan), then the host won't show up in the chart.
Is there any way I can include the hosts where it has not been running for a long time?
Maybe a pie chart could compare the total hosts, the hosts on which ps contains rhnsd, and the hosts on which ps doesn't contain rhnsd?


Ultra Champion

The way the query works is to say:
Look for any server which has sent any ps events in this time frame.
For each of those servers, flag it if it has NOT sent an rhnsd event in the last 5 mins.

You can run the search over a longer period of time, and unless rhnsd has run in the last 5 mins it will show up as missing.
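
One caveat worth checking against your own data: join performs an inner join unless another type is specified, so a host with no rhnsd event anywhere in the search window has nothing to join against and drops out of the results instead of showing as late. The Python below is only a sketch of the three-way split asked about above (rhnsd recent, rhnsd late, rhnsd never seen in the window). The host names, timestamps, the "never seen" label and the `summarise` helper are all invented for the illustration.

```python
from collections import Counter

THRESHOLD_SECONDS = 300  # same 5-minute delta as the search


def summarise(last_ps, last_rhnsd, threshold=THRESHOLD_SECONDS):
    """Count hosts per status, keeping hosts that never reported rhnsd.

    last_ps:    {host: epoch of newest ps event} for every host in the window.
    last_rhnsd: {host: epoch of newest ps event mentioning rhnsd}.
    """
    counts = Counter()
    for host, ps_time in last_ps.items():
        seen = last_rhnsd.get(host)  # None when rhnsd never appeared
        if seen is None:
            counts["never seen"] += 1
        elif seen < ps_time - threshold:
            counts["late"] += 1
        else:
            counts["recent"] += 1
    return dict(counts)


now = 1_700_000_000
last_ps = {"app01": now, "app02": now, "app03": now, "app04": now}
last_rhnsd = {"app01": now, "app02": now - 60, "app03": now - 3600}  # app04 absent
print(summarise(last_ps, last_rhnsd))
```

The per-status counts are what a pie chart would plot, one slice per status, so hosts that never ran rhnsd get their own slice rather than disappearing.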


Ultra Champion

Hi - I added this post - If you find it useful, please upvote the answer, or add your own solution if you found another way!

https://answers.splunk.com/answers/606762/how-do-i-monitor-jbosstomcatapacheetc-and-raise-an.html
