How to monitor a process (rhnsd) on all Linux servers via a pie chart?

pradipto
Explorer

For example, a server where rhnsd is running would show up green on the pie chart, and a server where it has stopped would show up red.


stotta11
New Member

Hi -

I need to create an alert that fires when a process is not running on a Linux server.

The query below gives me correct results for the processes running on a server:

index="index-name" source=ps host="hostname*" process="process_name"
| dedup host process
| join host [search index="index-name" source=ps host="hostname*" process="process_name"
| stats latest(host) latest(_time) by host |eval lastSeen='latest(_time)'|fields host lastSeen]
|eval status=if(lastSeen<(_time - 300), "not running","running")
|table host status process

Example Output :

Host : hostname
Status : running
process : process_name

But I need to send an alert if the status is "not running".

Could anybody help me with this?


nickhills
Ultra Champion

I am not sure a pie chart is the best visualisation for what you are asking.
Instead, here is a way to do something similar with Single values.

If you have the Splunk_TA_nix deployed, you will want to enable ps.sh.
In your inputs.conf, make sure you have:

[script://./bin/ps.sh]
disabled = false

You will then be collecting events from your forwarders, and can run queries like this:

index=os sourcetype=ps rhnsd

You will get results for every system which has the process running.

Now here is the tricky bit - do you know how many hosts should be running the process?
-or-
Do you just want to see if a host which was previously running it has stopped?

If you know there should be 10 hosts:

index=os sourcetype=ps rhnsd|dedup host|stats count as runningCount|eval rhnsdMissing=(10-runningCount)| table rhnsdMissing

You can then colour code the single value panel as appropriate.

If my comment helps, please give it a thumbs up!

pradipto
Explorer

It's difficult to track the exact number of hosts, as hosts get provisioned and decommissioned continuously.
I want to track whether rhnsd has stopped on any host, because it's expected to run on all hosts.


nickhills
Ultra Champion

You can try this to start with.

index=os sourcetype=ps|dedup host
|join host [search  index=os sourcetype=ps rhnsd|stats latest(host) latest(_time) by host |eval lastSeen='latest(_time)'|fields host lastSeen]
|eval status=if(lastSeen<(_time - 300), "late","recent")
|table host status

This will produce a table of all of your hosts which are reporting events for ps.
It then runs a join to look for the last event where rhnsd was running.
If the last rhnsd event is more than 300 seconds older than the host's most recent ps event, the host is reported as 'late'; otherwise it is reported as 'recent'.

This is by no means perfect, as hosts which have recently been retired may show as late until their events fall out of your time window. If this gives you the type of results you expect, then we can make it a bit more visual.


pradipto
Explorer

I tested this by stopping rhnsd on 3 servers. This looks fine. Now, how can this be visualized properly?


nickhills
Ultra Champion

If you really want a pie chart, you could simply update the search to:

index=os sourcetype=ps|dedup host
 |join host [search  index=os sourcetype=ps rhnsd|stats latest(host) latest(_time) by host |eval lastSeen='latest(_time)'|fields host lastSeen]
 |eval status=if(lastSeen<(_time - 300), "late","recent")
 |stats count by status

Then set the visualization to a pie chart.

Or, if you wanted to use a colour-changing Single Value, the following Simple XML would do the trick:

      <row>
        <panel>
          <single>
            <search>
              <query>index=os sourcetype=ps|dedup host
 |join host [search index=os sourcetype=ps rhnsd|stats latest(host) latest(_time) by host |eval lastSeen='latest(_time)'|fields host lastSeen]
 |eval status=if(lastSeen&lt;(_time - 300), "late","recent")|search status=late
 |stats count</query>
              <earliest>@d</earliest>
              <latest>now</latest>
              <sampleRatio>1</sampleRatio>
            </search>
            <option name="colorBy">value</option>
            <option name="colorMode">block</option>
            <option name="drilldown">none</option>
            <option name="numberPrecision">0</option>
            <option name="rangeColors">["0x65a637","0xd93f3c"]</option>
            <option name="rangeValues">[0]</option>
            <option name="showSparkline">1</option>
            <option name="showTrendIndicator">1</option>
            <option name="trendColorInterpretation">standard</option>
            <option name="trendDisplayMode">absolute</option>
            <option name="underLabel">Hosts Missing rhnsd</option>
            <option name="unitPosition">after</option>
            <option name="useColors">1</option>
            <option name="useThousandSeparators">1</option>
          </single>
        </panel>
      </row>

pradipto
Explorer

This looks good. Only one thing: if rhnsd didn't run even in the most recent scan (maybe it was stopped before the first scan), then the host won't show up in the chart.
Is there any way I can include the hosts where it has not been running for a long time?
So maybe a pie chart could compare the total hosts, the hosts on which ps contains rhnsd, and the hosts on which ps doesn't contain rhnsd?


nickhills
Ultra Champion

The way the query works is to say:
Look for any server which has sent any ps events at all in this time frame.
Of those servers, flag the ones which have NOT sent an rhnsd event in the last 5 mins.

You can run the search over a longer period of time, and unless rhnsd has run in the last 5 mins it will show up as missing.
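
One caveat worth noting: join is an inner join by default, so a host with no rhnsd events anywhere in the search window is dropped from the results instead of being flagged. A rough sketch of a join-free variant that keeps those hosts is below. It assumes the same index and sourcetype as the searches above, and that matching the raw event text with searchmatch("rhnsd") is equivalent to the bare rhnsd search term used earlier. The field names rhnsdTime and lastPs and the "never" status are made up for illustration.

```spl
index=os sourcetype=ps
| eval rhnsdTime=if(searchmatch("rhnsd"), _time, null())
| stats max(_time) as lastPs max(rhnsdTime) as lastSeen by host
| eval status=case(isnull(lastSeen), "never", lastSeen < lastPs - 300, "late", true(), "recent")
| stats count by status
```

With a pie visualization this should give up to three slices: hosts where rhnsd was seen recently, hosts where it was last seen more than 300 seconds before the latest ps event, and hosts where it was not seen at all in the time range.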


nickhills
Ultra Champion

Hi - I added this post - If you find it useful, please upvote the answer, or add your own solution if you found another way!

https://answers.splunk.com/answers/606762/how-do-i-monitor-jbosstomcatapacheetc-and-raise-an.html
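
For the alerting question raised earlier in this thread, a minimal sketch (not taken from the linked post) could reuse the same search and keep only the late hosts. The index, sourcetype, process name and 300-second threshold are carried over from the searches above; adjust them to your environment. The subsearch is shortened to a single stats with a rename, which should behave the same as the longer form used earlier.

```spl
index=os sourcetype=ps | dedup host
| join host [search index=os sourcetype=ps rhnsd | stats latest(_time) as lastSeen by host]
| eval status=if(lastSeen<(_time - 300), "late", "recent")
| where status="late"
| table host status
```

Saved as an alert with a trigger condition of "Number of Results is greater than 0", this would fire whenever at least one host is late. Because of the inner join, hosts with no rhnsd events at all in the time range are not returned, so they would need to be handled separately.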
