
UNIX Process Monitoring Template

clyde772
Communicator

Has anybody on Splunk Answers tried to register a process to monitor on Unix systems?

I am trying to set up Splunk to monitor the running status of a process. When a process dies, meaning it no longer exists, I want Splunk to generate an event saying that the process is no longer running on the system.

It sounds pretty simple, but here is my dilemma. When I search for a process, for example httpd, and httpd is running, the search returns a result that verifies the process is there. But when the process no longer exists, the search simply finds no event for that process. How can I turn that situation (a search that finds no process event) into an event?

I would appreciate it if anyone could share a similar monitoring template that monitors the status of specific processes.


Genti
Splunk Employee

Do you know about the Unix App?
If you have a look at this app, you will notice that it collects process information using the top command roughly every 60 seconds. You can then use a search to have Splunk notify you when this process is not running.

Something like:

index="os" source="top" myprocess earliest=-2min latest=-1min

and schedule it to run every minute. Then save the search as an alert with the condition that if it returns fewer than 1 result, it should email you.

So if myprocess is dead, Splunk should notify you, unless of course it is a stale process that still shows up in top even when it is not actually running.
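To make the mechanics of this alert concrete: the search never produces a "process is dead" event by itself. The alert fires because the number of results for the time window drops to zero. Below is a minimal Python sketch of that condition, not how Splunk implements it. The names `Event`, `count_matches` and `should_alert` are hypothetical, the window is assumed to be half-open (earliest inclusive, latest exclusive), and the substring check is a simplification of Splunk's term matching.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for one indexed top/ps sample."""
    time: float  # epoch seconds when the sample was taken
    raw: str     # raw text of the sample


def count_matches(events, term, now, earliest_offset=120, latest_offset=60):
    """Count events containing `term` in the window [now - 120s, now - 60s).

    Mirrors earliest=-2min latest=-1min. The half-open window and the plain
    substring check are simplifying assumptions, not Splunk's exact semantics.
    """
    earliest = now - earliest_offset
    latest = now - latest_offset
    return sum(1 for e in events if earliest <= e.time < latest and term in e.raw)


def should_alert(events, term, now):
    """Alert condition from the saved search: fewer than 1 result."""
    return count_matches(events, term, now) < 1


if __name__ == "__main__":
    now = 1_000_000.0
    healthy = [Event(now - 90, "1234 root 20 0 httpd")]
    dead = [Event(now - 90, "5678 root 20 0 sshd")]
    print(should_alert(healthy, "httpd", now))  # False: httpd seen in the window
    print(should_alert(dead, "httpd", now))     # True: no httpd sample, send alert
```

The one-minute offset on latest likely exists to give the newest sample time to be indexed; without it, a sample that is still in flight could look like a missing process.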

southeringtonp
Motivator

The default Unix app already contains a scripted input that runs ps. That should get you most of the way, but there are several different ways you might construct the search.

Here's one approach, which will generate a new field is_running:

index=os sourcetype=ps
| head 1
| eval is_running=if(match(_raw, "\shttpd\b"), 1, 0)

head 1 retrieves only the latest polling cycle, giving you the "current" status. If you want to chart the status over time, leave that part out. Using 1 or 0 makes charting easier, but you can replace them with text values such as "Running" and "Not Running".
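The eval above can be mirrored in plain Python to show what the regular expression accepts and how head 1 changes the result. This is a sketch, assuming each ps snapshot is a single text blob (one event per polling cycle) and that the command name appears as its own whitespace-separated column; the helpers `is_running`, `current_status` and `status_over_time` are hypothetical. For plain ASCII ps output, Python's `re` treats `\s` and `\b` the same way as the PCRE syntax Splunk uses for this pattern.

```python
import re

# Same pattern as the eval: whitespace, then "httpd", then a word boundary.
HTTPD = re.compile(r"\shttpd\b")


def is_running(raw, pattern=HTTPD):
    """1 if the snapshot text matches the pattern, else 0 (like if(match(...), 1, 0))."""
    return 1 if pattern.search(raw) else 0


def current_status(snapshots):
    """Like `| head 1`: evaluate only the newest (time, raw) snapshot."""
    _, newest_raw = max(snapshots, key=lambda s: s[0])
    return is_running(newest_raw)


def status_over_time(snapshots):
    """Without `head 1`: one (time, is_running) point per polling cycle, oldest first."""
    return [(t, is_running(raw)) for t, raw in sorted(snapshots, key=lambda s: s[0])]


if __name__ == "__main__":
    snaps = [
        (100, "apache 1234 0.1 httpd -DFOREGROUND"),
        (160, "root 5678 0.0 sshd -D"),
    ]
    print(current_status(snaps))    # 0: httpd absent from the newest snapshot
    print(status_over_time(snaps))  # [(100, 1), (160, 0)]
```

Because the pattern requires whitespace right before httpd, a full path such as /usr/sbin/httpd does not match (the preceding character is /). If your ps output shows paths, a pattern like `[\s/]httpd\b` might cover both forms.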
