Need help with a Splunk search to detect when a process stopped, using its PID

sekhar463
Path Finder

Hi All,

I am using the search below to monitor the status and resource usage of a process based on its PID.

We tested by stopping the service, and the PID changed after the restart.

How can we determine when the process stopped? The search below does not show the old PID in the table; it only shows the latest one.

How can I modify it?

index=Test1 host="testserver" (source=ps COMMAND=*cybAgent*)
| stats latest(cpu_load_percent) as "CPU %", latest(PercentMemory) as "MEM %", latest(RSZ_KB) as "Resident Memory (KB)", latest(VSZ_KB) as "Virtual Memory (KB)",latest(PID) as "PID" ,latest(host) as "host" by COMMAND
| eval Process_Status = case(isnotnull('CPU %') AND isnotnull('MEM %'), "Running", isnull('CPU %') AND isnull('MEM %'), "Not Running", 1=1, "Unknown")
| table host,"CPU %", "MEM %", "Resident Memory (KB)", "Virtual Memory (KB)", Process_Status,COMMAND,PID
| eval Process_Status = coalesce(Process_Status, "Unknown")
| rename "CPU %" as "CPU %", "MEM %" as "MEM %"
| fillnull value="N/A"


sekhar463
Path Finder

Hi, I have process data in source=ps.

I want to get the status when a process is down.

I have observed that this can be done using the PID.

For example, the command ./cybAgent.bin runs with some PID. After I stop it and start it again, the PID changes and the new PID shows up in the Splunk data.

But how can we determine when it stopped?

Below are the events before and after the restart:

10/20/23
12:07:31.000 PM
root    6043    0.7    1.2    6769248    201544    ?    Ssl    Oct06    158:59    ./cybAgent.bin    -a
host = testhost
10/20/23
12:07:02.000 PM
root    6043    0.7    1.2    6769248    201544    ?    Ssl    Oct06    158:59    ./cybAgent.bin    -a
host = testhost

Events after the restart:

10/20/23
3:50:30.000 PM
root 20414 1.0 1.1 6766164 189864 ? Ssl 01:41 2:12 ./cybAgent.bin -a
host = testhost
10/20/23
3:50:03.000 PM
root 20414 0.9 1.1 6766164 189864 ? Ssl 01:41 2:11 ./cybAgent.bin -a
host = testhost


PickleRick
SplunkTrust

OK. If you want to find the moment at which the PID changed, you have to carry the PID over to the next event (otherwise Splunk has no notion of any relationship between separate events), using the "autoregress" command or, more universally, streamstats:

| streamstats current=f last(PID) as lastPID by COMMAND

This way you can see when lastPID for a given command is different than PID (mind you, Splunk by default sorts in reverse chronological order so this way you'll find the latest event before the restart; you can tweak this solution with sorting to find the first one after the restart).
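
A fuller version of this idea might look like the sketch below. It reuses the index, host, and field names from the original question. The ascending sort, the output field names (lastSeen, stoppedAfter, restartedBy), and the table layout are assumptions made for illustration and are not confirmed anywhere in this thread. Sorting oldest-first makes streamstats carry the previous PID forward in time, so each row that survives the where clause is the first event seen after a restart.

```
index=Test1 host="testserver" source=ps COMMAND="*cybAgent*"
| sort 0 _time
| streamstats current=f last(PID) as lastPID last(_time) as lastSeen by host COMMAND
| where isnotnull(lastPID) AND PID!=lastPID
| eval stoppedAfter=strftime(lastSeen, "%F %T"), restartedBy=strftime(_time, "%F %T")
| table host COMMAND lastPID PID stoppedAfter restartedBy
```

With this approach the stop can only be placed somewhere between stoppedAfter (the last event with the old PID) and restartedBy (the first event with the new PID). The ps data alone cannot narrow it down further than the polling interval.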

As a side note, don't use wildcards at the beginning of a search term unless you absolutely must.


sekhar463
Path Finder

Hi, can you help with the full query to get the status when a process is down?

index=test_index host="test" (source=ps COMMAND=*cybAgent* OR COMMAND=*event_demon* OR COMMAND=*as_server*)
| streamstats current=f last(PID) as lastPID by COMMAND

yuanliu
SplunkTrust

The mock data is helpful. (Note that the two entries differ only in timestamp.)  But always verbalize your thought process: explain how you derive changed/stopped from this data.  Asking volunteers to reverse engineer (aka mind-read) complex code discourages people from offering help.  Because there are always more wrong speculations than correct ones, mind reading is usually a waste of time.

If I must try, I see that you are trying to determine the "not running" state from isnull('CPU %') AND isnull('MEM %').  I do not think this is possible, because if a process is not running, the command will not appear in any event.  Your verbal descriptions give me the vague sense that you don't really expect the ps source to give you an explicit event about something not running.  Instead, you expect to detect a period of "stoppage" between a previously running process and a later running process (with a different PID).  Is this correct?  In that case, using the latest function on everything will not achieve that.

Meanwhile, if all you want to see is whether a specific command (such as cybAgent.bin) is running in the latest period during which any ps events are available, you CAN use other events as a reference point.  But you will have to give up the filter COMMAND=*cybAgent* so other events can come through.  For example, if you know a specific command (I call it a "heartbeat") that always runs, you can use a filter like COMMAND IN (*cybAgent*, <heartbeat>), then use the heartbeat event to infer that a process is "not running."  Is this your use case? (Theoretically you can pour all process events through and use all of them as heartbeats.  Is that viable?)
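
As a rough sketch of the "use every process event as a heartbeat" variant, the search below drops the COMMAND filter, keeps only the most recent ps snapshot per host, and checks whether cybAgent appears in it. The 60-second snapshot window, the field names latestSnapshot and matches, and the assumption that each process arrives as its own event are illustrative guesses, not details confirmed in this thread.

```
index=Test1 host="testserver" source=ps
| eventstats max(_time) as latestSnapshot by host
| where _time >= latestSnapshot - 60
| stats count(eval(like(COMMAND, "%cybAgent%"))) as matches max(latestSnapshot) as latestSnapshot by host
| eval Process_Status=if(matches > 0, "Running", "Not Running")
| eval latestSnapshot=strftime(latestSnapshot, "%F %T")
```

Because other processes keep the snapshot populated, a host still returns a row when cybAgent is absent, which is what allows a "Not Running" result. The cost is that the search has to read all ps events for the host rather than only the matching ones.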

Alternatively, if you don't have or want heartbeat events, but you know for certain that process events always come in at predetermined time intervals (e.g., every minute), you can use the interval as a reference to infer whether a command is running.  Is this the case?
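
A minimal sketch of the interval approach, assuming ps events normally arrive well within two minutes (the sample events in this thread are roughly 30 seconds apart). The 120-second threshold and the lastSeen field name are assumptions chosen for illustration.

```
index=Test1 host="testserver" source=ps COMMAND="*cybAgent*"
| stats latest(_time) as lastSeen latest(PID) as PID by host COMMAND
| eval Process_Status=if(now() - lastSeen > 120, "Not Running", "Running")
| eval lastSeen=strftime(lastSeen, "%F %T")
```

Note that the search time range must be wider than the threshold. If the process has produced no events anywhere in the selected range, no row is returned at all, so the range should reach back to a time when the process was known to be running.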

In addition, you did not describe your desired output.  The sample code suggests that in addition to status (including indication of stoppage), you also want some metric on CPU and memory.  If your use case is the former, i.e., detect stoppage by detecting PID changes, you will need a stats function to calculate that metric.  Is that avg?


yuanliu
SplunkTrust

The only way volunteers can help you concretely is for you to post sample or mock data (anonymize as needed) in text, illustrate desired results (in text), then explain the logic between illustrated data and results.  Forget Splunk.  What would you be looking for in the data you illustrate to determine status by PID?  What does "status of process based on PID" even mean? Do you mean listing status of process grouped by PID? (Splunk and many data query languages call this group-by.)
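
If group-by on PID is indeed what is meant, one possible reading is sketched below: one row per PID, with the first and last time it was seen. The field names come from the original question; the choice of avg for the metrics and the output names (firstSeen, lastSeen, "Avg CPU %", "Avg MEM %") are assumptions.

```
index=Test1 host="testserver" source=ps COMMAND="*cybAgent*"
| stats earliest(_time) as firstSeen latest(_time) as lastSeen avg(cpu_load_percent) as "Avg CPU %" avg(PercentMemory) as "Avg MEM %" by host COMMAND PID
| sort 0 host COMMAND firstSeen
| eval firstSeen=strftime(firstSeen, "%F %T"), lastSeen=strftime(lastSeen, "%F %T")
```

Under this reading, the lastSeen value of an older PID approximates when that instance stopped, and the firstSeen value of the next PID approximates when the replacement started.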
