Need help on splunk search to get process status when stopped by using PID

sekhar463 · ‎10-20-2023

Hi All,

I am using the search below to monitor the status of a process based on its PID and resource usage.

We tested by stopping the service, and the PID changed.

How can we determine when it stopped? The search below does not show the old PID in the table; it only shows the latest one.

How can I modify it?

index=Test1 host="testserver" (source=ps COMMAND=*cybAgent*)
| stats latest(cpu_load_percent) as "CPU %", latest(PercentMemory) as "MEM %", latest(RSZ_KB) as "Resident Memory (KB)", latest(VSZ_KB) as "Virtual Memory (KB)",latest(PID) as "PID" ,latest(host) as "host" by COMMAND
| eval Process_Status = case(isnotnull('CPU %') AND isnotnull('MEM %'), "Running", isnull('CPU %') AND isnull('MEM %'), "Not Running", 1=1, "Unknown")
| table host,"CPU %", "MEM %", "Resident Memory (KB)", "Virtual Memory (KB)", Process_Status,COMMAND,PID
| eval Process_Status = coalesce(Process_Status, "Unknown")
| rename "CPU %" as "CPU %", "MEM %" as "MEM %"
| fillnull value="N/A"

sekhar463 · ‎10-20-2023

Hi, I have some process data in source=ps and want to get the status when a process is down.

I have observed that this can be done using the PID.

For example, the command ./cybAgent.bin runs with some PID. After I stop and start it, the PID changes and the new PID shows up in the Splunk data.

But how can we determine when it stopped?

Below are the events from before and after the restart:

10/20/23 12:07:31.000 PM
root 6043 0.7 1.2 6769248 201544 ? Ssl Oct06 158:59 ./cybAgent.bin -a
host = testhost

10/20/23 12:07:02.000 PM
root 6043 0.7 1.2 6769248 201544 ? Ssl Oct06 158:59 ./cybAgent.bin -a
host = testhost

10/20/23 3:50:30.000 PM
root 20414 1.0 1.1 6766164 189864 ? Ssl 01:41 2:12 ./cybAgent.bin -a
host = testhost

10/20/23 3:50:03.000 PM
root 20414 0.9 1.1 6766164 189864 ? Ssl 01:41 2:11 ./cybAgent.bin -a
host = testhost

PickleRick · ‎10-20-2023

OK. If you want to find the moment at which the PID changed, you have to carry the PID over to the next event (otherwise Splunk has no notion of any relationship between separate events), using the "autoregress" command or, more universally, streamstats:

| streamstats current=f last(PID) as lastPID by COMMAND

This way you can see when lastPID for a given command differs from PID. Mind you, Splunk returns events in reverse chronological order by default, so this finds the latest event before the restart; you can adjust the sorting to find the first event after the restart instead.
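To make the sorting tweak concrete, here is a minimal sketch that sorts ascending so the flagged row is the first event after the restart. It reuses the index, host, and filter from the original question; the field names prev_time, stopped_after, and restarted_by are made up for illustration. It also assumes only one instance of each COMMAND runs per host at a time, otherwise every poll would look like a PID change.

```spl
index=Test1 host="testserver" source=ps COMMAND=*cybAgent*
| sort 0 _time
| streamstats current=f last(PID) as lastPID last(_time) as prev_time by host COMMAND
| where isnotnull(lastPID) AND PID!=lastPID
| eval stopped_after=strftime(prev_time, "%Y-%m-%d %H:%M:%S")
| eval restarted_by=strftime(_time, "%Y-%m-%d %H:%M:%S")
| table host COMMAND lastPID PID stopped_after restarted_by
```

On the four sample events posted above, this should flag the 3:50:03 PM event, with lastPID 6043 and PID 20414. The stop itself happened somewhere between stopped_after and restarted_by; the data cannot narrow it further than the polling interval.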

As a side note, don't use wildcards at the beginning of a search term unless you absolutely must.
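For instance, if the COMMAND field holds exactly the value seen in the sample events (an assumption; check the extracted field on your own data first), the leading wildcard can be dropped entirely:

```spl
index=Test1 host="testserver" source=ps COMMAND="./cybAgent.bin"
```

If the value varies, a pattern with the wildcard only at the end is generally the cheaper choice.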

sekhar463 · ‎10-23-2023

Hi, can you help with a full query to get the status when the process is down?

index=test_index host="test" (source=ps COMMAND=*cybAgent* OR COMMAND=*event_demon* OR COMMAND=*as_server*)
| streamstats current=f last(PID) as lastPID by COMMAND

yuanliu · ‎10-20-2023

The mock data is helpful. (Note that the two entries differ only in their timestamps.) But always verbalize your thought process: how do you derive changed/stopped from this data? Asking volunteers to reverse engineer complex code (i.e., to read your mind) discourages people from offering help. Because there are always more wrong guesses than correct ones, mind reading is usually a waste of time.

If I must try: I see that you are trying to determine the "not running" state from isnull('CPU %') AND isnull('MEM %'). I do not think this is possible, because if a process is not running, the command will not appear in any event. Your description gives me the vague sense that you don't really expect ps to give you an explicit event about something not running. Instead, you expect to detect a period of "stoppage" between a previously running process and a later running process (with a different PID). Is this correct? If so, using the latest function on everything will not achieve that.
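If that reading is right, one rough sketch is to group by PID instead of collapsing everything with latest, so each PID keeps its own first and last sighting. The names first_seen, last_seen, and samples are made up for illustration; the index, host, and filter are copied from your search.

```spl
index=Test1 host="testserver" source=ps COMMAND=*cybAgent*
| stats min(_time) as first_seen max(_time) as last_seen count as samples by host COMMAND PID
| sort 0 first_seen
| eval first_seen=strftime(first_seen, "%Y-%m-%d %H:%M:%S")
| eval last_seen=strftime(last_seen, "%Y-%m-%d %H:%M:%S")
```

The last_seen of the old PID approximates when the process stopped, and the first_seen of the next PID approximates when it came back, each only to within one polling interval.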

Meanwhile, if all you want to see is whether a specific command (such as cybAgent.bin) is running in the latest period for which any ps events are available, you CAN use other events as a reference point. But you will have to give up the filter COMMAND=*cybAgent* so that other events can come through. For example, if you know a specific command (I call it a "heartbeat") that always runs, you can use a filter like COMMAND IN (*cybAgent*, <heartbeat>), then use the heartbeat event to infer that a process is "not running." Is this your use case? (Theoretically, you can let all process events through and use all of them as heartbeats. Is that viable?)
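As a sketch of the "use everything as heartbeat" variant: the search below lets all ps events through and compares the last time the target command was seen with the last time anything was seen. The field names last_ps_time and last_target_time and the 60-second tolerance are assumptions for illustration; set the tolerance to something a bit larger than your polling interval. Pulling every ps event can be expensive, so narrow the time range.

```spl
index=Test1 host="testserver" source=ps
| stats max(_time) as last_ps_time max(eval(if(like(COMMAND, "%cybAgent%"), _time, null()))) as last_target_time by host
| eval Process_Status=case(isnull(last_target_time), "Not Running", last_target_time < last_ps_time - 60, "Not Running", true(), "Running")
| eval last_ps_time=strftime(last_ps_time, "%Y-%m-%d %H:%M:%S")
| eval last_target_time=strftime(last_target_time, "%Y-%m-%d %H:%M:%S")
```

With a dedicated heartbeat command, the base search could be narrowed to just the target and the heartbeat instead of all processes.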

Alternatively, if you don't have or want heartbeat events, but you know for certain that process events always come in at predetermined time intervals (e.g., every minute), you can use the interval as a reference to infer whether a command is running. Is this the case?
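A sketch of the interval approach, assuming events arrive roughly every minute and treating two missed polls (120 seconds, an arbitrary threshold) as "not running". The names last_seen and age_sec are made up for illustration.

```spl
index=Test1 host="testserver" source=ps COMMAND=*cybAgent*
| stats latest(_time) as last_seen latest(PID) as PID by host COMMAND
| eval age_sec=now() - last_seen
| eval Process_Status=if(age_sec > 120, "Not Running", "Running")
| eval last_seen=strftime(last_seen, "%Y-%m-%d %H:%M:%S")
| table host COMMAND PID last_seen age_sec Process_Status
```

The search time range has to reach back far enough to include the last event of the stopped process; if there are no matching events in the range at all, no row is returned, and that case would need separate handling.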

In addition, you did not describe your desired output. The sample code suggests that in addition to status (including indication of stoppage), you also want some metric on CPU and memory. If your use case is the former, i.e., detect stoppage by detecting PID changes, you will need a stats function to calculate that metric. Is that avg?

yuanliu · ‎10-20-2023

The only way volunteers can help you concretely is for you to post sample or mock data (anonymize as needed) in text, illustrate desired results (in text), then explain the logic between illustrated data and results. Forget Splunk. What would you be looking for in the data you illustrate to determine status by PID? What does "status of process based on PID" even mean? Do you mean listing status of process grouped by PID? (Splunk and many data query languages call this group-by.)
