Hi All,
I am using the search below to monitor the status of a process based on its PID and resource usage.
We tested by stopping the service, and the PID changed.
How can we determine when it stopped? The search below does not show the old PID in the table; it only shows the latest one.
How can I modify it?
index=Test1 host="testserver" (source=ps COMMAND=*cybAgent*)
| stats latest(cpu_load_percent) as "CPU %", latest(PercentMemory) as "MEM %", latest(RSZ_KB) as "Resident Memory (KB)", latest(VSZ_KB) as "Virtual Memory (KB)",latest(PID) as "PID" ,latest(host) as "host" by COMMAND
| eval Process_Status = case(isnotnull('CPU %') AND isnotnull('MEM %'), "Running", isnull('CPU %') AND isnull('MEM %'), "Not Running", 1=1, "Unknown")
| table host,"CPU %", "MEM %", "Resident Memory (KB)", "Virtual Memory (KB)", Process_Status,COMMAND,PID
| eval Process_Status = coalesce(Process_Status, "Unknown")
| rename "CPU %" as "CPU %", "MEM %" as "MEM %"
| fillnull value="N/A"
Hi, I have some process data in source=ps
and want to get the status if a process is down.
I have observed that this can be done using the PID.
For example, the command ./cybAgent.bin runs with some PID; after I stop and start it, the PID changes and the new PID shows up in the Splunk data.
But how can we determine when it stopped?
Below are the events before and after the restart:
10/20/23
3:50:30.000 PM
root 20414 1.0 1.1 6766164 189864 ? Ssl 01:41 2:12 ./cybAgent.bin -a
host = testhost
10/20/23
3:50:03.000 PM
root 20414 0.9 1.1 6766164 189864 ? Ssl 01:41 2:11 ./cybAgent.bin -a
host = testhost
Ok. If you want to find the moment at which the PID changed, you have to carry it over to the next event (otherwise Splunk has no notion of any relationship between separate events) using the "autoregress" command or, in a more universal manner, streamstats:
| streamstats current=f last(PID) as lastPID by COMMAND
This way you can see when lastPID for a given command is different than PID (mind you, Splunk by default sorts in reverse chronological order so this way you'll find the latest event before the restart; you can tweak this solution with sorting to find the first one after the restart).
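A minimal sketch of a complete search built on that idea, assuming the index, host, and field names from the original question. It sorts events oldest first, so lastPID is the PID of the previous (earlier) event, and it keeps only the first event seen after each restart. The output field names (lastSeenOldPID, firstSeenNewPID) are made up for illustration. The process stopped somewhere between those two timestamps; the data cannot narrow it down further than the polling interval of the ps input.

```spl
index=Test1 host="testserver" source=ps COMMAND=*cybAgent*
| sort 0 _time
| streamstats current=f last(PID) as lastPID last(_time) as lastSeen by COMMAND
| where isnotnull(lastPID) AND PID!=lastPID
| eval lastSeenOldPID=strftime(lastSeen, "%F %T"), firstSeenNewPID=strftime(_time, "%F %T")
| table host COMMAND lastPID PID lastSeenOldPID firstSeenNewPID
```

If the same command runs on several hosts, adding host to the by clause of streamstats would keep their PIDs from being compared with each other.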
As a side note, don't use wildcards at the beginning of a search term unless you absolutely must; a leading wildcard is costly to search.
Hi, can you help with the full query to get the status if a process is down?
index=test_index host="test" (source=ps COMMAND=*cybAgent* OR COMMAND=*event_demon* OR COMMAND=*as_server*)
| streamstats current=f last(PID) as lastPID by COMMAND
The mock data is helpful. (Note that the two entries have no difference except the timestamp.) But always verbalize your thought process of how you derive changed/stopped from this data. Asking volunteers to reverse engineer (i.e., read your mind from) complex code discourages people from offering help. Because there are always more wrong speculations than correct ones, mind reading is usually a waste of time.
If I must try, I see that you are trying to determine a "not running" state from isnull('CPU %') AND isnull('MEM %'). I do not think this is possible, because if a process is not running, the command will not be in any event. Your verbal descriptions give me the vague sense that you don't really expect the ps data to give you an explicit event about something not running. Instead, you are expecting to detect a period of "stoppage" between a previously running process and a later running process (with a different PID). Is this correct? In that case, using the latest function on everything will not achieve that.
Meanwhile, if all you want to see is whether a specific command (such as cybAgent.bin) is running in the latest period for which any process events are available, you CAN use other events as a reference point. But you will have to give up the filter COMMAND=*cybAgent* so other events can come through. For example, if you know a specific command (I call it a "heartbeat") that always runs, you can have a filter like COMMAND IN (*cybAgent*, <heartbeat>), then use the heartbeat event to infer that a process is "not running." Is this your use case? (Theoretically you can pour all process events through and use all of them as a heartbeat. Is that viable?)
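A rough sketch of the "use everything as a heartbeat" variant, assuming the index, host, and COMMAND field from the original search. It compares the newest ps event of any kind on the host with the newest event that mentions cybAgent. The 60-second tolerance and the field names lastPoll and lastAgent are hypothetical; the tolerance should be at least one polling interval of the ps input.

```spl
index=Test1 host="testserver" source=ps
| stats max(_time) as lastPoll max(eval(if(like(COMMAND, "%cybAgent%"), _time, null()))) as lastAgent by host
| eval Process_Status = case(isnull(lastAgent), "Not Running", lastPoll - lastAgent > 60, "Not Running", true(), "Running")
| eval lastPoll=strftime(lastPoll, "%F %T"), lastAgent=strftime(lastAgent, "%F %T")
| table host Process_Status lastAgent lastPoll
```

Because the search no longer filters on COMMAND, it reads every ps event in the time range, so a short time window is advisable.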
Alternatively, if you don't have/want a heartbeat event(s), but you know for certain that process events always come in at predetermined time intervals (e.g., every minute), you can use the interval as reference to infer whether a command is running. Is this the case?
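A sketch of the interval-based idea, again assuming the index, host, and field names from the original search. It treats the process as down when its most recent event is older than a threshold. The 120-second threshold is a placeholder (the two sample events are about 30 seconds apart, so this allows for a few missed polls), and secondsSinceSeen is an illustrative field name.

```spl
index=Test1 host="testserver" source=ps COMMAND=*cybAgent*
| stats latest(_time) as lastSeen latest(PID) as PID by host COMMAND
| eval secondsSinceSeen = now() - lastSeen
| eval Process_Status = if(secondsSinceSeen > 120, "Not Running", "Running")
| eval lastSeen = strftime(lastSeen, "%F %T")
| table host COMMAND PID lastSeen secondsSinceSeen Process_Status
```

One caveat: if the process produced no events at all within the search time range, stats returns no row for it, so the time range needs to reach back far enough to include the last time the process was seen.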
In addition, you did not describe your desired output. The sample code suggests that in addition to status (including indication of stoppage), you also want some metric on CPU and memory. If your use case is the former, i.e., detect stoppage by detecting PID changes, you will need a stats function to calculate that metric. Is that avg?
The only way volunteers can help you concretely is for you to post sample or mock data (anonymize as needed) in text, illustrate desired results (in text), then explain the logic between illustrated data and results. Forget Splunk. What would you be looking for in the data you illustrate to determine status by PID? What does "status of process based on PID" even mean? Do you mean listing status of process grouped by PID? (Splunk and many data query languages call this group-by.)