I've been searching Splunk Answers all morning trying to figure this one out. It seems simple enough, but I can't lick it and I'm just spinning my wheels.
I'm trying to get a percentage uptime based on the TA_nix ps sourcetype. The rub is that it's for a two-node cluster, so when one host is down and the other is still up, the cluster as a whole is still up, and that cluster-level number is what they want.
Also, the search I am running sometimes returns results greater than 100%, even when I break it down by Node1 and Node2. I'm counting on ps to poll one result per minute for this process.
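To make the goal concrete, here is a rough, untested sketch of the cluster-level logic: treat a minute as "up" if at least one node reported the process during it, then divide the up minutes by the total minutes in the search window. It reuses the base search terms from this post. The field names nodes_up, minutes_up and minutes_total are made up for illustration, and it assumes the search runs over a bounded time range (not "All time"), so that addinfo returns a numeric info_max_time.

```
index="os" sourcetype="ps" USER=processuser COMMAND="commandiwanttocheck" (host=homehostsh OR host=someotherhosts)
| bin _time span=1m
| stats dc(host) AS nodes_up BY _time
| stats count AS minutes_up
| addinfo
| eval minutes_total=floor((info_max_time - info_min_time) / 60)
| eval Percent=round(min(minutes_up / minutes_total * 100, 100), 2)
```

Because each one-minute bucket is counted once no matter how many matching events fall into it, extra ps rows in the same minute should not inflate the figure. The min(..., 100) cap is only a guard for a window that is not aligned to minute boundaries, where the bucket count can exceed the floored minute count by one. Splitting the first stats BY _time host and the second BY host would give a per-node figure instead.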
Here's my search, and a sample set of results so you can see what I'm working with.
index="os" sourcetype="ps" USER=processuser COMMAND="commandiwanttocheck" (host=homehostsh OR host=someotherhosts)
| lookup serverinfo_lookup hostname AS host OUTPUTNEW ServerType ClusterNode
| stats count(COMMAND) as TotalResponses max(_time) as last_time min(_time) as first_time by ClusterNode ServerType
| eval minutes=((last_time-first_time)/60)
| eval Percent=round(((TotalResponses)/minutes)*100,2)
The result of the search is this. I've still got my "working" fields in there.