I am monitoring the percent usage of my CPU and RAM by entering the following in the search:
(index=* host=* sourcetype="Perfmon:Memory" collection=Memory object=Memory counter="% Committed Bytes In Use") OR (index=* host=* sourcetype="Perfmon:CPU" counter="% Processor Time") | eval "Indexed Time"=strftime(_time, "%Y/%d/%m %H:%M") | eval "Computer Host"=host | eval "Event Type"=source | eval Object=object | eval "Percent Usage"=round(Value, 2)."%" | table "Indexed Time", "Computer Host", "Event Type", Object, "Percent Usage"
It all comes out well. However, I only want to return the events in which the 'Value' field (for both CPU and RAM) is greater than 50 (that is, above 50%). I have tried adding the following, along with some other variations of it:
| where Value > 50
The search throws no errors, but it returns zero results, even though I have checked that there are values greater than 50.
My end goal is to make this an alert. Normally I would have these in two separate searches/alerts, but my boss wants it in one (hence the 'OR'). I have been looking through Splunk Docs and Splunk Answers, but I'm only getting information on using the "where" command. Any further help -- even if it's a helpful link -- would be greatly appreciated. Thank you.
@drizzo, have you tried the following? (You need to provide a span based on how frequently your forwarder sends data, for example span=5m.)
(index="xyz" host="abc" sourcetype="Perfmon:Memory" collection=Memory object=Memory counter="% Committed Bytes In Use") OR (index="xyz" host="abc" sourcetype="Perfmon:CPU" counter="% Processor Time")
| where Value>50
| timechart span=<YourForwarderSpan> list(counter) as Counter list(Value) as Value values(host) as "Computer Host" values(source) as "Event Type" values(object) as Object
| search Object="Memory" AND Memory="CPU"
| eval Value=round(Value,2)."%"
| rename Value as "Percent Usage"
| table _time "Computer Host" "Event Type" Object Counter "Percent Usage"
Also give the following a try:
(index="xyz" host="abc" sourcetype="Perfmon:Memory" collection=Memory object=Memory counter="% Committed Bytes In Use") OR (index="xyz" host="abc" sourcetype="Perfmon:CPU" counter="% Processor Time") (Value="5*" AND Value!="5.*") OR (Value="6*" AND Value!="6.*") OR (Value="7*" AND Value!="7.*") OR (Value="8*" AND Value!="8.*") OR (Value="9*" AND Value!="9.*") OR (Value="100")
| timechart span=<YourForwarderSpan> list(counter) as Counter list(Value) as Value values(host) as "Computer Host" values(source) as "Event Type" values(object) as Object
| search Object="Memory" AND Memory="CPU"
| eval Value=round(Value,2)."%"
| rename Value as "Percent Usage"
| table _time "Computer Host" "Event Type" Object Counter "Percent Usage"
In case you are planning to set up an alert, you can try the following query, which fetches only the latest CPU and Memory performance counters from the host:
index="xyz" host="abc" sourcetype="Perfmon:Memory" collection=Memory object=Memory counter="% Committed Bytes In Use"
| head 1
| append [search index="xyz" host="abc" sourcetype="Perfmon:CPU" counter="% Processor Time" | head 1 ]
| timechart list(counter) as Counter list(Value) as Value values(host) as "Computer Host" values(source) as "Event Type" values(object) as Object
| search Object="Memory" AND Memory="CPU"
| eval Value=round(Value,2)."%"
| rename Value as "Percent Usage"
| table _time "Computer Host" "Event Type" Object Counter "Percent Usage"
First, don't spend time "prettying up" the variable names before you've gotten the logic working. Until everything is coming safely out the end, you're just risking adding to the confusion.
Testing suggestions -
Set host= to a host you know has some issues, then try each of these chunks of code, one at a time, and see if they work.
index=* host="yourproblemchildhost"
(sourcetype="Perfmon:Memory" collection=Memory object=Memory counter="% Committed Bytes In Use") OR
(sourcetype="Perfmon:CPU" counter="% Processor Time")
| where Value>=50
| rename COMMENT as "The above should get you any events where CPU is above 50 or Memory is above 50 for a host."
| rename COMMENT as "Use the below to limit your results for testing - remove it when the search is working."
| head 20
| rename COMMENT as "Now we rename them and check for any 5m _time period that a host has both."
| eval CPU=if(sourcetype="Perfmon:CPU",Value,null())
| eval Memory=if(sourcetype="Perfmon:Memory",Value,null())
| bin _time span=5m
| stats max(CPU) as CPU max(Memory) as Memory by _time host
| where CPU>=50 AND Memory>=50
| rename COMMENT as "Now we can pretty them up."
| eval "CPU Percent Usage"=round(CPU, 2)."%"
| eval "Memory Percent Usage"=round(Memory, 2)."%"
| eval "Indexed Time"=strftime(_time, "%Y/%d/%m %H:%M")
| eval "Computer Host"= host
| table "Indexed Time", "Computer Host", "CPU Percent Usage", "Memory Percent Usage"
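If the first chunk's `where Value>=50` still returns nothing, two things may be worth ruling out. The sketch below is only a diagnostic and rests on an assumption that nothing in this thread confirms: that Value is being extracted as a string rather than a number. NumericValue is a hypothetical field name introduced just for this check. The filter is also kept ahead of `table`, since `table` drops every field it does not list, so a `where Value ...` placed after a `table` that omits Value has nothing left to compare.

```
index=* host="yourproblemchildhost" sourcetype="Perfmon:CPU" counter="% Processor Time"
| rename COMMENT as "Assumption: Value may be a string. tonumber() returns null when it cannot parse it."
| eval NumericValue=tonumber(Value)
| where NumericValue>=50
| rename COMMENT as "Show the raw and converted values side by side for comparison."
| table _time host counter Value NumericValue
```

If this returns rows where the plain `where Value>=50` did not, the field type is the likely culprit. If it also returns nothing, try dropping the `where` line and looking at what Value actually contains.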
This actually ended up working! I'm amazed with your response -- very detailed. But it is not letting me confirm yours as an answer.
@drizzo, glad it worked. I have converted my comment to answer. You can go ahead and accept to mark this as answered.
Thank you!