Hello all,
We have a Splunk alert that searches for high temperature events on Juniper routers, it's a very straight forward search:
index=main CHASSISD_FRU_HIGH_TEMP_CONDITION OR CHASSISD_OVER_TEMP_SHUTDOWN_TIME OR CHASSISD_OVER_TEMP_CONDITION OR CHASSISD_TEMP_HOT_NOTICE OR CHASSISD_FPC_OPTICS_HOT_NOTICE OR CHASSISD_HIGH_TEMP_CONDITION OR (CHASSISD "Temperature back to normal") NOT UI_CMDLINE_READ_LINE
I'd like this Splunk alert to ignore temperature alarm events on the host router4-utah when FPC 11 = FPC: MPC5E 3D 24XGE+6XLGE @ 11/*/* is running hot, the events always come in the following order within 25 seconds of each other:
The alarm trigger events:
Sep 27 05:26:00 re0.router4-utah chassisd[7726]: CHASSISD_BLOWERS_SPEED_FULL: Fans and impellers being set to full speed [system warm]
Sep 27 05:26:00 re0.router4-utah alarmd[7895]: Alarm set: Temp sensor color=YELLOW, class=CHASSIS, reason=Temperature Warm
Sep 27 05:26:00 re0.router4-utah craftd[7730]: Minor alarm set, Temperature Warm
Sep 27 05:26:00 re0.router4-utah chassisd[7726]: CHASSISD_HIGH_TEMP_CONDITION: Chassis temperature over 60 degrees C (but no fan/impeller failure detected)
Sep 27 05:26:02 re0.router4-utah chassisd[7726]: CHASSISD_SNMP_TRAP6: SNMP trap generated: Over Temperature! (jnxContentsContainerIndex 7, jnxContentsL1Index 12, jnxContentsL2Index 0, jnxContentsL3Index 0, jnxContentsDescr FPC: MPC5E 3D 24XGE+6XLGE @ 11/*/*, jnxOperatingTemp 91)
The alarm clear events:
Sep 27 05:26:21 re0.router4-utah alarmd[7895]: Alarm cleared: Temp sensor color=YELLOW, class=CHASSIS, reason=Temperature Warm
Sep 27 05:26:21 re0.router4-utah craftd[7730]: Minor alarm cleared, Temperature Warm
The goal is to keep the normal temperature alert running as it always has, but somehow ignore the host router4-utah when it triggers and clears temperature alarms on FPC11. I think the easiest way to say this is any temp alarm that triggers and clears on router4-utah that is surrounded within 25 seconds of this line:
Sep 27 05:26:02 re0.router4-utah chassisd[7726]: CHASSISD_SNMP_TRAP6: SNMP trap generated: Over Temperature! (jnxContentsContainerIndex 7, jnxContentsL1Index 12, jnxContentsL2Index 0, jnxContentsL3Index 0, jnxContentsDescr FPC: MPC5E 3D 24XGE+6XLGE @ 11/*/*, jnxOperatingTemp 91)
Any assistance one can provide is much appreciated! Thanks.
Looks like a good use case for transaction. (You must have search window > 25s in this case.)
index=main (host=re0.router4-utah "Alarm cleared: Temp sensor" color=YELLOW, class=CHASSIS, "reason=Temperature Warm") OR CHASSISD_FRU_HIGH_TEMP_CONDITION OR CHASSISD_OVER_TEMP_SHUTDOWN_TIME OR CHASSISD_OVER_TEMP_CONDITION OR CHASSISD_TEMP_HOT_NOTICE OR CHASSISD_FPC_OPTICS_HOT_NOTICE OR CHASSISD_HIGH_TEMP_CONDITION OR (CHASSISD "Temperature back to normal") NOT UI_CMDLINE_READ_LINE
| transaction host maxspan=25s startswith="CHASSISD_HIGH_TEMP_CONDITION" endswith="Alarm cleared: Temp sensor"
| where closed_txn == 0
Hope this helps.
Looks like a good use case for transaction. (You must have search window > 25s in this case.)
index=main (host=re0.router4-utah "Alarm cleared: Temp sensor" color=YELLOW, class=CHASSIS, "reason=Temperature Warm") OR CHASSISD_FRU_HIGH_TEMP_CONDITION OR CHASSISD_OVER_TEMP_SHUTDOWN_TIME OR CHASSISD_OVER_TEMP_CONDITION OR CHASSISD_TEMP_HOT_NOTICE OR CHASSISD_FPC_OPTICS_HOT_NOTICE OR CHASSISD_HIGH_TEMP_CONDITION OR (CHASSISD "Temperature back to normal") NOT UI_CMDLINE_READ_LINE
| transaction host maxspan=25s startswith="CHASSISD_HIGH_TEMP_CONDITION" endswith="Alarm cleared: Temp sensor"
| where closed_txn == 0
Hope this helps.
Yes, this is definitely useful, thank you for the help!