@tread_splunk @somesoni2 I realized that a sample data set would be helpful, so I am attaching one along with an explanation below.

We have 4 pods. Each pod receives 3-5 messages every minute. These messages are NOT evenly distributed; that is, they do not arrive at fixed offsets such as 0s, 15s, 30s, 45s, etc. We have noticed that one or two of these pods go into a zombie state, meaning that for, say, 3 minutes the pod does not write this event.

The overall objective is to find the query > create a dashboard panel > generate simple alerts when we detect this zombie state.

Now the explanation of the data set. Below is the top-level query:

index namespace message="incoming events" pod=*

To spot the bad actors by eye, I add a time span so that the anomaly stands out. The sample data I am providing comes from the query below (not the starter query):

index namespace message="incoming events" pod=* | timechart count by pod span=1m

Since the raw data set consists of JSON objects, I am not attaching it here, but I can of course provide the true raw data set if it helps the investigation.

From this data set, this is the bad pod:

Bad Pod: pod-a
2021-10-14T21:01:30.000+0000 event count 3
2021-10-14T21:02:30.000+0000 event count 0
............. ...........
2021-10-14T21:05:00.000+0000 event count 0
2021-10-14T21:05:30.000+0000 event count 4
Duration of being in bad state: 3 mins

I am trying to get a query that I can use in a dashboard (and an alert), so that I or any of my teammates can simply run it and detect:

1. Bad pod
2. Duration in bad state (time > 1m)

I appreciate your help, and thank you. As I said, I can always provide the true raw data (from the starter query) if it helps in this investigation.

CSV File
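To make the goal concrete, here is a minimal Python sketch of the detection logic I am after, applied to a CSV export of the timechart output. It is not SPL and not a finished solution. The function name `find_zombie_windows`, the column layout (`_time` plus one column per pod), and the sample rows are all assumptions made up for illustration; they are not taken from the attached file.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample shaped like a `timechart count by pod span=1m` export.
# Column names and values are invented for illustration only.
SAMPLE_CSV = """_time,pod-a,pod-b
2021-10-14T21:01:00.000+0000,3,4
2021-10-14T21:02:00.000+0000,0,5
2021-10-14T21:03:00.000+0000,0,3
2021-10-14T21:04:00.000+0000,0,0
2021-10-14T21:05:00.000+0000,4,3
"""

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def find_zombie_windows(csv_text, span_minutes=1, min_minutes=1):
    """Return (pod, first_zero_bucket, last_zero_bucket, minutes) for every
    run of consecutive zero-count buckets lasting longer than min_minutes."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    # Assumption: every column other than _time is a pod.
    pods = [name for name in rows[0] if name != "_time"]
    windows = []
    for pod in pods:
        run = []  # timestamps of the current run of zero-count buckets
        for row in rows + [None]:  # None is a sentinel that flushes the last run
            is_zero = row is not None and int(row[pod] or 0) == 0
            if is_zero:
                run.append(datetime.strptime(row["_time"], TIME_FORMAT))
                continue
            minutes = len(run) * span_minutes
            if minutes > min_minutes:
                windows.append((pod, run[0], run[-1], minutes))
            run = []
    return windows


if __name__ == "__main__":
    for pod, start, end, minutes in find_zombie_windows(SAMPLE_CSV):
        print(f"{pod}: no events for {minutes} min, from {start:%H:%M} to {end:%H:%M}")
```

On the made-up sample, only pod-a is reported (three consecutive empty buckets), while pod-b's single empty minute is ignored because it does not exceed the 1-minute threshold. What I am looking for is the equivalent of this in a Splunk query.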