I have a setup like the following, with a message producer and a message consumer.
I am using ActiveMQ for messaging.
Sometimes I notice that the consumer doesn't receive messages. I log each step like this:
Producer code: log.info("Status=Produced, TransactionId=123");
Consumer code: log.info("Status=Consumed, TransactionId=123");
I also have a Dead Letter Queue (DLQ) consumer, which logs something like:
DLQConsumer: log.info("Status=Discarded, TransactionId=123");
The whole Producer/Consumer flow is asynchronous.
I need Splunk to alert me when it sees a transaction that was not processed by the Consumer.
How do I write a Splunk search that alerts me on these?
In a nutshell, this is what I would like reported:
Every message produced should be consumed; if one is not, I need an alert that includes its TransactionId.
Also, I don't want Splunk to report a message that was just produced and simply hasn't been consumed yet.
Maybe I can set the time range from (current time - 15 minutes) to (current time - 1 minute) to avoid that case.
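To make the requirement concrete, here is a minimal offline sketch in Python (not a Splunk search) of the reconciliation being asked for. The function names `parse` and `unconsumed`, the `(datetime, log_line)` event shape, and the 1-minute grace / 15-minute lookback defaults are all hypothetical, chosen to match the window suggested above. One design choice is worth noting: only the *Produced* events are restricted to the window, while *Consumed* events are taken from the whole period up to now. If the entire search stopped at now - 1 minute, a message produced 2 minutes ago and consumed 30 seconds ago would be reported as missing.

```python
import re
from datetime import datetime, timedelta

# Hypothetical helper: pulls key=value pairs out of lines such as
# 'Status=Produced, TransactionId=123'.
FIELD_RE = re.compile(r"(\w+)=([^,\s]+)")


def parse(line):
    """Return the key=value fields of one log line as a dict."""
    return dict(FIELD_RE.findall(line))


def unconsumed(events, now, grace=timedelta(minutes=1),
               lookback=timedelta(minutes=15)):
    """Return sorted TransactionIds produced between now - lookback and
    now - grace that have no Status=Consumed event.

    events: iterable of (datetime, log_line) pairs covering the whole period
    up to `now`, so a Consumed event logged inside the grace period still counts.
    """
    produced = {}      # TransactionId -> time it was produced
    consumed = set()   # TransactionIds seen with Status=Consumed
    for ts, line in events:
        fields = parse(line)
        tid = fields.get("TransactionId")
        status = fields.get("Status")
        if status == "Produced":
            produced[tid] = ts
        elif status == "Consumed":
            consumed.add(tid)
    return sorted(
        tid for tid, ts in produced.items()
        if now - lookback <= ts <= now - grace and tid not in consumed
    )


now = datetime(2024, 1, 1, 12, 0, 0)
events = [
    (now - timedelta(minutes=20), "Status=Produced, TransactionId=120"),  # before lookback
    (now - timedelta(minutes=10), "Status=Produced, TransactionId=123"),
    (now - timedelta(minutes=9), "Status=Consumed, TransactionId=123"),
    (now - timedelta(minutes=8), "Status=Produced, TransactionId=124"),   # never consumed
    (now - timedelta(minutes=2), "Status=Produced, TransactionId=125"),
    (now - timedelta(seconds=30), "Status=Consumed, TransactionId=125"),  # consumed inside grace period
    (now - timedelta(seconds=20), "Status=Produced, TransactionId=126"),  # still within grace
]
print(unconsumed(events, now))  # ['124']
```

Transaction 125 is not reported even though it was consumed during the last minute, which is exactly the false positive a hard "now - 1 minute" cutoff on every event would produce.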
... | eval isnotable_event=if(condition,"t",null()) | transaction somefield | where isnull(isnotable_event)
I don't know why you mentioned the DLQ, but something like this should work for you:
... | reverse | streamstats current=t count(eval(Status="Produced")) AS sessionID by TransactionId | stats earliest(_time) AS startTime latest(_time) AS endTime count by TransactionId sessionID | where count=1 | eval waitingSeconds = now() - startTime | where waitingSeconds > (15*60)
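To show what the streamstats/stats idea in the answer above does, here is a small Python model of it, run outside Splunk. It is a sketch under assumptions: the `stuck_sessions` name and the `(epoch_seconds, status, transaction_id)` tuple shape are hypothetical. Events are processed oldest first (the job `reverse` does, since Splunk returns newest first); each Produced event bumps a running counter per TransactionId, and every event is tagged with the current counter value as its sessionID. Grouping by TransactionId and sessionID then pairs each Produced with the Consumed that follows it, so a group holding a single event is a message that was never consumed. The wait is measured from each session's earliest event.

```python
from collections import defaultdict


def stuck_sessions(events, now, min_wait_seconds=15 * 60):
    """Model of the streamstats/stats idea, outside Splunk.

    events: iterable of (epoch_seconds, status, transaction_id).
    Returns sorted (transaction_id, session_id, waiting_seconds) tuples for
    sessions that contain exactly one event and started more than
    min_wait_seconds before `now`.
    """
    # Like `reverse`: process oldest events first.
    ordered = sorted(events, key=lambda e: e[0])

    produced_so_far = defaultdict(int)  # running count of Produced per TransactionId
    sessions = defaultdict(list)        # (TransactionId, sessionID) -> event times
    for ts, status, tid in ordered:
        if status == "Produced":
            # current=t: the event being processed counts itself,
            # so each Produced event opens a new session.
            produced_so_far[tid] += 1
        sessions[(tid, produced_so_far[tid])].append(ts)

    stuck = []
    for (tid, session_id), times in sessions.items():
        start_time = min(times)     # earliest(_time) AS startTime
        waiting = now - start_time  # now() - startTime
        if len(times) == 1 and waiting > min_wait_seconds:
            stuck.append((tid, session_id, waiting))
    return sorted(stuck)


events = [
    (1000, "Produced", "123"), (1010, "Consumed", "123"),  # session 1: consumed
    (2000, "Produced", "123"),                             # session 2: same id re-sent, never consumed
    (3000, "Produced", "124"), (3005, "Consumed", "124"),
    (9500, "Produced", "125"),                             # too recent to flag
]
print(stuck_sessions(events, now=10_000))  # [('123', 2, 8000)]
```

The session counter is what lets the same TransactionId be produced more than once without a later Consumed hiding an earlier loss. One edge case carries over from the search: a Consumed event whose Produced event fell outside the time range lands in session 0 on its own and is flagged too; skipping session 0 would filter those out.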
The reason I mentioned the DLQ is that I wanted a report telling me how many messages were not processed at the Consumer layer. Ideally, if I produce X messages, I want to consume all X. Irrespective of where the unprocessed messages go (whether they end up in the DLQ or are simply never consumed), I need a report that clearly tells me X were produced and X - n were consumed, and that lists just the n unconsumed records along with their TransactionIds.
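The report described here can be sketched in Python as a plain count over the three statuses. This is an illustrative model, not a Splunk search: `reconcile` and the `(status, transaction_id)` pair shape are hypothetical. It counts a message as consumed only if it logged Status=Consumed, so a message the DLQ consumer discarded still counts toward n, and each missing TransactionId is tagged with whether it reached the DLQ.

```python
def reconcile(records):
    """records: iterable of (status, transaction_id) pairs.

    Returns (x, x_minus_n, missing): x TransactionIds were produced,
    x_minus_n of them logged Status=Consumed, and `missing` lists the n
    others, each tagged with whether the DLQ consumer discarded it.
    """
    produced = {}                    # insertion-ordered set of produced TransactionIds
    consumed, discarded = set(), set()
    for status, tid in records:
        if status == "Produced":
            produced.setdefault(tid, None)
        elif status == "Consumed":
            consumed.add(tid)
        elif status == "Discarded":
            discarded.add(tid)

    missing = [
        (tid, "Discarded" if tid in discarded else "Not consumed")
        for tid in produced
        if tid not in consumed
    ]
    x = len(produced)
    return x, x - len(missing), missing


records = [
    ("Produced", "1"), ("Consumed", "1"),
    ("Produced", "2"), ("Discarded", "2"),
    ("Produced", "3"),
]
x, consumed_count, missing = reconcile(records)
print(f"{x} produced, {consumed_count} consumed")  # 3 produced, 1 consumed
print(missing)  # [('2', 'Discarded'), ('3', 'Not consumed')]
```

The DLQ tag is only informational here; removing the Discarded branch would leave X, X - n, and the list of n TransactionIds unchanged.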
Yesterday I ran into an issue where the Producer sent messages and I didn't see any activity on the Consumer side. The messages were eventually processed by the DLQConsumer because the Consumer had some issue (likely its connection to ActiveMQ was broken). A simple restart resolved it, but I had no way of knowing why the Consumer was not processing any messages, and the issue lasted for a few hours. I would have reacted much sooner with a Splunk alert for a situation like this, which is why I posted this question yesterday.
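That was my point: for the purposes of your question, the DLQ is irrelevant. As long as the base search returns only Status=Produced and Status=Consumed events, a message that ends up in the DLQ never gets a matching Consumed event, so the search reports it like any other unconsumed message. My answer should suffice as-is.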