Alerting

Alerting for Conditional Event Existance

jnovino
New Member

We current log two different messages for the start and completion of a workflow we would like to create an alert that tells us when we see a start log but don't see a corresponding competition log within a threshold i.e. 1 hour:

2018-06-19 03:59:59.8592|INFO|Metrics|corId=37c15fccc6a54b38a5fbbb00095f30d5;df93bc8414a34e308ca26f56ea7e911b;orders-547400b38cff40a0aa25922f7a62cd18;547400b38cff40a0aa25922f7a62cd18|messageId=3b7a948448fe4ce89445c31ddd35deb4|CustomEvent: collection=GambitV2|group=JokerWorkflows|category=Latency|label=WorkflowCompleted|value=5867|correlationId=37c15fccc6a54b38a5fbbb00095f30d5|timestamp=1529380799|tag=JokerOrderPostDeal|host=p2nmdwin00001W|env=prod|service=workflowexecutor

2018-06-19 03:59:59.8592|INFO|Metrics|corId=37c15fccc6a54b38a5fbbb00095f30d5;df93bc8414a34e308ca26f56ea7e911b;orders-547400b38cff40a0aa25922f7a62cd18;547400b38cff40a0aa25922f7a62cd18|messageId=3b7a948448fe4ce89445c31ddd35deb4|CustomEvent: collection=GambitV2|group=JokerWorkflows|category=Latency|label=WorkflowCompleted|value=5867|correlationId=37c15fccc6a54b38a5fbbb00095f30d5|timestamp=1529380799|tag=JokerOrderPostDeal|host=p2nmdwin00001W|env=prod|service=workflowexecutor

We want to write a search that will tell us when we have a log for label=WorkflowStarted but don't have a corresponding label=WorkflowCompleted where the time between the WorkflowStarted and now is greater than a certain threshold. The two messages should always be able to be joined on the correlationId field. We have tried:

index=qa-gambit CustomEvent label=WorkflowStarted | rex field=_raw "\|tag=(?<tag>.*)\|host" | eval StartTime = _time | join correlationId [search index=qa-gambit CustomEvent label=WorkflowCompleted | eval CompletedTime = _time ] | eval duration = CompletedTime - StartTime | convert dur2sec(duration) | stats count by correlationId, group, tag, duration | where count < 2 AND duration > 60

But this doesn't seem to work. Also not sure why we need to regex the tag field as all the other fields are parsed automatically.

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

Try something like this...

 index=qa-gambit CustomEvent (label="WorkflowStarted" OR label="WorkflowCompleted" )
| eval StartTime = case(label="WorkflowStarted",_time) 
| eval CompletedTime = case(label="WorkflowCompleted",_time) 
| stats min(StartTime) as StartTime 
    max(CompletedTime) as CompletedTime 
    range(_time) as duration
    count as eventcount 
    by correlationID 
| where isnull(CompletedTime) AND (now() - StartTime) > 3600
0 Karma