Alerting

Alerting for Conditional Event Existance

jnovino
New Member

We current log two different messages for the start and completion of a workflow we would like to create an alert that tells us when we see a start log but don't see a corresponding competition log within a threshold i.e. 1 hour:

2018-06-19 03:59:59.8592|INFO|Metrics|corId=37c15fccc6a54b38a5fbbb00095f30d5;df93bc8414a34e308ca26f56ea7e911b;orders-547400b38cff40a0aa25922f7a62cd18;547400b38cff40a0aa25922f7a62cd18|messageId=3b7a948448fe4ce89445c31ddd35deb4|CustomEvent: collection=GambitV2|group=JokerWorkflows|category=Latency|label=WorkflowCompleted|value=5867|correlationId=37c15fccc6a54b38a5fbbb00095f30d5|timestamp=1529380799|tag=JokerOrderPostDeal|host=p2nmdwin00001W|env=prod|service=workflowexecutor

2018-06-19 03:59:59.8592|INFO|Metrics|corId=37c15fccc6a54b38a5fbbb00095f30d5;df93bc8414a34e308ca26f56ea7e911b;orders-547400b38cff40a0aa25922f7a62cd18;547400b38cff40a0aa25922f7a62cd18|messageId=3b7a948448fe4ce89445c31ddd35deb4|CustomEvent: collection=GambitV2|group=JokerWorkflows|category=Latency|label=WorkflowCompleted|value=5867|correlationId=37c15fccc6a54b38a5fbbb00095f30d5|timestamp=1529380799|tag=JokerOrderPostDeal|host=p2nmdwin00001W|env=prod|service=workflowexecutor

We want to write a search that will tell us when we have a log for label=WorkflowStarted but don't have a corresponding label=WorkflowCompleted where the time between the WorkflowStarted and now is greater than a certain threshold. The two messages should always be able to be joined on the correlationId field. We have tried:

index=qa-gambit CustomEvent label=WorkflowStarted | rex field=_raw "\|tag=(?<tag>.*)\|host" | eval StartTime = _time | join correlationId [search index=qa-gambit CustomEvent label=WorkflowCompleted | eval CompletedTime = _time ] | eval duration = CompletedTime - StartTime | convert dur2sec(duration) | stats count by correlationId, group, tag, duration | where count < 2 AND duration > 60

But this doesn't seem to work. Also not sure why we need to regex the tag field as all the other fields are parsed automatically.

0 Karma

DalJeanis
Legend

Try something like this...

 index=qa-gambit CustomEvent (label="WorkflowStarted" OR label="WorkflowCompleted" )
| eval StartTime = case(label="WorkflowStarted",_time) 
| eval CompletedTime = case(label="WorkflowCompleted",_time) 
| stats min(StartTime) as StartTime 
    max(CompletedTime) as CompletedTime 
    range(_time) as duration
    count as eventcount 
    by correlationID 
| where isnull(CompletedTime) AND (now() - StartTime) > 3600
0 Karma
Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...