Alerting

Alerting for Conditional Event Existance

jnovino
New Member

We current log two different messages for the start and completion of a workflow we would like to create an alert that tells us when we see a start log but don't see a corresponding competition log within a threshold i.e. 1 hour:

2018-06-19 03:59:59.8592|INFO|Metrics|corId=37c15fccc6a54b38a5fbbb00095f30d5;df93bc8414a34e308ca26f56ea7e911b;orders-547400b38cff40a0aa25922f7a62cd18;547400b38cff40a0aa25922f7a62cd18|messageId=3b7a948448fe4ce89445c31ddd35deb4|CustomEvent: collection=GambitV2|group=JokerWorkflows|category=Latency|label=WorkflowCompleted|value=5867|correlationId=37c15fccc6a54b38a5fbbb00095f30d5|timestamp=1529380799|tag=JokerOrderPostDeal|host=p2nmdwin00001W|env=prod|service=workflowexecutor

2018-06-19 03:59:59.8592|INFO|Metrics|corId=37c15fccc6a54b38a5fbbb00095f30d5;df93bc8414a34e308ca26f56ea7e911b;orders-547400b38cff40a0aa25922f7a62cd18;547400b38cff40a0aa25922f7a62cd18|messageId=3b7a948448fe4ce89445c31ddd35deb4|CustomEvent: collection=GambitV2|group=JokerWorkflows|category=Latency|label=WorkflowCompleted|value=5867|correlationId=37c15fccc6a54b38a5fbbb00095f30d5|timestamp=1529380799|tag=JokerOrderPostDeal|host=p2nmdwin00001W|env=prod|service=workflowexecutor

We want to write a search that will tell us when we have a log for label=WorkflowStarted but don't have a corresponding label=WorkflowCompleted where the time between the WorkflowStarted and now is greater than a certain threshold. The two messages should always be able to be joined on the correlationId field. We have tried:

index=qa-gambit CustomEvent label=WorkflowStarted | rex field=_raw "\|tag=(?<tag>.*)\|host" | eval StartTime = _time | join correlationId [search index=qa-gambit CustomEvent label=WorkflowCompleted | eval CompletedTime = _time ] | eval duration = CompletedTime - StartTime | convert dur2sec(duration) | stats count by correlationId, group, tag, duration | where count < 2 AND duration > 60

But this doesn't seem to work. Also not sure why we need to regex the tag field as all the other fields are parsed automatically.

0 Karma

DalJeanis
Legend

Try something like this...

 index=qa-gambit CustomEvent (label="WorkflowStarted" OR label="WorkflowCompleted" )
| eval StartTime = case(label="WorkflowStarted",_time) 
| eval CompletedTime = case(label="WorkflowCompleted",_time) 
| stats min(StartTime) as StartTime 
    max(CompletedTime) as CompletedTime 
    range(_time) as duration
    count as eventcount 
    by correlationID 
| where isnull(CompletedTime) AND (now() - StartTime) > 3600
0 Karma
Get Updates on the Splunk Community!

New in Observability Cloud - Explicit Bucket Histograms

Splunk introduces native support for histograms as a metric data type within Observability Cloud with Explicit ...

Updated Team Landing Page in Splunk Observability

We’re making some changes to the team landing page in Splunk Observability, based on your feedback. The ...

New! Splunk Observability Search Enhancements for Splunk APM Services/Traces and ...

Regardless of where you are in Splunk Observability, you can search for relevant APM targets including service ...