Alerting for Conditional Event Existence

jnovino
New Member

We currently log two different messages, one for the start and one for the completion of a workflow. We would like to create an alert that tells us when we see a start log but don't see a corresponding completion log within a threshold, e.g. 1 hour:

2018-06-19 03:59:59.8592|INFO|Metrics|corId=37c15fccc6a54b38a5fbbb00095f30d5;df93bc8414a34e308ca26f56ea7e911b;orders-547400b38cff40a0aa25922f7a62cd18;547400b38cff40a0aa25922f7a62cd18|messageId=3b7a948448fe4ce89445c31ddd35deb4|CustomEvent: collection=GambitV2|group=JokerWorkflows|category=Latency|label=WorkflowCompleted|value=5867|correlationId=37c15fccc6a54b38a5fbbb00095f30d5|timestamp=1529380799|tag=JokerOrderPostDeal|host=p2nmdwin00001W|env=prod|service=workflowexecutor

We want to write a search that tells us when we have a log for label=WorkflowStarted but no corresponding label=WorkflowCompleted, and the time between the WorkflowStarted event and now is greater than a certain threshold. The two messages can always be joined on the correlationId field. We have tried:

index=qa-gambit CustomEvent label=WorkflowStarted | rex field=_raw "\|tag=(?<tag>.*)\|host" | eval StartTime = _time | join correlationId [search index=qa-gambit CustomEvent label=WorkflowCompleted | eval CompletedTime = _time ] | eval duration = CompletedTime - StartTime | convert dur2sec(duration) | stats count by correlationId, group, tag, duration | where count < 2 AND duration > 60

But this doesn't seem to work. Also, we are not sure why we need a regex to extract the tag field, since all the other fields are parsed automatically.

DalJeanis
Legend

Try something like this...

index=qa-gambit CustomEvent (label="WorkflowStarted" OR label="WorkflowCompleted")
| eval StartTime = case(label="WorkflowStarted", _time)
| eval CompletedTime = case(label="WorkflowCompleted", _time)
| stats min(StartTime) as StartTime
    max(CompletedTime) as CompletedTime
    range(_time) as duration
    count as eventcount
    by correlationId
| where isnull(CompletedTime) AND (now() - StartTime) > 3600
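
If it helps to sanity-check the logic outside Splunk, here is a minimal Python sketch of the same idea: group events by correlationId, keep the earliest start, and flag any start older than the threshold that has no completion. The function names `parse_event` and `stalled_workflows` are made up for illustration. It also assumes the event's own `timestamp` field (epoch seconds) can stand in for Splunk's `_time`, which may not hold exactly in your data.

```python
import re

# Matches key=value pairs in the pipe-delimited sample line from the question.
FIELD_RE = re.compile(r"(\w+)=([^|]*)")


def parse_event(line):
    """Return the key=value pairs of one log line as a dict (hypothetical helper)."""
    return dict(FIELD_RE.findall(line.strip()))


def stalled_workflows(events, now, threshold=3600):
    """Return correlationIds whose start is older than `threshold` seconds
    and that have no WorkflowCompleted event (hypothetical helper)."""
    started = {}       # correlationId -> earliest start time seen
    completed = set()  # correlationIds with at least one completion
    for ev in events:
        cid = ev["correlationId"]
        ts = int(ev["timestamp"])  # assumption: stands in for Splunk's _time
        if ev["label"] == "WorkflowStarted":
            started[cid] = min(ts, started.get(cid, ts))
        elif ev["label"] == "WorkflowCompleted":
            completed.add(cid)
    return sorted(
        cid for cid, ts in started.items()
        if cid not in completed and now - ts > threshold
    )
```

The `completed` set plays the role of `isnull(CompletedTime)` in the search, and `now - ts > threshold` mirrors `(now() - StartTime) > 3600`.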