How to define a search to find missing data?


I have an application that has predictable log entries when it starts a series of activities and when it finishes. I can create transactions, etc. - all good. What I'm struggling with, however, is how to construct a search that tells me which activities didn't complete. Basically - identifying that a set of activities was started, but didn't result in the log entries that indicate it finished. I've done some searches like

index="cloudwatch" | regex "\w{8}\-\w{4}\-\w{4}\-\w{4}\-\w{12}/\d{6}/\d+/\w{8}\-\w{4}\-\w{4}\-\w{4}\-\w{12}" | rex "(?<admin>\w{8}\-\w{4}\-\w{4}\-\w{4}\-\w{12})/(?<ticket>\d{6})/(?<token>\d+)/(?<chunk_id>\w{8}\-\w{4}\-\w{4}\-\w{4}\-\w{12})" | stats count by admin,ticket,token,chunk_id | where count<7

the regex is basically identifying all the events that have the string that can be used to identify participation in the same transaction - then the rex extracts the individual parts that are meaningful. I can run this, but over a set of tens of millions of events, it's not the fastest in the world. Even setting this up as a scheduled search, I'll end up with phantom records because not all events are within the timeframe being searched - you'll get some orphans at the edges. I can also search for just the beginning / end, adding something like

(("received event" manifest.json) OR "writing postdata")

to the base search, that can speed up the search, but only by a little - and it'll get worse as the data set grows. Ultimately, I want to define a search that finds 'chunk_id's that didn't complete, schedule it, and get an alert. The reason being, there's usually some corrective action needed and it can be time-sensitive.

I feel like I've struggled with this notion of "finding the data that isn't there" numerous times in the past, never quite getting something that seemed "right" - so, finally posting something up here in case anyone has some pointers.


Have you looked at an accelerated data model? From your description, it appears you already have the right query to get the desired results; what you are looking for is a faster solution.
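Separately from speed, the "started but never finished" check itself can be written so that it only compares the start and end markers instead of counting every event. The following is a rough sketch, not a tested search. It assumes that "received event" marks the start of a chunk and "writing postdata" marks the end (the post does not say which is which), and the 15-minute grace period is an arbitrary illustrative value that should be longer than the longest normal run.

```
index="cloudwatch" (("received event" manifest.json) OR "writing postdata")
| rex "(?<admin>\w{8}\-\w{4}\-\w{4}\-\w{4}\-\w{12})/(?<ticket>\d{6})/(?<token>\d+)/(?<chunk_id>\w{8}\-\w{4}\-\w{4}\-\w{4}\-\w{12})"
| eval is_start=if(searchmatch("\"received event\""), 1, 0), is_end=if(searchmatch("\"writing postdata\""), 1, 0)
| stats max(is_start) as started, max(is_end) as finished, min(_time) as first_seen by admin, ticket, token, chunk_id
| where started=1 AND finished=0 AND first_seen < relative_time(now(), "-15m")
```

Requiring started=1 drops chunks whose start event falls before the search window, and the first_seen filter drops chunks that started too recently to have finished, which should take care of most of the orphans at the edges. Any rows that remain are candidates for the alert.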


