I am genuinely grateful for all of your feedback, thank you very much. ❤️ And I very much agree with the observation that the business logic is not solid and that I should have worked that out earlier. What @yuanliu suggests is the solution that offers the best compromise.

If I were to go with a "super basic" approach, I would just alert every time I see a starting event. But there are too many of those to check each of them manually. At least some degree of automation would be immensely helpful, even if not ALL corner cases are covered.

The minimum viable solution would be:

- If a starting event has been found and its subsequent ending event occurs more than 30 seconds later, I want to be notified.
- If starting and/or ending events are found that do not fit the transaction pattern, I want to be notified (1).

This is how I tried to do it:

index=my_index sourcetype=my_sourcetype ("to FAIL" OR "from FAIL")
| transaction host component startswith("to FAIL") endswith("from FAIL") maxevents=2 keepevicted=true keeporphans=true
| where duration > 30 OR closed_txn = 0

As an example, this is the very first result of this search:

2026-04-24T11:04:52.2 component1 from FAIL...
2026-04-24T11:11:15.2 component1 ...to FAIL

Both starting and ending events are present, and for the same component, which is good. But for some reason, the ending event supposedly occurs BEFORE the starting event. That one may have confused me a bit 😞

If we look into the original source file, we can see this:

2026-04-24T11:11:17.3 component1 from FAIL...
2026-04-24T11:11:15.2 component1 ...to FAIL
2026-04-24T11:04:52.2 component1 from FAIL...
2026-04-24T11:04:48.1 component1 ...to FAIL

So actually, the source contained two perfectly benign transactions: events (1,2) and events (3,4). But for some reason, Splunk INCORRECTLY considers events (2,3) to be a transaction.

@PickleRick, to address your 4 main questions:

- For the sake of simplicity, let's assume that "...to FAIL" always denotes a starting event.
- The fields that uniquely identify a transaction are "host" and "component".
- Situations where transactions may overlap are described in (1).
- Apologies for the confusion regarding the original timestamps. In my defense, I have no control over how they are created. Sometimes they are broken (in which case Splunk falls back to using index time), sometimes events are out of sequence. It's a given that I cannot change. It is an acceptable tradeoff to first identify all the good transactions, exclude them from the results, and then check the "outliers" (possibly caused by bad timestamps) individually.

(1) Listing of reasons that may break the transaction pattern:

- After a starting event occurs, the system may be rebooted. Therefore, no ending event may occur and the transaction is not closed.
- Because of incorrect timestamps in the source files, Splunk might index a starting event and an ending event with the same timestamp.
- A starting event may occur in one source file, while the ending event may occur in another. We are using batch inputs for data ingestion.
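For what it's worth, since transaction pairs events in the order the search returns them, out-of-sequence timestamps like the ones above can produce exactly this kind of mispairing. One alternative I'm considering is sorting ascending first and pairing start/end events with streamstats instead. This is only a rough sketch under my own assumptions (the like() pattern and the "problem" labels are mine, and it is untested against the real data):

```
index=my_index sourcetype=my_sourcetype ("to FAIL" OR "from FAIL")
| eval type=if(like(_raw, "%to FAIL%"), "start", "end")
| sort 0 host component _time
| streamstats current=f last(type) as prev_type last(_time) as prev_time by host component
| eval problem=case(
    type="end" AND prev_type="start" AND _time - prev_time > 30, "slow transaction",
    type="end" AND (isnull(prev_type) OR prev_type!="start"), "end without start",
    type="start" AND prev_type="start", "start without end",
    true(), null())
| where isnotnull(problem)
```

One known gap in this sketch: a starting event that is never followed by any further event for that host/component (e.g. the reboot case) would not be flagged, so that corner case would still need a separate check.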