My goal is to solve the following: per host and component, detect a "to FAIL" event whose matching "from FAIL" event occurs more than 30 seconds later (or never occurs at all).
Constraints: my timestamps may not be consistent (so I need to check about 30 seconds around an event), the order of my events can be unpredictable, and it doesn't matter which one comes first, starting or ending.
I tried using transactions as mentioned here: https://community.splunk.com/t5/Splunk-Dev/transaction-with-one-start-and-several-end-conditions/m-p... , but I end up with some events being grouped together incorrectly.
I also tried to go the streamstats way (as described in the post above) but I cannot adapt it to my requirements quite right.
Here is an event sample that should be detected by the desired search (events more than 30 sec apart):
2020-2-20T11:11:11 host1 component1 ... to FAIL ...
[random events]
2020-2-20T11:11:55 host1 component1 ... from FAIL to ...
I am genuinely grateful for all of your feedback, thank you very much. ❤️
And I very much agree with the observation that the business logic is not solid and that I should have worked that one out earlier.
What @yuanliu suggests is the solution that offers the best compromise.
This is how I tried to do it:
index=my_index sourcetype=my_sourcetype ("to FAIL" OR "from FAIL")
| transaction host component startswith="to FAIL" endswith="from FAIL" maxevents=2 keepevicted=true keeporphans=true
| where duration > 30 OR closed_txn = 0

As an example, this is the very first result of this search:
2026-04-24T11:04:52.2 component1 from FAIL...
2026-04-24T11:11:15.2 component1 ...to FAIL

Both starting and ending events are present, and for the same component, which is good.
But for some reason, the supposed ending event occurs BEFORE the starting event. That one may have confused me a bit 😞
If we look into the original source file, we can see this:
2026-04-24T11:11:17.3 component1 from FAIL...
2026-04-24T11:11:15.2 component1 ...to FAIL
2026-04-24T11:04:52.2 component1 from FAIL...
2026-04-24T11:04:48.1 component1 ...to FAIL

So actually, the source contained two perfectly benign transactions: events (1,2) and events (3,4). But for some reason, Splunk INCORRECTLY considers events (2,3) to be a transaction.
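If you want to reproduce this on a small scale, here is a minimal sketch I'd try (my assumptions: Splunk 9.0+ where makeresults supports format=csv, host="host1" since the source lines above only show the component, and eval standing in for the real field extractions; the sub-second part is dropped by the strptime format, which is fine here since the seconds all differ):

| makeresults format=csv data="raw
2026-04-24T11:11:17.3 component1 from FAIL...
2026-04-24T11:11:15.2 component1 ...to FAIL
2026-04-24T11:04:52.2 component1 from FAIL...
2026-04-24T11:04:48.1 component1 ...to FAIL"
| eval _raw=raw, _time=strptime(raw, "%Y-%m-%dT%H:%M:%S"), host="host1", component="component1"
| sort 0 -_time
| transaction host component startswith="to FAIL" endswith="from FAIL" maxevents=2 keepevicted=true keeporphans=true
| table _time duration closed_txn _raw

The sort 0 -_time puts the events into the reverse chronological order that transaction expects from an index search.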
@PickleRick, to address your 4 main questions:
(1) Listing of reasons that may break the transaction pattern:
Well, your searches will only ever be as good as your underlying data. That's why it's very important to make sure your data is of decent quality. I understand your limitations, but the receiver of your reports/alerts/whatever (I assume it's some request "from the business") will have to accept that due to the poor quality of the source data the results of your searches will be unreliable. You can't magically conjure good results from bad data, it's as simple as that.

The "overlap" scenario is pretty common with, for example, login session data. It's not unusual for a user to log into a host which later crashes, so you never get a logout event. But the timestamps should be consistent in order to make sense of the sequence of events.
The confusing thing about the transaction command is that while it requires the input events to be in reverse chronological order, after merging them into transactions the events within each transaction are listed in direct chronological order. That's... weird, I admit. It's just how it works.
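A practical consequence: if anything in your pipeline has reordered the events before transaction sees them, put them back into descending time order first, for example:

| sort 0 -_time
| transaction host component startswith="to FAIL" endswith="from FAIL"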
@PickleRick nailed the core problem in your case: unclear business logic. In addition to the impossibly large timestamp uncertainty (> 30 seconds?), "the order of my events can be unpredictable" and "doesn't matter which one comes first, starting or ending" contradict the basic premise of a transaction. In other words, before you can find an SPL solution, much additional work is needed, e.g., working out the exact rules with the data owners.
After this, you can see if their rules fall within the definition of a transaction. If there indeed is a transaction, you can start with @PickleRick's suggestion to test the transaction command, then see if some more efficient commands can apply the same rules, as @livehybrid suggested. If not, you will need to workshop with the owners to come up with an algorithm before working on SPL.
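To sketch what such a streamstats variant could look like (my assumptions: host and component are extracted fields, index/sourcetype as in the original attempt, and each "from FAIL" should be paired with the immediately preceding event for the same host/component):

index=my_index sourcetype=my_sourcetype ("to FAIL" OR "from FAIL")
| eval event_type=if(like(_raw, "%to FAIL%"), "start", "end")
| sort 0 host component _time
| streamstats current=f window=1 last(_time) as prev_time last(event_type) as prev_type by host component
| where event_type="end" AND prev_type="start" AND _time - prev_time > 30

This only flags recoveries that arrive more than 30 seconds after the preceding failure; failures with no recovery at all would still need separate handling.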
While I agree with @livehybrid that the transaction command is best avoided and there are often better ways around your problem, I'd say it might be a good start to first find a transaction invocation that works properly on a small data set, because that would pin down what business logic is behind your issue.
For example - if your "transaction" is delimited by "to FAIL" and "from FAIL" but each of them can be first, how can you tell which one starts your transaction?
Is there any other field which can uniquely identify your transaction (an identifier which would be common to both "ends" of your transaction)?
Can you have more than one "overlapping" transaction for any host?
Your constraint of "timestamps may not be consistent so I need to check 30 seconds around an event" confuses me completely. What does it mean? If you can't trust your timestamps at all, what good are they to you? Maybe you meant that, due to some internal mechanics of your monitored process, the events could be reported with a delay? But then the question which comes to mind is: "Can't you fix it on the source side? If your reported times are unreliable, what are they for?"
Hi @zapping575
I would generally avoid transaction where possible and try to use stats for this instead. You might find something like this works (it might need tweaking slightly, because I don't have data to test with, but I tried it against the sample events provided):
| makeresults count=2 | streamstats count | eval _raw=IF(count==1, "2020-2-20T11:11:11 host1 component1 ... to FAIL ...", "2020-2-20T11:11:55 host1 component1 ... from FAIL to ..."), host="host1", component="component1", _time=strptime(_raw,"%FT%H:%M:%S")
```index=your_index ("to FAIL" OR "from FAIL") ```
| eval event_type=case(
LIKE(_raw,"%to FAIL%"), "start",
LIKE(_raw,"%from FAIL%"), "end"
)
| stats earliest(_time) as first_time latest(_time) as last_time values(_raw) as raw_events by host, component, event_type
| eval first_time_fmt=strftime(first_time, "%Y-%m-%d %H:%M:%S"), last_time_fmt=strftime(last_time, "%Y-%m-%d %H:%M:%S")
| stats values(eval(if(event_type="start", first_time, null()))) as start_time
values(eval(if(event_type="end", first_time, null()))) as end_time
values(raw_events) as all_raw
by host, component
| eval diff=abs(start_time - end_time)
| where diff > 30
| eval start_time=strftime(start_time, "%Y-%m-%d %H:%M:%S"), end_time=strftime(end_time, "%Y-%m-%d %H:%M:%S")
| table host, component, start_time, end_time, diff, all_raw
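One caveat with this sketch: the final stats keeps a single start_time/end_time pair per host/component, so multiple failure cycles for the same component would get merged into one row. And if you also want failures that never recovered (my reading of the keepevicted/keeporphans flags in the original attempt), you could relax the filter, e.g.:

| where diff > 30 OR isnull(end_time)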
Hi @zapping575
May I know if you tried something like this:
sourcetype=your_source_type | transaction host startswith="to FAIL" endswith="from FAIL" maxspan=30s
For reference, see the transaction command docs:
https://help.splunk.com/en/splunk-enterprise/spl-search-reference/10.2/search-commands/transaction
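One thing to double-check with maxspan=30s: it groups events that occur within 30 seconds of each other, whereas (if I read the sample events right) the goal is to find pairs more than 30 seconds apart. A variant along those lines might be:

sourcetype=your_source_type ("to FAIL" OR "from FAIL")
| transaction host startswith="to FAIL" endswith="from FAIL"
| where duration > 30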
Thanks and best regards, Sekar