Splunk Search

Transaction with starting and ending event

zapping575
Communicator

My goal is to solve the following:

  • I have what I consider "starting" events. They contain the string "to FAIL".
  • I have also what I consider "ending" events. They contain the string "from FAIL".
  • I need to find cases in which a pair of these events is more than 30 seconds apart (it doesn't matter which one comes first, starting or ending)
  • Similar to how transaction shows the results, I would like to have both the starting and the ending event in my result set

Constraints:

  • The search needs to be for each host, as well as the same component on said host (both component and host are available as fields)
  • Sometimes, the order of my events can be unpredictable (timestamps may not be consistent). Because of this, the search should check both 30 seconds into the future AND the past for an ending event.

I tried using transactions as mentioned here: https://community.splunk.com/t5/Splunk-Dev/transaction-with-one-start-and-several-end-conditions/m-p... , but I end up with some events being grouped together incorrectly.

I also tried to go the streamstats way (as described in the post above) but I cannot adapt it to my requirements quite right.

Here is an event sample that should be detected by the desired search (events more than 30 sec apart):

2020-2-20T11:11:11 host1 component1 ... to FAIL ...
[random events]
2020-2-20T11:11:55 host1 component1 ... from FAIL to ...
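For illustration, this is roughly the direction I was trying with streamstats (a sketch only; index, sourcetype and field names are from my environment, and I have not gotten it quite right yet):

index=my_index sourcetype=my_sourcetype ("to FAIL" OR "from FAIL")
| eval event_type=case(like(_raw,"%from FAIL%"), "end", like(_raw,"%to FAIL%"), "start")
``` sort ascending per host+component so each event can be compared to its predecessor ```
| sort 0 host component _time
| streamstats global=f current=f window=1 last(_time) as prev_time last(event_type) as prev_type by host component
| eval gap=_time - prev_time
``` flag start/end (or end/start) neighbours more than 30 seconds apart ```
| where event_type!=prev_type AND gap > 30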



zapping575
Communicator

I am genuinely grateful for all of your feedback, thank you very much. ❤️

And I very much agree with the observation that the business logic is not solid and that I should have worked that one out earlier.

What @yuanliu suggests is the solution that offers the best compromise.

  • If I were to go with a "super basic" approach, I would just alert every time I see a starting event
    • But: there are too many of those to check each of them manually.
  • At least some degree of automation would be immensely helpful
    • even if not ALL corner cases are covered
  • The minimum viable solution would be:
    • In case a starting event has been found and its subsequent ending event occurs more than 30 seconds later in time, I want to be notified.
    • In case starting and/or ending events are found that do not fit the transaction pattern, I want to be notified (1).

This is how I tried to do it:

index=my_index sourcetype=my_sourcetype ("to FAIL" OR "from FAIL")
| transaction host component startswith="to FAIL" endswith="from FAIL" maxevents=2 keepevicted=true keeporphans=true
| where duration > 30 or closed_txn = 0

As an example, this is the very first result of this search

2026-04-24T11:04:52.2 component1 from FAIL...
2026-04-24T11:11:15.2 component1 ...to FAIL

Both starting and ending events are present and for the same component which is good.
But for some reason, the ending event supposedly occurs BEFORE the starting event. That one may have confused me a bit 😞
If we look into the original source file, we can see this:

2026-04-24T11:11:17.3 component1 from FAIL...
2026-04-24T11:11:15.2 component1 ...to FAIL
2026-04-24T11:04:52.2 component1 from FAIL...
2026-04-24T11:04:48.1 component1 ...to FAIL

So actually, the source contained two perfectly benign transactions: events (1,2) and events (3,4). But for some reason, Splunk INCORRECTLY considers events (2,3) to be a transaction.
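If it helps anyone reproduce the issue, here is a self-contained mockup of those four events followed by the same search as above (makeresults format=csv needs a recent Splunk version; otherwise the events can be mocked some other way):

| makeresults format=csv data="ts,component,msg
2026-04-24T11:11:17.3,component1,from FAIL
2026-04-24T11:11:15.2,component1,to FAIL
2026-04-24T11:04:52.2,component1,from FAIL
2026-04-24T11:04:48.1,component1,to FAIL"
| eval _time=strptime(ts,"%Y-%m-%dT%H:%M:%S.%Q"), host="host1"
``` transaction expects events in reverse chronological order ```
| sort 0 - _time
| transaction host component startswith="to FAIL" endswith="from FAIL" maxevents=2 keepevicted=true keeporphans=true
| where duration > 30 OR closed_txn=0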

@PickleRick to address your 4 main questions:

  • For the sake of simplicity, let's assume that "...to FAIL" always denotes a starting event
  • Fields that uniquely identify a transaction are "host" and "component"
  • Situations where transactions may overlap are described in (1).
  • Apologies for the confusion regarding the original timestamps. In my defense, I have no control over how they are created. Sometimes they are broken (in which case Splunk falls back to using index time), sometimes events are out of sequence. It's a given that I cannot change. It is an acceptable tradeoff to first identify all the good transactions, exclude them from the results and then check the "outliers" (caused by possibly bad timestamps) individually.

 

(1) Listing of reasons that may break the transaction pattern:

  • After a starting event occurs, the system may be rebooted. Therefore, no ending event may occur. The transaction is not closed.
  • Because of incorrect timestamps in the source files, Splunk might index a starting event and an ending event with the same timestamp
  • A starting event may occur in one source file, while the ending event may occur in another. We are using batch inputs for data ingestion

PickleRick
SplunkTrust
SplunkTrust

Well, your searches will only be as good as your underlying data. That's why it's very important to make sure your data is of decent quality. I understand your limitations, but the receiver of your reports/alerts/whatever (I assume it's some request "from the business") will have to accept that, due to the poor quality of the source data, the results of your searches will be unreliable. You can't magically conjure good results from bad data, it's as simple as that. The "overlap" scenario is pretty common with - for example - login session data. It's not unusual for a user to log into a host which later crashes, so you don't have a logout event. But the timestamps should be consistent in order to make sense of the sequence of events.

The confusing thing about transaction command is that while it requires the events to be in reverse chronological order, after merging them into transactions the events within a transaction are in the direct chronological order. That's... weird, I admit. It's just how it works.
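In other words, if in doubt, enforce the input order explicitly before the command. A sketch, reusing the field names from this thread:

index=my_index sourcetype=my_sourcetype ("to FAIL" OR "from FAIL")
``` newest-first, as transaction expects; sort 0 avoids the default result limit ```
| sort 0 - _time
| transaction host component startswith="to FAIL" endswith="from FAIL" maxevents=2 keepevicted=true keeporphans=true
| where duration > 30 OR closed_txn=0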


yuanliu
SplunkTrust
SplunkTrust

@PickleRick nailed the core problem in your case: unclear business logic. In addition to impossibly large timestamp uncertainty (> 30 seconds?), "the order of my events can be unpredictable" and "doesn't matter which one comes first, starting or ending" contradict the basic premise of a transaction. In other words, before you can find an SPL solution, much additional work is needed, e.g.,

  • Get timestamps across your different data sources to be accurate within 1 ms.  This is not a very strict requirement today.
  • Manually review events of interest ("to FAIL", "from FAIL", something of interest in between), and ask tech/business owners how they decide which "pairs" should be flagged.  Arrive at a rule/set of rules that they (owners) can manually perform without exception.

After this, you can see if their rules fall within the definition of a transaction.  If there indeed is a transaction, you can start with PickleRick's suggestion to test the transaction command, then see if some more efficient commands can apply the same rules, as @livehybrid suggested.  If not, you will need to workshop with the owners to come up with an algorithm before working on SPL.

PickleRick
SplunkTrust
SplunkTrust

While I agree with @livehybrid that the transaction command is best avoided and there are often better ways around your problem, I'd say that it might be a good start to find a transaction invocation which works properly on a small data set, because that could describe what business logic is behind your issue.

For example - if your "transaction" is delimited by "to FAIL" and "from FAIL" but each of them can be first, how can you tell which one starts your transaction?

Is there any other field which can uniquely identify your transaction (an identifier which would be common to both "ends" of your transaction)?

Can you have more than one "overlapping" transaction for any host?

Your constraint of "timestamps may not be consistent so I need to check 30 seconds around an event" confuses me completely. What does it mean? If you can't trust your timestamps at all, what good are they to you? Maybe you meant that due to some internal mechanics of your monitored process the events could be reported with a delay? But the question which comes to mind then is "can't you fix it on the source side? if your reported times are unreliable, what are they for?".

livehybrid
SplunkTrust
SplunkTrust

Hi @zapping575 

I would generally avoid transaction where possible and try to use stats for this instead. You might find something like this works (it might need tweaking slightly, because I don't have data to test with, but I tried using the sample events provided):

| makeresults count=2 | streamstats count | eval _raw=IF(count==1, "2020-2-20T11:11:11 host1 component1 ... to FAIL ...", "2020-2-20T11:11:55 host1 component 1... from FAIL to ..."), host="host1", component="component1", _time=strptime(_raw,"%FT%H:%M:%S")
```index=your_index ("to FAIL" OR "from FAIL") ```
| eval event_type=case(
    LIKE(_raw,"%to FAIL%"), "start",
    LIKE(_raw,"%from FAIL%"), "end"
    )
| stats earliest(_time) as first_time latest(_time) as last_time values(_raw) as raw_events by host, component, event_type
| eval first_time_fmt=strftime(first_time, "%Y-%m-%d %H:%M:%S"), last_time_fmt=strftime(last_time, "%Y-%m-%d %H:%M:%S") 
| stats values(eval(if(event_type="start", first_time, null()))) as start_time
    values(eval(if(event_type="end", first_time, null()))) as end_time
    values(raw_events) as all_raw
    by host, component 
| eval diff=abs(start_time - end_time)
| where diff > 30 
| eval start_time=strftime(start_time, "%Y-%m-%d %H:%M:%S"), end_time=strftime(end_time, "%Y-%m-%d %H:%M:%S") 
| table host, component, start_time, end_time, diff, all_raw


 


inventsekar
SplunkTrust
SplunkTrust

Hi @zapping575 

May I know if you tried something like this:

sourcetype=your_source_type | transaction host startswith="to FAIL" endswith="from FAIL" maxspan=30s 

for doc reference:

https://help.splunk.com/en/splunk-enterprise/spl-search-reference/10.2/search-commands/transaction

 

Thanks and best regards, Sekar
