I need to document a transaction that begins with a multithreaded process. The process creates multiple entries in an event log:
Message: UseCaseX.ProcessData ItemID=5 Provider=123 ElapsedMilliseconds=230 timestamp=09/21/2017 10:16:33 AM
Message: UseCaseX.ProcessData ItemID=5 Provider=333 ElapsedMilliseconds=130 timestamp=09/21/2017 10:16:38 AM
Message: UseCaseX.ProcessData ItemID=5 Provider=999 ElapsedMilliseconds=780 timestamp=09/21/2017 10:16:41 AM
The 'Provider' value will vary on every occasion; there's no telling which Provider may come first.
The transaction ends with a single identifiable event log entry:
Message: UseCaseY.CalculateScore ItemID=5 ElapsedMilliseconds=780 timestamp=09/21/2017 10:16:58 AM
There are many other log entries and servers involved in the time between the beginning of the transaction and UseCaseY.CalculateScore. I'm trying to produce a transaction that will have its duration span from the first instance of UseCaseX.ProcessData to the instance of UseCaseY.CalculateScore.
searchHere | transaction ItemID startswith="UseCaseX.ProcessData" endswith="CalculateScore"
I'm getting a duration that starts with "10:16:41 AM", when I want my duration to start with "10:16:33 AM".
|makeresults |eval raw="Message: UseCaseX.ProcessData ItemID=5 Provider=123 ElapsedMilliseconds=230 timestamp=09/21/2017T10:16:33;Message: UseCaseX.ProcessData ItemID=5 Provider=333 ElapsedMilliseconds=130 timestamp=09/21/2017T10:16:38;Message: UseCaseX.ProcessData ItemID=5 Provider=999 ElapsedMilliseconds=780 timestamp=09/21/2017T10:16:41;Message: UseCaseY.CalculateScore ItemID=5 ElapsedMilliseconds=780 timestamp=09/21/2017T10:16:58" | makemv delim=";" raw |mvexpand raw | rename raw AS _raw `comment("Ignore above code which is just used for generating a dataset")` | kv | eval _time = strptime(timestamp,"%m/%d/%YT%H:%M:%S") |rex field=_raw "Message\:\s+(?<Message>.*?)\s*\s+ItemID" | dedup ItemID Message |reverse | transaction ItemID startswith="UseCaseX.ProcessData" endswith="CalculateScore"
Message field is extracted, below search should work.
<YourBaseSearch> | dedup ItemID Message sortby _time | transaction ItemID startswith="UseCaseX.ProcessData" endswith="CalculateScore"
@jasongb, Have you tried with stats instead of transaction?
<YourBaseSearch> ("UseCaseX.ProcessData" OR "UseCaseY.CalculateScore") | eval transactionType=case(searchmatch("UseCaseX.ProcessData"),"ProcessData",searchmatch("UseCaseY.CalculateScore"),"CalculateScore") | stats earliest(_time) as earliestTime latest(_time) as latestTime values(transactionType) as transactionType dc(transactionType) as distinctTypes by ItemID | search distinctTypes=2 transactionType="ProcessData" transactionType="CalculateScore" | eval duration=latestTime-earliestTime | eval _time=earliestTime | fields - latestTime
Thanks very much for your suggestion! I have been focused on the transaction command because of the other data that's between the start and end of the transaction. The data spans multiple servers, multiple logs (iis, event log, etc.) and many different types of events.
Each event can be reliably linked by the ItemID field.
I'm under the impression that the stats command might let me produce some elements of what I need (i.e., an accurate duration), but it won't let me harvest all of the events that occur from beginning to end. The transaction command is meeting all of my needs - except for letting me see the first instance of that 'startswith'.
I will look further into the stats command; perhaps it's a case of not asking the right question of the data.
Do read about event grouping and correlation: http://docs.splunk.com/Documentation/Splunk/latest/Search/Abouteventcorrelation. What you seem to be achieveing through transaction, similar results can be fetched via stats which would be way faster. I had missed correlating events by ItemID as in your original transaction which I have updated by adding
by ItemID in the stats command.
You can use
list(<YourFieldName>) to fetch the required details once correlated via
Could you please elaborate as to what do you mean by
won't let me harvest all of the events that occur from beginning to end.? Also what is the remaining part of your search once you have correlated events through the transaction command?
@niketnilay - Would you recommend this approach over transaction if we want to group (say) 10 or more events into a group using a common identifier which is re-used by other group of events?
@nareshinsvu transaction is usually good when you know the session ID and starting and end conditions. However, the answer would depend on the data you have the output you want and the volume