Splunk Search

How to Use Transaction to Retrieve Groups of Events

Traer001
Path Finder

Hi all, 

I'm trying to use a transaction to get multiple pairs of events (the selection and release of a node). So I have multiple events like the ones below:

2021-04-05 14:34:23  Node 123 selected for User 1

2021-04-05 14:34:17  User 1 released Node 118

2021-04-05 14:34:11  Node 123 selected for User 1

2021-04-05 14:34:05  Node 118 selected for User 1

2021-04-05 14:33:46  Node 118 selected for User 1

2021-04-05 14:33:29  User 1 released Node 103

2021-04-05 14:33:23  Node 118 selected for User 1

2021-04-05 14:33:21  Node 103 selected for User 1

2021-04-05 14:33:08  User 1 released Node 118

2021-04-05 14:33:02  Node 103 selected for User 1

2021-04-05 14:32:53  Node 118 selected for User 1

 

I am trying to get groups of events that would have pairs with the start and end of each node selection/release. In the above example I would want to retrieve 2 complete pairs:

1) The events at 14:33:23 and 14:34:17

2) The events at 14:33:02 and 14:33:29

 

My current query looks like this:

index=INDEX host=HOSTNAME sourcetype=SOURCETYPE
| rex field=_raw "Node\s(?<node>\d+)\sselected\sfor\sUser\s(?<user_id>\d+)"
| rex field=_raw "User\s(?<user_id>\d+)\sreleased\sNode\s(?<node>\d+)"
| where isnotnull(node)
| transaction node user_id startswith=selected endswith=released | table user_id, node, duration

 

I am currently getting pairs of events with the correct user ids and nodes, but it is taking the earliest instance where the user id and node values match with the latest user id and node values. I don't necessarily want the earliest event however. So instead of the events at 14:33:23 and 14:34:17, my current query would be giving me the events at 14:32:53 and 14:34:17.

How can I adjust this query to get the results I'm after?

 

Labels (2)
0 Karma
1 Solution

ITWhisperer
Ultra Champion
| makeresults | eval _raw="2021-04-05 14:34:23  Node 123 selected for User 1
2021-04-05 14:34:17  User 1 released Node 118
2021-04-05 14:34:11  Node 123 selected for User 1
2021-04-05 14:34:05  Node 118 selected for User 1
2021-04-05 14:33:46  Node 118 selected for User 1
2021-04-05 14:33:29  User 1 released Node 103
2021-04-05 14:33:23  Node 118 selected for User 1
2021-04-05 14:33:21  Node 103 selected for User 1
2021-04-05 14:33:08  User 1 released Node 118
2021-04-05 14:33:02  Node 103 selected for User 1
2021-04-05 14:32:53  Node 118 selected for User 1"
| multikv noheader=t
| eval _time=strptime(_raw,"%Y-%m-%d %H:%M:%S")
| rex field=_raw "Node\s(?<node>\d+)\sselected\sfor\sUser\s(?<user_id>\d+)"
| rex field=_raw "User\s(?<user_id>\d+)\sreleased\sNode\s(?<node>\d+)"
| fields _time node user_id
| sort _time
| where isnotnull(node)


| eval selectedat=if(match(_raw,"selected"),_time,null)
| eval releasedat=if(match(_raw,"released"),_time,null)
| sort user_id node -_time
| streamstats min(releasedat) as releasedat by node user_id
| stats min(selectedat) as selectedat by node user_id releasedat
| fieldformat releasedat=strftime(releasedat,"%Y-%m-%d %H:%M:%S")
| fieldformat selectedat=strftime(selectedat,"%Y-%m-%d %H:%M:%S")

View solution in original post

bowesmana
Champion

You can use transaction to preserve the initial select with no release examples - these are 'orphans' and you can use this to find the first instances.

| makeresults 
| eval _raw="2021-04-05 14:34:23  Node 123 selected for User 1
2021-04-05 14:34:17  User 1 released Node 118
2021-04-05 14:34:11  Node 123 selected for User 1
2021-04-05 14:34:05  Node 118 selected for User 1
2021-04-05 14:33:46  Node 118 selected for User 1
2021-04-05 14:33:29  User 1 released Node 103
2021-04-05 14:33:23  Node 118 selected for User 1
2021-04-05 14:33:21  Node 103 selected for User 1
2021-04-05 14:33:08  User 1 released Node 118
2021-04-05 14:33:02  Node 103 selected for User 1
2021-04-05 14:32:53  Node 118 selected for User 1" 
| multikv noheader=t 
| fields - Column* linecount
| eval _time=strptime(_raw, "%F %T") 
| rex field=_raw "Node\s(?<node>\d+)\sselected\sfor\sUser\s(?<user_id>\d+)" 
| rex field=_raw "User\s(?<user_id>\d+)\sreleased\sNode\s(?<node>\d+)" 
| where isnotnull(node) 
| transaction node user_id startswith=selected endswith=released keeporphans=t
| sort node user_id _time
| streamstats min(_time) as mint max(_time) as maxt by node user_id reset_after="closed_txn=1"
| eval duration=duration+(_time-mint), _time=mint
| where closed_txn=1

So, this adds the keeporphans flag to transaction and then sorts so streamstats can do its work.

Streamstats is finding the first of the events and setting 'mint' var as the first time - note that the streamstats is reset after a closed transaction.

Then time and duration are adjusted to reflect the duration from the first event for the selected node rather than the last.

I know you have had a number of examples of this type of data recently - if it is possible to improve the data to avoid having to do all the logic in Splunk.

Your data may break the above and using the transaction command should never be used as a 'first choice' command unless you have very specific data sets as it comes with memory constraints. Most things can be achieved with forms of stats, this problem too would not really need transaction as you could most likely do it all with streamstats, similar to the other solutions I've posted earlier.

 

0 Karma

Traer001
Path Finder

@bowesmana Okay, thank you for the response! I was trying to see if I'd be able to also include the earliest instance of a node being selected after it had been released rather than the last instance before a node is released. So essentially I was hoping to see if I could retrieve a transaction that started with the event at 14:33:23 and ended at 14:34:17, because the event at 14:33:23 is the earliest instance of Node 118 being selected after it had been released. From your and @ITWhisperer responses, it doesn't appear as though I can do this with transactions however. It appears to only retrieve the most recent event that can fit with the transaction.

0 Karma

ITWhisperer
Ultra Champion
| makeresults | eval _raw="2021-04-05 14:34:23  Node 123 selected for User 1
2021-04-05 14:34:17  User 1 released Node 118
2021-04-05 14:34:11  Node 123 selected for User 1
2021-04-05 14:34:05  Node 118 selected for User 1
2021-04-05 14:33:46  Node 118 selected for User 1
2021-04-05 14:33:29  User 1 released Node 103
2021-04-05 14:33:23  Node 118 selected for User 1
2021-04-05 14:33:21  Node 103 selected for User 1
2021-04-05 14:33:08  User 1 released Node 118
2021-04-05 14:33:02  Node 103 selected for User 1
2021-04-05 14:32:53  Node 118 selected for User 1"
| multikv noheader=t
| eval _time=strptime(_raw,"%Y-%m-%d %H:%M:%S")
| rex field=_raw "Node\s(?<node>\d+)\sselected\sfor\sUser\s(?<user_id>\d+)"
| rex field=_raw "User\s(?<user_id>\d+)\sreleased\sNode\s(?<node>\d+)"
| fields _time node user_id
| sort _time
| where isnotnull(node)


| eval selectedat=if(match(_raw,"selected"),_time,null)
| eval releasedat=if(match(_raw,"released"),_time,null)
| sort user_id node -_time
| streamstats min(releasedat) as releasedat by node user_id
| stats min(selectedat) as selectedat by node user_id releasedat
| fieldformat releasedat=strftime(releasedat,"%Y-%m-%d %H:%M:%S")
| fieldformat selectedat=strftime(selectedat,"%Y-%m-%d %H:%M:%S")

View solution in original post

ITWhisperer
Ultra Champion

I am not getting what you say you are getting with this data

 

| makeresults | eval _raw="2021-04-05 14:34:23  Node 123 selected for User 1
2021-04-05 14:34:17  User 1 released Node 118
2021-04-05 14:34:11  Node 123 selected for User 1
2021-04-05 14:34:05  Node 118 selected for User 1
2021-04-05 14:33:46  Node 118 selected for User 1
2021-04-05 14:33:29  User 1 released Node 103
2021-04-05 14:33:23  Node 118 selected for User 1
2021-04-05 14:33:21  Node 103 selected for User 1
2021-04-05 14:33:08  User 1 released Node 118
2021-04-05 14:33:02  Node 103 selected for User 1
2021-04-05 14:32:53  Node 118 selected for User 1"
| multikv noheader=t
| eval _time=strptime(_raw,"%Y-%m-%d %H:%M:%S")
| rex field=_raw "Node\s(?<node>\d+)\sselected\sfor\sUser\s(?<user_id>\d+)"
| rex field=_raw "User\s(?<user_id>\d+)\sreleased\sNode\s(?<node>\d+)"
| fields _time node user_id
| sort - _time
| where isnotnull(node)
| transaction node user_id startswith=selected endswith=released

 

2021-04-05 14:34:051181

2021-04-05 14:34:05 Node 118 selected for User 1

2021-04-05 14:34:17 User 1 released Node 118

2021-04-05 14:33:211031

2021-04-05 14:33:21 Node 103 selected for User 1

2021-04-05 14:33:29 User 1 released Node 103

2021-04-05 14:32:531181

2021-04-05 14:32:53 Node 118 selected for User 1

2021-04-05 14:33:08 User 1 released Node 118

 

Which seem reasonable to me (each transaction is made up of selected/released pair which are closest to each other).

This seems different to what you are expecting i.e. 3 transactions instead of 2.

Which version of splunk are you running?

0 Karma
Take the 2021 Splunk Career Survey

Help us learn about how Splunk has
impacted your career by taking the 2021 Splunk Career Survey.

Earn $50 in Amazon cash!