Splunk Search

Can you have a transaction startswith when there are two starting conditions?

Super Champion

I'm trying to build a transaction that has an optional leading starting event. The events I'm working with don't have the tracking fields I'd need, so I have to rely on the startswith and endswith expressions to establish transaction boundaries. I think I can best explain this by example.

The events I'm trying to group into a transaction are for a backup job. I have the basic case working; however, I'm running into trouble when the backup job is incremental, in which case the event indicating that the job is incremental appears before the standard starting event.

Example 1: Normal full backup job

04/20 20:18:28(11140) - Backing up /mnt/snap4bak/splunk_var_run
04/20 20:19:17(11140) - 4,926 files 120020.89 KB written to DATA-DAILY4 @ 225039.17 KB/min

Example 2: Incremental backup job (event numbers added)

[1] 04/20 20:19:17(11140) - Include files modified 'on or after' 04/18/2010 16:44:04
[2] 04/20 20:19:17(11140) - Backing up /srv
[3] 04/20 20:19:41(11140) - 0 files 0.00 KB written to DATA-DAILY4 @ - KB/min

(The [n] prefix was added for reference and is not part of the actual log message.)

Before I realized that I sometimes had incremental backup jobs, I used this search:

eventtype=my-backup-job* | transaction fields="host,pid" startswith=("Backing up") endswith="files written KB/min"

I thought that I should be able to add in the leading event by simply adding an OR to my startswith expression, like so:

eventtype=my-backup-job* | transaction fields="host,pid" startswith=("Backing up" OR "Include files modified") endswith="files written KB/min"

However, this change makes no difference. My transaction still contains just events 2 and 3, just as it did with my first search.

I've tried playing with the various transaction options, but haven't found anything that works yet. My best theory about why this isn't working is that transaction automatically discards non-closed transactions; but if that were the case, then adding keepevicted=t should output two transactions: the first containing just event 1 and marked with closed_txn=0, and the second containing events 2 and 3 just as before, marked with closed_txn=1. However, this doesn't happen; adding keepevicted=t makes no difference.

Anyone have any ideas?

Update: This still occurs in 4.1.x as of 4.1.2.

2 Solutions

Super Champion

I have a somewhat-working solution. I can now show that the issue is somehow related to the order of the events.

This search moves the incremental events (that is, events containing the text "Include files modified") forward in time by 1 second (these events almost always occur within the same second, so 1 second seems to be enough). Then I sort on _time to get the events back into the descending time order that transaction requires, and I get my expected transaction grouping. The one event now appears out of order, but that's not the end of the world. (I suppose I could fix that with rex mode=sed ... if I really wanted to.)

Here is the search:

sourcetype="arcserv_uag" (eventtype=arcserv-job-start OR eventtype=arcserv-job-stats OR eventtype=arcserv-skip-fs OR eventtype=arcserv-job-incremental) | convert num(files) | eval _time=if(searchmatch("Include files modified"),_time+1,_time) | sort -_time | transaction fields="host,pid" startswith=("Backing up") endswith="files written KB/min" keepevicted=t

With this additional info, does this shed any light on the real problem? Does this still sound like a bug, or is this expected behavior?

Splunk Employee

Reposting as an answer:

Yes, this is an idiosyncrasy in the implementation of the transaction command in the search language. Although you're thinking of the transaction as being aggregated as time moves forward, the command experiences time in the other direction: we start from the more recent events and move backwards. Once we hit the "startswith" condition, we emit the transaction. The OR will naturally work, but not when we have both conditions, as the first one to reach the command wins. Do you really need the "startswith" condition? I think that removing it would solve the problem.
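Applied to the search from the question, dropping the "startswith" option would look something like this (a sketch reusing the eventtype and field names from the original search):

eventtype=my-backup-job* | transaction fields="host,pid" endswith="files written KB/min"

Each new transaction then begins implicitly wherever the previous one ended on a "files ... written ... KB/min" event, so the "Include files modified" event is free to join the transaction that follows it.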

Splunk Employee

Excellent. I realized that the ORed startswith wouldn't work in either case for transactions that contain both, since the contract is that we start a new transaction when startswith is satisfied.
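To make the newest-first contract concrete, here is a small toy model in Python. It is purely illustrative and not Splunk's actual implementation; the group_transactions function and its behavior are my own construction based on the description above.

```python
# Toy model of the behavior described above: the transaction command
# consumes events newest-first and emits a transaction as soon as an
# event satisfying a startswith pattern is absorbed.

def group_transactions(events_newest_first, start_patterns):
    """Group events (given newest first) into transactions.

    A transaction is closed as soon as an event matching any of
    start_patterns is absorbed, mirroring the contract that a
    startswith match begins (in wall-clock time) a transaction.
    """
    transactions = []
    current = []
    for event in events_newest_first:
        current.append(event)
        if any(p in event for p in start_patterns):
            # Reverse so each transaction reads oldest-first again.
            transactions.append(list(reversed(current)))
            current = []
    if current:
        # Leftover events form an unclosed (evicted) transaction.
        transactions.append(list(reversed(current)))
    return transactions

# The three events of Example 2, newest first:
events = [
    "0 files 0.00 KB written to DATA-DAILY4",           # [3]
    "Backing up /srv",                                  # [2]
    "Include files modified 'on or after' 04/18/2010",  # [1]
]

# With both patterns ORed, event [2] already closes the transaction,
# so event [1] can never join it; it ends up in a group of its own.
result = group_transactions(events, ["Backing up", "Include files modified"])
for txn in result:
    print(txn)
```

Running this shows event [1] stranded in its own one-event group, matching the symptom reported in the question.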

Super Champion

Very nice! Removing the "startswith" does resolve this issue. Thanks Stephen!

Splunk Employee

As Dan suggested, you might want to open a support case.

For the time being, using an eventtype in combination with a tag might be the easiest way to return the search results you expect.
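A sketch of what that could look like on disk (the stanza, tag, and attribute names below are made up for illustration and are not from the thread):

# eventtypes.conf
[backup-job-start]
search = "Backing up" OR "Include files modified"

# tags.conf
[eventtype=backup-job-start]
job_start = enabled

The tag could then, in principle, be referenced in the transaction's startswith expression, e.g. startswith=(tag::eventtype=job_start).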

Super Champion

Can you elaborate on what you mean by "an eventtype in combination with a tag"? I tried creating a single "start" eventtype that matched either "Backing up" or "Include files modified" events. Are you suggesting that tagging that eventtype and using startswith=(tag::eventtype=mytag) could make a difference?

Splunk Employee
Splunk Employee

I'm surprised the OR didn't work, and would encourage you to open a case with Splunk support. A possible workaround may be to use an eventtype and put the OR statement in the eventtype definition.

Interesting use case. Is it really the same pid for each backup job? What backup solution are you using, if you don't mind my asking?

Super Champion

I just tried the eventtype approach and it gives the same results. The pid is reused for an entire backup session, which includes multiple backup jobs for different mount points, and I'm trying to build a transaction per mount point; so unfortunately, pid will not give me what I'm looking for. Also, the timing of each job varies to the point that I can't effectively use maxspan or maxpause either. The logs are from CA BrightStor ARCserve Backup Agent for UNIX/Linux r12.5 (Build 2439) A6.1018011509.
