
Is it possible to create transactions for the processing of 2 files between two applications?

alistarabenzoar
Explorer

Hello,

We have a processing chain made up of two applications (sample logs are given below).
Basically, one application creates a file and passes it to a second application, which performs some operations on that file. The first application waits for the second to finish.

We would like to know whether it is possible to create a transaction for the processing of each file by the two systems, and if so, how.

For example, we would like to be able to create a transaction that starts from the event:
"2014-08-27T09:44:33 created file with ID1"
and ends with:
"2014-08-27T09:44:47 ID1 status - completed"
but also contains the events from the second application that relate to ID1 (i.e. the events containing OD1).

Note that the processing chains for the two IDs are interleaved, so using startswith and endswith did not help us much.
Also, we cannot hard-code the file names (i.e. the IDs), as they vary. In our particular case we should get two transactions: one for ID1, also containing the events for OD1, and one for ID2, also containing the events for OD2.
The sample logs for the two applications are:
App1:

2014-08-27T09:44:33 created file with ID1
2014-08-27T09:44:34 file ID1 handed off to App2
2014-08-27T09:44:35 polling App2 for ID1
2014-08-27T09:44:36 created file with ID2 
2014-08-27T09:44:37 polling App2 for ID1
2014-08-27T09:44:38 ID1 status - waiting
2014-08-27T09:44:38 file ID2 handed off to App2
2014-08-27T09:44:39 polling App2 for ID2
2014-08-27T09:44:40 ID2 status - waiting
2014-08-27T09:44:41 polling App2 for ID1
2014-08-27T09:44:42 polling App2 for ID2
2014-08-27T09:44:43 ID2 status - waiting
2014-08-27T09:44:44 ID1 status – waiting
2014-08-27T09:44:45 polling App2 for ID1
2014-08-27T09:44:46 polling App2 for ID2
2014-08-27T09:44:47 ID1 status - completed
2014-08-27T09:44:48 ID2 status – waiting
2014-08-27T09:44:49 polling App2 for ID2
2014-08-27T09:44:50 ID2 status – waiting
2014-08-27T09:44:51 polling App2 for ID2
2014-08-27T09:44:52 ID2 status – completed

App2:

2014-08-27T09:44:33 checking input queue – nothing there 
2014-08-27T09:44:35 checking input queue – 1 file found ID1
2014-08-27T09:44:35 checking input queue – nothing there 
2014-08-27T09:44:36 mapping input file ID1 to order OD1
2014-08-27T09:44:37 checking input queue – nothing there 
2014-08-27T09:44:38 processing stage 1 OD1
2014-08-27T09:44:38 checking input queue – 1 file found ID2
2014-08-27T09:44:38 mapping input file ID2  to order OD2
2014-08-27T09:44:39 checking input queue – nothing there 
2014-08-27T09:44:39 processing stage 1 OD1
2014-08-27T09:44:39 processing stage 2 OD1
2014-08-27T09:44:39 processing stage 1 OD2
2014-08-27T09:44:40 checking input queue – nothing there 
2014-08-27T09:44:41 processing stage 3 OD1
2014-08-27T09:44:42 processing stage 2 OD2
2014-08-27T09:44:43 checking input queue – nothing there 
2014-08-27T09:44:44 processing stage 3 OD2
2014-08-27T09:44:45 processing stage 4 OD2
2014-08-27T09:44:46 checking input queue – nothing there 
2014-08-27T09:44:47 completed OD1 – status OK
2014-08-27T09:44:52 completed OD2 – status OK
2014-08-27T09:44:55 checking input queue – nothing there 

Thank you very much.

1 Solution

aweitzman
Motivator

While the most efficient answer to this is the automatic-lookup approach from my other reply in this thread, it is possible to do this with an inline search using a join:

(source=APP1 OR source=APP2)
| rex "(?<file_id>ID\d+)"
| rex "(?<orderId>OD\d+)"
| join type=left orderId
    [
      search source=APP2 mapping
      | rex "mapping input file (?<file_id>ID\d+)\s+to order (?<orderId>OD\d+)"
      | table orderId file_id
    ]
| transaction file_id

The subsearch is intended to pull out just those lines that describe the relationship between file_ids and orderIds. Since those only occur in APP2 lines containing the word "mapping", we restrict the search that way. (The rex patterns assume IDs shaped like the ones in the sample logs, e.g. ID1 and OD1.)

Joining that subsearch to the main search on the orderId field has the effect of creating the appropriate file_id field on lines that only have orderIds on them. The type=left option matters here: join defaults to an inner join, which would drop every event without a matching orderId, including all of the APP1 lines.

Having done that, we can bundle all the events relevant to a particular file_id using the transaction file_id command.
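Another inline option, not proposed in this thread, avoids the second search entirely: eventstats copies the file_id from each "mapping" event onto every event that shares its orderId. A minimal sketch, assuming the same source names as above, IDs shaped like the sample logs (IDn, ODn), and exactly one file per order; the inline ```...``` comments need a newer Splunk release than the 6.2 docs linked in this thread, so strip them on older versions:

```spl
(source=APP1 OR source=APP2)
| rex "(?<file_id>ID\d+)"
| rex "(?<orderId>OD\d+)"
| eventstats values(file_id) AS mapped_file_id BY orderId ```the mapping event supplies the file ID for its order```
| eval file_id=coalesce(file_id, mapped_file_id) ```keep an event's own file ID, otherwise use the mapped one```
| where isnotnull(file_id) ```drop events tied to no file, e.g. empty queue checks```
| transaction file_id
```

eventstats leaves events without an orderId untouched, so the APP1 lines keep their own file_id. It holds its intermediate results in memory, so on very large result sets it may run into search limits; this has not been benchmarked against the join.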

aweitzman
Motivator

There are a few things you will have to do to make this work:

  1. You'll need to write field extractions so that you can consistently extract the filenames and order IDs to their own fields. Read this for details: http://docs.splunk.com/Documentation/Splunk/6.2.0/Knowledge/Createandmaintainsearch-timefieldextract...
  2. You'll need to maintain a CSV file representing a lookup between order IDs and filenames based on that "mapping" event. Read this for details: http://docs.splunk.com/Documentation/Splunk/6.2.0/Knowledge/Addfieldsfromexternaldatasources#Use_sea...
  3. Then make the lookup automatic by following the first three steps here: http://docs.splunk.com/Documentation/Splunk/6.2.0/Knowledge/Addfieldsfromexternaldatasources
  4. Once you've done that, your search needs to include contents from both logs, and you can just key the transaction off the filename field, like:

       (source=APP1 OR source=APP2) | transaction filename

     Because of the automatic lookup you created, the filename field will exist in the events that just have an order ID in them, which will allow the transaction command to work the way you want it to.
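Step 2 leaves open how the CSV gets filled. A minimal sketch of one way to populate it from the "mapping" events, assuming the sample log format from the question, the field names orderId and filename used in step 4, and a hypothetical file name order_to_filename.csv; running it as a scheduled search is one way to keep the file current:

```spl
source=APP2 mapping
| rex "mapping input file (?<filename>ID\d+)\s+to order (?<orderId>OD\d+)" ```pull both IDs out of each mapping event```
| dedup orderId ```one row per order ID```
| table orderId filename ```the two lookup columns```
| outputlookup order_to_filename.csv ```overwrites the file on each run```
```

Because outputlookup overwrites the file by default, the search's time range has to cover every order still in flight; append=true keeps older rows instead, at the cost of possible duplicates. The automatic lookup from step 3 would then take orderId as its input field and filename as its output field.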

Good luck.

alistarabenzoar
Explorer

Thank you aweitzman. Are there other solutions that can accomplish this without lookups, using only search queries and inline field extractions? We are new to Splunk and are trying to analyse all the possibilities.


aweitzman
Motivator

You could in theory use a join to get that data inline, but it will be less efficient because you'll be searching twice: once to get the filename-orderID relationship, and a second time to gather the rest of the data:

(source=APP1 OR source=APP2)
| ...rexes to do field extractions here...
| join type=left orderId [search source=APP2 mapping | ...rexes to do field extractions here... | table orderId filename]
| transaction filename

My first suggestion is more efficient and a lot cleaner. But it's not impossible to do it this way.


alistarabenzoar
Explorer

Thanks again aweitzman. I think both solutions have the problem that the events containing only OD1 or OD2 are not included in the transactions.
Something like the snippet below seems to also include the events containing only an order ID ODx (such as "completed OD1 – status OK", which belongs in the transaction for file ID1 because it is related to that file ID through its order ID), which is our goal (edited for the final version):

(sourcetype="APP1" OR sourcetype="APP2")
| rex "(with|for|found|file|\d+) (?<file_id>ID\d+)"
| rex "(stage\s\d+|completed|order) (?<orderId>OD\d+)"
| join type=left orderId
     [search sourcetype="APP2" mapping
        | rex "mapping\sinput\sfile (?<file_id>ID\d+)\s+to\sorder (?<orderId>OD\d+)"
        | table orderId file_id
     ]
| transaction file_id


aweitzman
Motivator

(It's hard to read your snippet because all of the backslashes and <> characters are stripped out. Try surrounding your code snippet with backquotes, or highlighting it and clicking the "code sample" button.)

The point of each solution is to make it so that the order IDs are attached to the file IDs. Based on what you posted here, it does not appear you are extracting the file ID in the "joined" subsearch, so there's no correlation between order IDs and file IDs. You need that in order to make this work.

Also, you are not extracting the order ID in the main search either. You'll need that too.
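A quick way to confirm both extractions before adding the join and transaction is to tabulate them per event. A minimal sketch, reusing the sourcetype names from the snippet above but with simplified regexes that match any IDn or ODn token (an assumption that holds for the sample logs); as before, the ```...``` comments need a newer Splunk release:

```spl
(sourcetype="APP1" OR sourcetype="APP2")
| rex "(?<file_id>ID\d+)" ```file ID, wherever it appears in the line```
| rex "(?<orderId>OD\d+)" ```order ID, wherever it appears in the line```
| table _time sourcetype file_id orderId _raw
```

The "mapping" rows should show both fields; rows with an orderId but no file_id are exactly the ones the joined subsearch has to fill in.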
