Solved: Join two search results

sanjay_shrestha · ‎07-23-2013

I am trying to join two search results with the common field project.

Here is an example:

First result would return for Phase-I


project      sub-project         processed_timestamp
p1              sp11            5/12/13  2:10:45.344 PM
p1              sp12            5/13/13 12:11:45.344 PM
p1              sp13            5/13/13  2:10:45.344 PM

p2              sp21            6/23/13 12:10:45.344 PM
p2              sp22            6/24/13 12:10:45.344 PM

p3              sp31            7/23/13 12:10:45.344 PM
p3              sp32            7/24/13 12:10:45.344 PM

p4              sp41            7/23/13 12:10:45.344 PM

Second result would return for Phase-II



project      sub-project         processed_timestamp

p1              sp11            6/12/13  2:10:45.344 PM

p1              sp12            6/13/13 12:11:45.344 PM

p2 sp21 7/23/13 12:10:45.344 PM
p2 sp22 7/24/13 12:10:45.344 PM

Here is the output I am looking for




project         phaseI_start                    phaseI_end              phaseII_start                   phaseII_end
p1      5/12/13  2:10:45.344 PM     5/13/13  2:10:45.344 PM     6/12/13  2:10:45.344 PM     6/13/13 12:11:45.344 PM
p2      6/23/13 12:10:45.344 PM     6/24/13 12:10:45.344 PM     7/23/13 12:10:45.344 PM     7/24/13 12:10:45.344 PM
p3      7/23/13 12:10:45.344 PM     7/24/13 12:10:45.344 PM
p4      7/23/13 12:10:45.344 PM     7/23/13 12:10:45.344 PM (has only one sub project so sametime applies to start and end)

I tried using transaction for each searches separately and used join as follows:


sourcetype="A"| transaction project |eval phaseI_start= ...|eval phaseI_end = .....| fields project, phaseI_start, phaseI_end| join project [search sourcetype="B"| transaction project |eval phaseII_start= ...|eval phaseII_end = .....| fields project, phaseII_start, phaseII_end]

I donot get any result back. However if I apply filter to get specific projects using "where like (project,"P1%") in both searches then it works. First search would return more than 10000 records and second would return about 5000.

Thanks,

Sanjay

emiller42 · ‎07-23-2013

I would try to do this without a join if possible.

sourcetype="A" OR sourcetype="B" 
| stats last(processed_timestamp) as start first(processed_timestamp) as end by project sourcetype 
| eval phaseI_start=case(sourcetype=="A", start) 
| eval phaseI_end=case(sourcetype=="A", end) 
| eval phaseII_start=case(sourcetype=="B", start) 
| eval phaseII_end=case(sourcetype=="B", end) 
| stats first(phaseI_start) as phaseI_start first(phaseI_end) as phaseI_end first(phaseII_start) as phaseII_start first(phaseII_end) as phaseII_end by project

Okay, thats a lot of pipes. what the heck is going on?

sourcetype="A" OR sourcetype="B"

gets all our events.

...| stats last(processed_timestamp) as start first(processed_timestamp) as end by project sourcetype

gives us a table of the following:

project | sourcetype | start                   | end
p1        A            5/12/13  2:10:45.344 PM   5/13/13  2:10:45.344 PM
p1        B            6/12/13  2:10:45.344 PM   6/13/13 12:11:45.344 PM
etc

Now lets get those values into the desired fields for our results.

...| eval phaseI_start=case(sourcetype=="A", start)

This says create a field called phaseI_start and when the sourcetype=="A" fill it with the start value, otherwise it's null. Repeat for each field. Now we have our phaseI_start, phaseI_end, etc fields populated. All that's left is to get them into tabular format. Stats can do this too.

...| stats first(phaseI_start) as phaseI_start first(phaseI_end) as phaseI_end first(phaseII_start) as phaseII_start first(phaseII_end) as phaseII_end by project

Notes on the first() last() stuff. First() refers to the first value seen in the search. Since searches run from the most recent and go back in time, the first() value seen is the most recent value, in this case our end timestamps. conversely, last() refers to the last value seen, which is the oldest value, and would refer to our start timestamps.

End result is what you're looking for without expensive joins or transactions!

View solution in original post

emiller42 · ‎07-23-2013

I would try to do this without a join if possible.

sourcetype="A" OR sourcetype="B" 
| stats last(processed_timestamp) as start first(processed_timestamp) as end by project sourcetype 
| eval phaseI_start=case(sourcetype=="A", start) 
| eval phaseI_end=case(sourcetype=="A", end) 
| eval phaseII_start=case(sourcetype=="B", start) 
| eval phaseII_end=case(sourcetype=="B", end) 
| stats first(phaseI_start) as phaseI_start first(phaseI_end) as phaseI_end first(phaseII_start) as phaseII_start first(phaseII_end) as phaseII_end by project

Okay, thats a lot of pipes. what the heck is going on?

sourcetype="A" OR sourcetype="B"

gets all our events.

...| stats last(processed_timestamp) as start first(processed_timestamp) as end by project sourcetype

gives us a table of the following:

project | sourcetype | start                   | end
p1        A            5/12/13  2:10:45.344 PM   5/13/13  2:10:45.344 PM
p1        B            6/12/13  2:10:45.344 PM   6/13/13 12:11:45.344 PM
etc

Now lets get those values into the desired fields for our results.

...| eval phaseI_start=case(sourcetype=="A", start)

This says create a field called phaseI_start and when the sourcetype=="A" fill it with the start value, otherwise it's null. Repeat for each field. Now we have our phaseI_start, phaseI_end, etc fields populated. All that's left is to get them into tabular format. Stats can do this too.

...| stats first(phaseI_start) as phaseI_start first(phaseI_end) as phaseI_end first(phaseII_start) as phaseII_start first(phaseII_end) as phaseII_end by project

Notes on the first() last() stuff. First() refers to the first value seen in the search. Since searches run from the most recent and go back in time, the first() value seen is the most recent value, in this case our end timestamps. conversely, last() refers to the last value seen, which is the oldest value, and would refer to our start timestamps.

End result is what you're looking for without expensive joins or transactions!

sanjay_shrestha · ‎07-31-2013

What would be best approach if these two searches results should joined by project? That means the output would be only from project p1 and p2.


project         phaseI_start                    phaseI_end              phaseII_start                   phaseII_end
p1      5/12/13  2:10:45.344 PM     5/13/13  2:10:45.344 PM     6/12/13  2:10:45.344 PM     6/13/13 12:11:45.344 PM
p2      6/23/13 12:10:45.344 PM     6/24/13 12:10:45.344 PM     7/23/13 12:10:45.344 PM     7/24/13 12:10:45.344 PM

sanjay_shrestha · ‎07-24-2013

Thanks for your an elegant solution. This way I do not need to rename the field.

sanjay_shrestha · ‎07-23-2013

I was able to get the result using STATS. I had to rename field to processed_timestamp2 for phase II result. Also used eval to format date as follow.



sourcetype="A" OR sourcetype="B"| STATS min(processed_timestamp1), max(processed_timestamp1),min(processed_timestamp2), max(processed_timestamp2) by project |eval phaseI_start= ...|eval phaseI_end = .... |eval phaseII_start= ...|eval phaseII_end = .....| table project, phaseI_start, phaseI_end, phaseII_start, phaseII_end

Cheers!!!

Join two search results

[Puzzles] Solve, Learn, Repeat: Dynamic formatting from XML events

Enter the Agentic Era with Splunk AI Assistant for SPL 1.4

Stronger Security with Federated Search for S3, GCP SQL & Australian Threat ...

Join the Conversation

Join two search results

[Puzzles] Solve, Learn, Repeat: Dynamic formatting from XML events

Enter the Agentic Era with Splunk AI Assistant for SPL 1.4

Stronger Security with Federated Search for S3, GCP SQL & Australian Threat ...