I am trying to join two search results with the common field project.
Here is an example:
First result would return for Phase-I
project sub-project processed_timestamp
p1 sp11 5/12/13 2:10:45.344 PM
p1 sp12 5/13/13 12:11:45.344 PM
p1 sp13 5/13/13 2:10:45.344 PM
p2 sp21 6/23/13 12:10:45.344 PM
p2 sp22 6/24/13 12:10:45.344 PM
p3 sp31 7/23/13 12:10:45.344 PM
p3 sp32 7/24/13 12:10:45.344 PM
p4 sp41 7/23/13 12:10:45.344 PM
Second result would return for Phase-II
project sub-project processed_timestamp
p1 sp11 6/12/13 2:10:45.344 PM
p1 sp12 6/13/13 12:11:45.344 PM
p2 sp21 7/23/13 12:10:45.344 PM
p2 sp22 7/24/13 12:10:45.344 PM
Here is the output I am looking for
project phaseI_start phaseI_end phaseII_start phaseII_end
p1 5/12/13 2:10:45.344 PM 5/13/13 2:10:45.344 PM 6/12/13 2:10:45.344 PM 6/13/13 12:11:45.344 PM
p2 6/23/13 12:10:45.344 PM 6/24/13 12:10:45.344 PM 7/23/13 12:10:45.344 PM 7/24/13 12:10:45.344 PM
p3 7/23/13 12:10:45.344 PM 7/24/13 12:10:45.344 PM
p4 7/23/13 12:10:45.344 PM 7/23/13 12:10:45.344 PM (has only one sub project so sametime applies to start and end)
I tried using transaction for each searches separately and used join as follows:
sourcetype="A"| transaction project |eval phaseI_start= ...|eval phaseI_end = .....| fields project, phaseI_start, phaseI_end| join project [search sourcetype="B"| transaction project |eval phaseII_start= ...|eval phaseII_end = .....| fields project, phaseII_start, phaseII_end]
I donot get any result back. However if I apply filter to get specific projects using "where like (project,"P1%") in both searches then it works. First search would return more than 10000 records and second would return about 5000.
Thanks,
Sanjay
I would try to do this without a join if possible.
sourcetype="A" OR sourcetype="B"
| stats last(processed_timestamp) as start first(processed_timestamp) as end by project sourcetype
| eval phaseI_start=case(sourcetype=="A", start)
| eval phaseI_end=case(sourcetype=="A", end)
| eval phaseII_start=case(sourcetype=="B", start)
| eval phaseII_end=case(sourcetype=="B", end)
| stats first(phaseI_start) as phaseI_start first(phaseI_end) as phaseI_end first(phaseII_start) as phaseII_start first(phaseII_end) as phaseII_end by project
Okay, thats a lot of pipes. what the heck is going on?
sourcetype="A" OR sourcetype="B"
gets all our events.
...| stats last(processed_timestamp) as start first(processed_timestamp) as end by project sourcetype
gives us a table of the following:
project | sourcetype | start | end
p1 A 5/12/13 2:10:45.344 PM 5/13/13 2:10:45.344 PM
p1 B 6/12/13 2:10:45.344 PM 6/13/13 12:11:45.344 PM
etc
Now lets get those values into the desired fields for our results.
...| eval phaseI_start=case(sourcetype=="A", start)
This says create a field called phaseI_start and when the sourcetype=="A" fill it with the start value, otherwise it's null. Repeat for each field. Now we have our phaseI_start, phaseI_end, etc fields populated. All that's left is to get them into tabular format. Stats can do this too.
...| stats first(phaseI_start) as phaseI_start first(phaseI_end) as phaseI_end first(phaseII_start) as phaseII_start first(phaseII_end) as phaseII_end by project
Notes on the first() last() stuff. First() refers to the first value seen in the search. Since searches run from the most recent and go back in time, the first() value seen is the most recent value, in this case our end timestamps. conversely, last() refers to the last value seen, which is the oldest value, and would refer to our start timestamps.
End result is what you're looking for without expensive joins or transactions!
I would try to do this without a join if possible.
sourcetype="A" OR sourcetype="B"
| stats last(processed_timestamp) as start first(processed_timestamp) as end by project sourcetype
| eval phaseI_start=case(sourcetype=="A", start)
| eval phaseI_end=case(sourcetype=="A", end)
| eval phaseII_start=case(sourcetype=="B", start)
| eval phaseII_end=case(sourcetype=="B", end)
| stats first(phaseI_start) as phaseI_start first(phaseI_end) as phaseI_end first(phaseII_start) as phaseII_start first(phaseII_end) as phaseII_end by project
Okay, thats a lot of pipes. what the heck is going on?
sourcetype="A" OR sourcetype="B"
gets all our events.
...| stats last(processed_timestamp) as start first(processed_timestamp) as end by project sourcetype
gives us a table of the following:
project | sourcetype | start | end
p1 A 5/12/13 2:10:45.344 PM 5/13/13 2:10:45.344 PM
p1 B 6/12/13 2:10:45.344 PM 6/13/13 12:11:45.344 PM
etc
Now lets get those values into the desired fields for our results.
...| eval phaseI_start=case(sourcetype=="A", start)
This says create a field called phaseI_start and when the sourcetype=="A" fill it with the start value, otherwise it's null. Repeat for each field. Now we have our phaseI_start, phaseI_end, etc fields populated. All that's left is to get them into tabular format. Stats can do this too.
...| stats first(phaseI_start) as phaseI_start first(phaseI_end) as phaseI_end first(phaseII_start) as phaseII_start first(phaseII_end) as phaseII_end by project
Notes on the first() last() stuff. First() refers to the first value seen in the search. Since searches run from the most recent and go back in time, the first() value seen is the most recent value, in this case our end timestamps. conversely, last() refers to the last value seen, which is the oldest value, and would refer to our start timestamps.
End result is what you're looking for without expensive joins or transactions!
What would be best approach if these two searches results should joined by project? That means the output would be only from project p1 and p2.
project phaseI_start phaseI_end phaseII_start phaseII_end
p1 5/12/13 2:10:45.344 PM 5/13/13 2:10:45.344 PM 6/12/13 2:10:45.344 PM 6/13/13 12:11:45.344 PM
p2 6/23/13 12:10:45.344 PM 6/24/13 12:10:45.344 PM 7/23/13 12:10:45.344 PM 7/24/13 12:10:45.344 PM
Thanks for your an elegant solution. This way I do not need to rename the field.
I was able to get the result using STATS. I had to rename field to processed_timestamp2 for phase II result. Also used eval to format date as follow.
sourcetype="A" OR sourcetype="B"| STATS min(processed_timestamp1), max(processed_timestamp1),min(processed_timestamp2), max(processed_timestamp2) by project |eval phaseI_start= ...|eval phaseI_end = .... |eval phaseII_start= ...|eval phaseII_end = .....| table project, phaseI_start, phaseI_end, phaseII_start, phaseII_end
Cheers!!!