I have two tables
EmailX | Doc | DateChecked | Name |
a@a.com | Doc 1 | 1/1/2021 | a |
a@a.com | Doc 2 | 1/15/2021 | a |
a@a.com | Doc 3 | 1/30/2021 | b |
EmailY | DateLogin |
a@a.com | 12/10/2022 |
a@a.com | 11/10/2022 |
a@a.com | 1/15/2021 |
a@a.com | 1/25/2021 |
I want to join them on Emailx & EmailY and then in result for each Email i need to get most recent DateLogin that is before DateChecked. I am hoping to no have to use joins as my second table has more than 50k records.
So the results should be like this
EmailX | Doc | DateChecked | Name | RecentDateLogin |
a@a.com | Doc 1 | 1/1/2021 | a | - |
a@a.com | Doc 2 | 1/15/2021 | a | 1/15/2021 |
a@a.com | Doc 3 | 1/30/2021 | b | 1/25/2021 |
So if I have to write a sql, it would be something like below. I haven't tested below , but you get the idea.
SELECT t1.EmailX, t1.Doc, t1.DateChecked, t1.Name, max(t2.DateLogin) as RecentDateLogin
FROM table1 AS t1
LEFT JOIN table 2 AS t2
ON t1.EmailX = t2.EmailY
AND t1.DateChecked>t2.DateLogin
Group By t1.EmailX, t1.Doc, t1.DateChecked, t1.Name
Thanks.
Here is an example using your data. You can paste this in to a search window
| makeresults
| eval _raw="EmailX Doc DateChecked Name
a@a.com Doc 1 1/1/2021 a
a@a.com Doc 2 1/15/2021 a
a@a.com Doc 3 1/30/2021 b
"
| multikv forceheader=1
| append [
| makeresults
| eval _raw="
EmailY DateLogin
a@a.com 12/10/2022
a@a.com 11/10/2022
a@a.com 1/15/2021
a@a.com 1/25/2021
"
| multikv forceheader=1
]
| table Email* Doc Date* Name
| eval dl=strptime(DateLogin, "%m/%d/%Y")
| eval Email=coalesce(EmailX, EmailY)
| eventstats values(dl) as dl by Email
| stats values(*) as * by Email Doc DateChecked Name
| eval dc=strptime(DateChecked, "%m/%d/%Y")
| eval login=0
| eval login=max(mvmap(dl, if(login=0 AND dl<=dc,dl,0)))
| eval ClosestLogin=if(login>0, strftime(login, "%m/%d/%Y"), "-")
| table Email Doc DateChecked Name ClosestLogin
The relevant part for you is from the table Email*... line
What you want is to search both data sets and then
1. Make a common Email field from either of the X or Y variants
2. Collect all login dates for that email (eventstats)
3. Collapse all data for each email/doc/name/check date
4. Find the closest login to the checked date (eval statements)
5. Produce the table
Hope this helps.
Here is an example using your data. You can paste this in to a search window
| makeresults
| eval _raw="EmailX Doc DateChecked Name
a@a.com Doc 1 1/1/2021 a
a@a.com Doc 2 1/15/2021 a
a@a.com Doc 3 1/30/2021 b
"
| multikv forceheader=1
| append [
| makeresults
| eval _raw="
EmailY DateLogin
a@a.com 12/10/2022
a@a.com 11/10/2022
a@a.com 1/15/2021
a@a.com 1/25/2021
"
| multikv forceheader=1
]
| table Email* Doc Date* Name
| eval dl=strptime(DateLogin, "%m/%d/%Y")
| eval Email=coalesce(EmailX, EmailY)
| eventstats values(dl) as dl by Email
| stats values(*) as * by Email Doc DateChecked Name
| eval dc=strptime(DateChecked, "%m/%d/%Y")
| eval login=0
| eval login=max(mvmap(dl, if(login=0 AND dl<=dc,dl,0)))
| eval ClosestLogin=if(login>0, strftime(login, "%m/%d/%Y"), "-")
| table Email Doc DateChecked Name ClosestLogin
The relevant part for you is from the table Email*... line
What you want is to search both data sets and then
1. Make a common Email field from either of the X or Y variants
2. Collect all login dates for that email (eventstats)
3. Collapse all data for each email/doc/name/check date
4. Find the closest login to the checked date (eval statements)
5. Produce the table
Hope this helps.
Thanks @bowesmana , This is super cool. My quick test looks good. I will do some more testing and if I see any issues, I will update
Note that using eventstats is not the fastest operation and if you have a clustered index environment, will cause the data to be pushed to the search head for the operation to run there.
I couldn't immediately think of a way to avoid that, that would avoid some more complex logic involving creating complex fields and using mvexpand after a normal stats. Anyway, test as needed.
One thing you should do before using eventstats is to use the fields command to minimise the data that will end up going to the search head. In the above, you would do
| fields - EmailX EmailY DateLogin
before the eventstats.