Splunk Search

Join with conditions

arusoft
Communicator

I have two tables

EmailXDocDateCheckedName
a@a.comDoc 11/1/2021a
a@a.comDoc 21/15/2021a
a@a.comDoc 31/30/2021b

 

EmailYDateLogin
a@a.com12/10/2022
a@a.com11/10/2022
a@a.com1/15/2021
a@a.com1/25/2021

 

I want to join them on Emailx & EmailY and then in result  for each Email  i need to get most recent DateLogin that is before DateChecked.  I am hoping to no have to use joins as my second table has more than 50k records. 

So the results should be like this

EmailXDocDateCheckedNameRecentDateLogin
a@a.comDoc 11/1/2021a-
a@a.comDoc 21/15/2021a1/15/2021
a@a.comDoc 31/30/2021b1/25/2021

 


So if I have to write a sql, it would be something like below. I haven't tested below , but you get the idea.

SELECT t1.EmailX, t1.Doc, t1.DateChecked, t1.Name, max(t2.DateLogin) as RecentDateLogin 
FROM table1 AS t1
LEFT JOIN table 2 AS t2
ON t1.EmailX = t2.EmailY
AND t1.DateChecked>t2.DateLogin
Group By t1.EmailX, t1.Doc, t1.DateChecked, t1.Name

 

Thanks.

 

Labels (2)
0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

Here is an example using your data. You can paste this in to a search window

| makeresults
| eval _raw="EmailX	Doc	DateChecked	Name
a@a.com	Doc 1	1/1/2021	a
a@a.com	Doc 2	1/15/2021	a
a@a.com	Doc 3	1/30/2021	b
"
| multikv forceheader=1 
| append [
  | makeresults
  | eval _raw="
EmailY	DateLogin
a@a.com	12/10/2022
a@a.com	11/10/2022
a@a.com	1/15/2021
a@a.com	1/25/2021
"
  | multikv forceheader=1 
]
| table Email* Doc Date* Name
| eval dl=strptime(DateLogin, "%m/%d/%Y")
| eval Email=coalesce(EmailX, EmailY)
| eventstats values(dl) as dl by Email
| stats values(*) as * by Email Doc DateChecked Name
| eval dc=strptime(DateChecked, "%m/%d/%Y")
| eval login=0
| eval login=max(mvmap(dl, if(login=0 AND dl<=dc,dl,0)))
| eval ClosestLogin=if(login>0, strftime(login, "%m/%d/%Y"), "-")
| table Email Doc DateChecked Name ClosestLogin

The relevant part for you is from the table Email*... line

What you want is to search both data sets and then

1. Make a common Email field from either of the X or Y variants
2. Collect all login dates for that email (eventstats)
3. Collapse all data for each email/doc/name/check date
4. Find the closest login to the checked date (eval statements)
5. Produce the table

Hope this helps.

 

View solution in original post

bowesmana
SplunkTrust
SplunkTrust

Here is an example using your data. You can paste this in to a search window

| makeresults
| eval _raw="EmailX	Doc	DateChecked	Name
a@a.com	Doc 1	1/1/2021	a
a@a.com	Doc 2	1/15/2021	a
a@a.com	Doc 3	1/30/2021	b
"
| multikv forceheader=1 
| append [
  | makeresults
  | eval _raw="
EmailY	DateLogin
a@a.com	12/10/2022
a@a.com	11/10/2022
a@a.com	1/15/2021
a@a.com	1/25/2021
"
  | multikv forceheader=1 
]
| table Email* Doc Date* Name
| eval dl=strptime(DateLogin, "%m/%d/%Y")
| eval Email=coalesce(EmailX, EmailY)
| eventstats values(dl) as dl by Email
| stats values(*) as * by Email Doc DateChecked Name
| eval dc=strptime(DateChecked, "%m/%d/%Y")
| eval login=0
| eval login=max(mvmap(dl, if(login=0 AND dl<=dc,dl,0)))
| eval ClosestLogin=if(login>0, strftime(login, "%m/%d/%Y"), "-")
| table Email Doc DateChecked Name ClosestLogin

The relevant part for you is from the table Email*... line

What you want is to search both data sets and then

1. Make a common Email field from either of the X or Y variants
2. Collect all login dates for that email (eventstats)
3. Collapse all data for each email/doc/name/check date
4. Find the closest login to the checked date (eval statements)
5. Produce the table

Hope this helps.

 

arusoft
Communicator

Thanks @bowesmana , This is super cool. My quick test looks good. I will do some more testing and if I see any issues, I will update

0 Karma

bowesmana
SplunkTrust
SplunkTrust

@arusoft 

Note that using eventstats is not the fastest operation and if you have a clustered index environment, will cause the data to be pushed to the search head for the operation to run there.

I couldn't immediately think of a way to avoid that, that would avoid some more complex logic involving creating complex fields and using mvexpand after a normal stats. Anyway, test as needed.

One thing you should do before using eventstats is to use the fields command to minimise the data that will end up going to the search head. In the above, you would do

| fields - EmailX EmailY DateLogin

before the eventstats.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...