Splunk Search

Join with conditions

arusoft
Communicator

I have two tables

EmailXDocDateCheckedName
a@a.comDoc 11/1/2021a
a@a.comDoc 21/15/2021a
a@a.comDoc 31/30/2021b

 

EmailYDateLogin
a@a.com12/10/2022
a@a.com11/10/2022
a@a.com1/15/2021
a@a.com1/25/2021

 

I want to join them on Emailx & EmailY and then in result  for each Email  i need to get most recent DateLogin that is before DateChecked.  I am hoping to no have to use joins as my second table has more than 50k records. 

So the results should be like this

EmailXDocDateCheckedNameRecentDateLogin
a@a.comDoc 11/1/2021a-
a@a.comDoc 21/15/2021a1/15/2021
a@a.comDoc 31/30/2021b1/25/2021

 


So if I have to write a sql, it would be something like below. I haven't tested below , but you get the idea.

SELECT t1.EmailX, t1.Doc, t1.DateChecked, t1.Name, max(t2.DateLogin) as RecentDateLogin 
FROM table1 AS t1
LEFT JOIN table 2 AS t2
ON t1.EmailX = t2.EmailY
AND t1.DateChecked>t2.DateLogin
Group By t1.EmailX, t1.Doc, t1.DateChecked, t1.Name

 

Thanks.

 

Labels (2)
0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

Here is an example using your data. You can paste this in to a search window

| makeresults
| eval _raw="EmailX	Doc	DateChecked	Name
a@a.com	Doc 1	1/1/2021	a
a@a.com	Doc 2	1/15/2021	a
a@a.com	Doc 3	1/30/2021	b
"
| multikv forceheader=1 
| append [
  | makeresults
  | eval _raw="
EmailY	DateLogin
a@a.com	12/10/2022
a@a.com	11/10/2022
a@a.com	1/15/2021
a@a.com	1/25/2021
"
  | multikv forceheader=1 
]
| table Email* Doc Date* Name
| eval dl=strptime(DateLogin, "%m/%d/%Y")
| eval Email=coalesce(EmailX, EmailY)
| eventstats values(dl) as dl by Email
| stats values(*) as * by Email Doc DateChecked Name
| eval dc=strptime(DateChecked, "%m/%d/%Y")
| eval login=0
| eval login=max(mvmap(dl, if(login=0 AND dl<=dc,dl,0)))
| eval ClosestLogin=if(login>0, strftime(login, "%m/%d/%Y"), "-")
| table Email Doc DateChecked Name ClosestLogin

The relevant part for you is from the table Email*... line

What you want is to search both data sets and then

1. Make a common Email field from either of the X or Y variants
2. Collect all login dates for that email (eventstats)
3. Collapse all data for each email/doc/name/check date
4. Find the closest login to the checked date (eval statements)
5. Produce the table

Hope this helps.

 

View solution in original post

bowesmana
SplunkTrust
SplunkTrust

Here is an example using your data. You can paste this in to a search window

| makeresults
| eval _raw="EmailX	Doc	DateChecked	Name
a@a.com	Doc 1	1/1/2021	a
a@a.com	Doc 2	1/15/2021	a
a@a.com	Doc 3	1/30/2021	b
"
| multikv forceheader=1 
| append [
  | makeresults
  | eval _raw="
EmailY	DateLogin
a@a.com	12/10/2022
a@a.com	11/10/2022
a@a.com	1/15/2021
a@a.com	1/25/2021
"
  | multikv forceheader=1 
]
| table Email* Doc Date* Name
| eval dl=strptime(DateLogin, "%m/%d/%Y")
| eval Email=coalesce(EmailX, EmailY)
| eventstats values(dl) as dl by Email
| stats values(*) as * by Email Doc DateChecked Name
| eval dc=strptime(DateChecked, "%m/%d/%Y")
| eval login=0
| eval login=max(mvmap(dl, if(login=0 AND dl<=dc,dl,0)))
| eval ClosestLogin=if(login>0, strftime(login, "%m/%d/%Y"), "-")
| table Email Doc DateChecked Name ClosestLogin

The relevant part for you is from the table Email*... line

What you want is to search both data sets and then

1. Make a common Email field from either of the X or Y variants
2. Collect all login dates for that email (eventstats)
3. Collapse all data for each email/doc/name/check date
4. Find the closest login to the checked date (eval statements)
5. Produce the table

Hope this helps.

 

arusoft
Communicator

Thanks @bowesmana , This is super cool. My quick test looks good. I will do some more testing and if I see any issues, I will update

0 Karma

bowesmana
SplunkTrust
SplunkTrust

@arusoft 

Note that using eventstats is not the fastest operation and if you have a clustered index environment, will cause the data to be pushed to the search head for the operation to run there.

I couldn't immediately think of a way to avoid that, that would avoid some more complex logic involving creating complex fields and using mvexpand after a normal stats. Anyway, test as needed.

One thing you should do before using eventstats is to use the fields command to minimise the data that will end up going to the search head. In the above, you would do

| fields - EmailX EmailY DateLogin

before the eventstats.

0 Karma
Get Updates on the Splunk Community!

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...

Let’s Get You Certified – Vegas-Style at .conf24

Are you ready to level up your Splunk game? Then, let’s get you certified live at .conf24 – our annual user ...