This is the query:
source=Audit earliest=-2d
[search source=Audit | stats count by persistent_id | where count > 20 and isnotnull(persistent_id) | fields persistent_id]
| JOIN max=0 Audit_Id
[ SEARCH source= Applicant ]
| table Audit_Id , FirstName, LastName
It first selects any rows from the audit table that have a not null persistent_id that occurs in the table more than 20 times. Then if any rows that have that persistent_id have turned up in the last 2 days it joins them to the Applicant table and returns a table result with the audit id and the names
The problem is that the join only returns the first match even though the max=0 setting is set.
I am trying to translate this sql query:
SELECT Audit_Id, FirstName, LastName
FROM Audit JOIN Applicant ON Audit_Id
WHERE persistent_id IN
(
SELECT persistent_id from Audit group by persistent_id having count(*)>20 and persistent_id is not null
)
and timestamp > DATEADD(hh, -48, GETDATE());
The sql returns
Audit_Id FirstName LastName
414 joe strummer
422 johny rotten
422 sid vicious
Whereas the splunk search returns
Audit_Id FirstName LastName
414 joe strummer
422 johny rotten
Joins are slow and usually not necessary. Try this query instead.
(source=Audit earliest=-2d
[ search source=Audit
| stats count by persistent_id
| where count > 20 and isnotnull(persistent_id)
| fields persistent_id] ) OR
source= Applicant
| stats values(FirstName) values(LastName) by Audit_Id
Hi
the first bit of 2 day search (lines 1-5) returns 4 rows from the Audit table and really what I want to do is add the columns (firstname and lastname) from the other table (Applicant) to those 4 rows using the column Audit_Id which is common to both tables.
The change suggested above using OR gives me a result of 113000 rows
Thanks for the clarification. I've modified my answer. If that doesn't give the results you want, we can try using another subsearch.
Thanks rich, I appreciate the help, but unfortunately the result still goes from 4 rows to 113,000 rows
OK. Here's a different search to try. Completely untested. The idea is to use each nested search to build the main search into something akin to source=Applicant (FirstName=<fname> LastName=<lname>...)
. If this goes nowhere then it's back to looking at why JOIN isn't showing all results.
source=Applicant [source=Audit earliest=-2d
[ search source=Audit
| stats count by persistent_id
| where count > 20 and isnotnull(persistent_id)
| fields persistent_id]
| fields FirstName LastName]
| table Audit_Id FirstName LastName