Splunk Search

how to turn an inner join into sub-search and pass non matching values

rsennett_splunk
Splunk Employee
Splunk Employee

sourcetype=Account contains Id values and the AccountName

sourcetype=Issue contains AccountId values but no AccountName and this is really the sourcetype that contains the stuff I want to report on...

If I do an inner join:

Filter sourcetype Account then

| eval AccountId = Id
| fields Name, AccountId
| join type=inner AccountId [search sourcetype=Issue |fields AccountId Name Issue

This search has completed and has returned 4 results by scanning 481 events in 49.392 seconds.

If I use a sub-search

Sourcetype=Issue
[sourcetype=Account ...filter-the-AccountId...|fields + AccountId]
table AccountId Issue

This search has completed and has returned 4 results by scanning 686 events in 6.182 seconds.

What I want is

table Name Issue...

But I've tried a few things and I can't figure out how to pass the Account Name, or where I'd do a lookup for it so I could get the fields from the Account and the Issue together without doing the inner join...

How would I structure this so it's more efficient than the inner join? I know there must be a better way...
Right now I'm keeping the timeframe small, but I'd like to expand both the filters on the Accounts and the timeframe.
If I have to use the inner join, I'm afraid I'm really just approaching this wrong.

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!
Tags (1)

somesoni2
Revered Legend

Try the following (just in case creating lookup is not an option for you):-

sourcetype=Account ...other filters...| stats count by Id, Accountname
| rename Id as AccountId | fields - count
| join AccountId [search sourcetype=Issue |fields AccountId Name Issue]

if there could be more that one issue for an AccountId, then use "| join max=0" instead of "|join".

0 Karma

rsennett_splunk
Splunk Employee
Splunk Employee

thank you... yes. the lookup is indeed an issue... and the join works fine, it's just not efficient. the join is where I started. (I did try different types of joins)

That might just be where I have to be. I was hoping there was some way I was missing... where once I passed the Id/AccountId I would have a way to pass other fields as well even though they were cosmetic and had no match in the 2nd sourcetype.

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!
0 Karma

dwaddle
SplunkTrust
SplunkTrust

The "most splunkish" (splunkonic? splunkadelic?) way of doing this is with an actual lookup file. You would do a scheduled search of the form:

sourcetype=Account 
| fields Id, Accountname 
| inputlookup append=t accounts.csv
| stats first(Accountname) as Accountname by Id
| outputlookup accounts.csv

This would (incrementally) maintain the lookup file in the fashion araitz demonstrates in hist blog post.

Then your real search can use a simple lookup command (or an implicitly defined lookup in the configuration files) to bring in the Accountname data.


Another approach that might work is to use a "disjointed search" along with stats as a row-compression function. I've not tested this, but a search similar to:

sourcetype=Account OR sourcetype=issue
| stats first(Name) as Name, first(Issue) as Issue by Id

Should get fairly close to what you're trying to do.

rsennett_splunk
Splunk Employee
Splunk Employee

correct. these are extracts from relational tables... which is why this is a bit of a mess... as for renaming... yes, it would be. I was following your suggestion exactly which left out the eval. maybe I'm missing the insight you're trying to pass along... it's actually the coalesce I hadn't thought of... which might now turn out to be a help. 🙂

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!
0 Karma

dwaddle
SplunkTrust
SplunkTrust

Isn't the naming deal solved with a simple | eval Id=coalesce(Id,AccountId) or a | rename AccountId AS ID ?

Also, Splunk is designed to handle extremely large lookups fairly efficiently. If your list of (Id, Accountname) tuples in sourcetype=Account is < 1GB a lookup is still applicable.

I get a somewhat funny feeling that your Account sourcetype is coming from a relational source of sorts anyway - so why not make a lookup file directly from there?

0 Karma

rsennett_splunk
Splunk Employee
Splunk Employee

I appreciate the blog link. A lot of good info there. the accumulated account list isn't practical in this case, as the list is massive and the search is taking only a small slice (and that slice changes based on daily assignments so the pull needs to be dynamic)
The second suggestion doesn't account for the fact that Id = AccountId which is why I had to join... I believe the 2nd suggestion is actually where I started... and the inner join is where I ended up. - hoping for more insight.

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!
0 Karma
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!