how to turn an inner join into sub-search and pass...

rsennett_splunk · ‎01-08-2014

sourcetype=Account contains Id values and the AccountName

sourcetype=Issue contains AccountId values but no AccountName and this is really the sourcetype that contains the stuff I want to report on...

If I do an inner join:

Filter sourcetype Account then
| eval AccountId = Id | fields Name, AccountId | join type=inner AccountId [search sourcetype=Issue |fields AccountId Name Issue

This search has completed and has returned 4 results by scanning 481 events in 49.392 seconds.

If I use a sub-search
Sourcetype=Issue [sourcetype=Account ...filter-the-AccountId...|fields + AccountId] table AccountId Issue

This search has completed and has returned 4 results by scanning 686 events in 6.182 seconds.

What I want is

table Name Issue...

But I've tried a few things and I can't figure out how to pass the Account Name, or where I'd do a lookup for it so I could get the fields from the Account and the Issue together without doing the inner join...

How would I structure this so it's more efficient than the inner join? I know there must be a better way...
Right now I'm keeping the timeframe small, but I'd like to expand both the filters on the Accounts and the timeframe.
If I have to use the inner join, I'm afraid I'm really just approaching this wrong.

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

somesoni2 · ‎01-09-2014

Try the following (just in case creating lookup is not an option for you):-

sourcetype=Account ...other filters...| stats count by Id, Accountname
| rename Id as AccountId | fields - count
| join AccountId [search sourcetype=Issue |fields AccountId Name Issue]

if there could be more that one issue for an AccountId, then use "| join max=0" instead of "|join".

rsennett_splunk · ‎01-09-2014

thank you... yes. the lookup is indeed an issue... and the join works fine, it's just not efficient. the join is where I started. (I did try different types of joins)

That might just be where I have to be. I was hoping there was some way I was missing... where once I passed the Id/AccountId I would have a way to pass other fields as well even though they were cosmetic and had no match in the 2nd sourcetype.

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

dwaddle · ‎01-08-2014

The "most splunkish" (splunkonic? splunkadelic?) way of doing this is with an actual lookup file. You would do a scheduled search of the form:

sourcetype=Account 
| fields Id, Accountname 
| inputlookup append=t accounts.csv
| stats first(Accountname) as Accountname by Id
| outputlookup accounts.csv

This would (incrementally) maintain the lookup file in the fashion araitz demonstrates in hist blog post.

Then your real search can use a simple lookup command (or an implicitly defined lookup in the configuration files) to bring in the Accountname data.

Another approach that might work is to use a "disjointed search" along with stats as a row-compression function. I've not tested this, but a search similar to:

sourcetype=Account OR sourcetype=issue
| stats first(Name) as Name, first(Issue) as Issue by Id

Should get fairly close to what you're trying to do.

rsennett_splunk · ‎01-09-2014

correct. these are extracts from relational tables... which is why this is a bit of a mess... as for renaming... yes, it would be. I was following your suggestion exactly which left out the eval. maybe I'm missing the insight you're trying to pass along... it's actually the coalesce I hadn't thought of... which might now turn out to be a help. 🙂

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

dwaddle · ‎01-09-2014

Isn't the naming deal solved with a simple | eval Id=coalesce(Id,AccountId) or a | rename AccountId AS ID ?

Also, Splunk is designed to handle extremely large lookups fairly efficiently. If your list of (Id, Accountname) tuples in sourcetype=Account is < 1GB a lookup is still applicable.

I get a somewhat funny feeling that your Account sourcetype is coming from a relational source of sorts anyway - so why not make a lookup file directly from there?

rsennett_splunk · ‎01-09-2014

I appreciate the blog link. A lot of good info there. the accumulated account list isn't practical in this case, as the list is massive and the search is taking only a small slice (and that slice changes based on daily assignments so the pull needs to be dynamic)
The second suggestion doesn't account for the fact that Id = AccountId which is why I had to join... I believe the 2nd suggestion is actually where I started... and the inner join is where I ended up. - hoping for more insight.

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

how to turn an inner join into sub-search and pass non matching values

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

Are you a member of the Splunk Community?

how to turn an inner join into sub-search and pass non matching values

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...