Splunk Search

Left Join not Returning all Results on Right Side

photuris
Explorer

Ok, y'all, I'm completely flummoxed.

Simplified: I have two sourcetypes ("a" and "b"). Each sourcetype has 500,000 items. I want to join "b" onto "a", matching on "id", because there's a column in "b" that I want. I know for a fact that all 500,000 entries in "a" have a corresponding match in "b". BTW, the timestamps in "a" and "b" don't line up, and I don't care about that - this isn't a "transaction". They may be off by a day or more, and that's fine.

I try something like this:

sourcetype="a" | join id type="left" [ search sourcetype="b" type="baz" | fields id unit ] | table id, foo, bar, unit

I expect to have 500,000 results, with columns "id", "foo", "bar", (from "a") and "unit" (from "b") all filled in.

Instead, I get (for simplification) roughly 3,500 records with "unit" from sourcetype "b", and the rest with "unit" blank.

+--------+------+------+--------+
| id     | foo  | bar  | unit   |
+--------+------+------+--------+
| 12345  | 1    | 1    | 99999  |
| 67890  | 1    | 1    |        | <-- ?
| 54321  | 1    | 1    |        | <-- ?
+--------+------+------+--------+

I cannot for the life of me figure out why this is!

I compared values in "a" that had matches display vs. those that did not, and could not spot a difference.

Note, if I specify id in both sides of the join, like so:

sourcetype="a" id=12345 | join id type="left" [ search sourcetype="b" type="baz" id=12345 | fields id unit ] | table id, foo, bar, unit

Then I get the results I expect. I need 500,000 rows, though, not one.

Any ideas?

One final clue: when I compare those with a "hit" versus those without a "hit", I can restrict the search on "a" and pipe the results into | table *, like so:

sourcetype="a" id=12345 | join id type="left" [ search sourcetype="b" type="baz" | fields id unit ] | table *

sourcetype="a" id=67890 | join id type="left" [ search sourcetype="b" type="baz" | fields id unit ] | table *

Assuming 12345 shows "unit" from "b", and 67890 does not, when I pipe to | table * it's interesting, because the "resultset" column shows "b" for 12345, but "a" for 67890. WTF?!

+--------+-------------+--------+
| id     | sourcetype  | unit   |
+--------+-------------+--------+
| 12345  | b           | 99999  |
| 67890  | a           |        | <-- ?
+--------+------+------+--------+

I am not a smart man. And Splunk isn't SQL.

Help?...

1 Solution

martin_mueller
SplunkTrust
SplunkTrust

By default a subsearch will not yield 500000 events, IIRC it'll only give you 50000. Take a look at http://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-joi... for alternatives.

View solution in original post

musskopf
Builder

The join command has a limit of 50000 records. Once you run your search, have a look on the "Job" link, below the search box on the right side. It might show a exclamation saying that you reached the limit.

photuris
Explorer

Awesome, thanks so much! That pointed me in the right direction. I'm still a newb, apparently.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

By default a subsearch will not yield 500000 events, IIRC it'll only give you 50000. Take a look at http://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-joi... for alternatives.

photuris
Explorer

Thanks so much! The link you sent was useful, and I learned a lot.

I'm good with SQL, not so much with SPL (as if you couldn't tell by my post).

Ultimately, I had to sit down and think about my data sources, and discern which data really are "events," and which data are "static" (or semi-static) sets.

Instead of fooling with all this left join business, I refactored my data on the right side of my query into lookups. Works great. I'm refining now, making the lookups smarter, but as it is, a simple CSV is doing the job. And it's FAST.

Thanks for the help. I'm learning to think like Splunk. It's taking a little time.

Get Updates on the Splunk Community!

Fastest way to demo Observability

I’ve been having a lot of fun learning about Kubernetes and Observability. I set myself an interesting ...

September Community Champions: A Shoutout to Our Contributors!

As we close the books on another fantastic month, we want to take a moment to celebrate the people who are the ...

Splunk Decoded: Service Maps vs Service Analyzer Tree View vs Flow Maps

It’s Monday morning, and your phone is buzzing with alert escalations – your customer-facing portal is running ...