Ok, y'all, I'm completely flummoxed.
Simplified: I have two sourcetypes ("a" and "b"). Each sourcetype has 500,000 items. I want to join "b" onto "a", matching on "id", because there's a column in "b" that I want. I know for a fact that all 500,000 entries in "a" have a corresponding match in "b". BTW, the timestamps in "a" and "b" don't line up, and I don't care about that - this isn't a "transaction". They may be off by a day or more, and that's fine.
I try something like this:
sourcetype="a" | join id type="left" [ search sourcetype="b" type="baz" | fields id unit ] | table id, foo, bar, unit
I expect to have 500,000 results, with columns "id", "foo", "bar", (from "a") and "unit" (from "b") all filled in.
Instead, I get (for simplification) roughly 3,500 records with "unit" from sourcetype "b", and the rest with "unit" blank.
+--------+------+------+--------+
| id | foo | bar | unit |
+--------+------+------+--------+
| 12345 | 1 | 1 | 99999 |
| 67890 | 1 | 1 | | <-- ?
| 54321 | 1 | 1 | | <-- ?
+--------+------+------+--------+
I cannot for the life of me figure out why this is!
I compared values in "a" that had matches display vs. those that did not, and could not spot a difference.
Note, if I specify id in both sides of the join, like so:
sourcetype="a" id=12345 | join id type="left" [ search sourcetype="b" type="baz" id=12345 | fields id unit ] | table id, foo, bar, unit
Then I get the results I expect. I need 500,000 rows, though, not one.
Any ideas?
One final clue: when I compare those with a "hit" versus those without a "hit", I can restrict the search on "a" and pipe the results into | table *, like so:
sourcetype="a" id=12345 | join id type="left" [ search sourcetype="b" type="baz" | fields id unit ] | table *
sourcetype="a" id=67890 | join id type="left" [ search sourcetype="b" type="baz" | fields id unit ] | table *
Assuming 12345 shows "unit" from "b", and 67890 does not, when I pipe to | table * it's interesting, because the "resultset" column shows "b" for 12345, but "a" for 67890. WTF?!
+--------+-------------+--------+
| id | sourcetype | unit |
+--------+-------------+--------+
| 12345 | b | 99999 |
| 67890 | a | | <-- ?
+--------+------+------+--------+
I am not a smart man. And Splunk isn't SQL.
Help?...
By default a subsearch will not yield 500000 events, IIRC it'll only give you 50000. Take a look at http://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-joi... for alternatives.
The join
command has a limit of 50000 records. Once you run your search, have a look on the "Job" link, below the search box on the right side. It might show a exclamation saying that you reached the limit.
Awesome, thanks so much! That pointed me in the right direction. I'm still a newb, apparently.
By default a subsearch will not yield 500000 events, IIRC it'll only give you 50000. Take a look at http://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-joi... for alternatives.
Thanks so much! The link you sent was useful, and I learned a lot.
I'm good with SQL, not so much with SPL (as if you couldn't tell by my post).
Ultimately, I had to sit down and think about my data sources, and discern which data really are "events," and which data are "static" (or semi-static) sets.
Instead of fooling with all this left join business, I refactored my data on the right side of my query into lookups. Works great. I'm refining now, making the lookups smarter, but as it is, a simple CSV is doing the job. And it's FAST.
Thanks for the help. I'm learning to think like Splunk. It's taking a little time.