I am using the Fundamentals 1 dataset to learn about lookups. I have created a csv file with a column for productId (to match the productId field in the access_combined_wcookie sourcetype), a column for produceCode (to match the Code field in the vendor_sales sourcetype), and a column for productPrice (that's the output field). I've created a lookup definition called productDetails which seems correct when I check it using inputlookup.
What I want to do is create a timechart with 2 series: one with online sales, and the other vendor sales. This is what I've tried:
index=main (sourcetype="vendor_sales" OR sourcetype="access_combined_wcookie")
| lookup productDetails productId AS productId
| lookup productDetails productCode AS Code
| timechart count(productPrice) by sourcetype
The first lookup matches the productId field in access_combined_wcookie, and the second lookup matches the Code field in vendor_sales. The result I'm getting is a set of counts for access_combined_wcookie, and all 0's for vendor_sales. If I switch around the order of the lookups, then I get counts for vendor_sales but 0's for access_combined_wcookie. Basically, whichever lookup is second is ignored.
I found a lot of forum posts about joining multiple csv lookups, but I couldn't apply them to my problem. Any help would be greatly appreciated. Thanks.
I found a solution, just not sure I understand why. I had to specify the output field:
index=main (sourcetype="vendor_sales" OR sourcetype="access_combined_wcookie")
| lookup productDetails productId AS productId OUTPUT productPrice
| lookup productDetails productCode AS Code OUTPUT productPrice
| timechart count(productPrice) by sourcetype
The problem has something to do with the overlap in fields. When the OUTPUT is not specified, every field besides the one used for INPUT is added to events. So that means for events with a productId, both the productPrice and productCode fields were added. And for events with a Code field, both productPrice and productId were added. But the new productCode and productId fields somehow interfere with each other, breaking the entire search.
Because if I try to make all events have all 3 fields (id, code and price), I do this:
index=main (sourcetype="vendor_sales" OR sourcetype="access_combined_wcookie")
| lookup productDetails productCode AS Code OUTPUT productPrice productId
| lookup productDetails productId AS productId OUTPUT productPrice productCode
| timechart count(productPrice) by sourcetype
and then all counts go to 0.
Update:
The solution listed above only works with a Verbose search. If I switch to a Fast or Smart Search, the counts go back to 0. I think maybe the solution to this is "don't use 2 lookups..."
Update 2
This is a known issue detailed here. The real solution was to change my limits.conf file - that fixed everything.
It's the designed behaviour.
Quoting docs: "
Usage
The lookup command is a distributable streaming command when local=false, which is the default setting. See Command types.
When using the lookup command, if an OUTPUT or OUTPUTNEW clause is not specified, all of the fields in the lookup table that are not the match fields are used as output fields. If the OUTPUT clause is specified, the output lookup fields overwrite existing fields. If the OUTPUTNEW clause is specified, the lookup is not performed for events in which the output fields already exist"
https://docs.splunk.com/Documentation/Splunk/8.2.2/SearchReference/Lookup#Usage
Right, but it should work. In my dataset, the events will either have productId or Code, but not both. If I don't specify OUTPUT, then after the first lookup anything with a productId should given a Code (and a price). When the second lookup is run, it should affect every event (every event will have a Code at that point) giving everything a productPrice (overwriting the ones that already had a price from the first lookup). But in reality after the 2 lookups, nothing has a productPrice (all counts are 0)
@m2oswald try using this:
| lookup productDetails productId AS productId productCode AS Code
Also, If this reply helps you, an upvote would be appreciated.
No, unfortunately not. When I try this, then all of my counts go to 0. I'm pretty sure the problem here is that it's trying to match 2 input fields for each event. In my case, the events will either have 1 field or the other, but not both
@m2oswald yeah in this case the issue will persist. Anyone else have any clue would be appreciated.