Thanks, @bowesmana.

Q - "When you say fuzzy, do you mean it should match based on similarity using something like Levenshtein distance? Do you want 123 main street 123 maine street 123 cain street all to match."
A - No. I know about Levenshtein; however, the similarity calculation would have to leave the street number out of the distance and require it to match exactly. 123 main street and 124 main street would never be a match. 123 main street and 123 main street apt 2 would be a match. The assumption, probably an incorrect one, is that the property owner of 123 main street apt 4 and 123 main street apt 6 is the same for the whole building. Of course, condos knock this idea out.

Q - "What size is your lookup - you may well be hitting the default limits defined (25MB)"
A - csv: 1 million records - 448,500 bytes // kvstore: 3 million records - 2,743.66 MB

Q - "What are you currently doing to be 'fuzzy' so your matches currently work or are you really looking for exact matches somewhere in your data?"
A - I stripped off any non-numeric characters at the beginning of the address in the lookup, and I use the resulting field in the "as" clause of my lookup command against my kvstore:
| lookup my_kvstore addr as mod_addr output owner

Q - "Is your KV store currently being updated - and is it replicated?"
A - No replication. The data would be refreshed yearly, or possibly every quarter.

Q - "Also, if you are just looking at some exact match somewhere, then the KV store may benefit from using accelerated fields - that can speed up lookups against the KV store (if that's the way you're doing it) significantly."
A - Using the above code, the addr would be the accelerated field, correct?

Thanks again for your help and God bless.
Genesius
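
P.S. To make the matching rule above concrete, here is a rough Python sketch of what I mean by "fuzzy": the street number and street name have to match exactly, and only a trailing unit designator is ignored. The function names and the list of unit markers (apt, unit, suite, ste, #) are my own assumptions for illustration, not anything from Splunk or from my real data, and odd cases such as a street literally named "Unit Street" would need extra handling.

```python
import re

# Hypothetical list of unit markers; extend it to match your own data.
# The \b keeps "ste" from matching inside a street name such as "stevens".
UNIT_SUFFIX = re.compile(
    r"\s+(?:(?:apt|unit|suite|ste)\b\.?|#)\s*\S+$",
    re.IGNORECASE,
)


def normalize_address(raw: str) -> str:
    """Lower-case, collapse whitespace, and drop a trailing unit designator."""
    cleaned = " ".join(raw.lower().split())
    return UNIT_SUFFIX.sub("", cleaned)


def same_building(a: str, b: str) -> bool:
    """True when both addresses share the same street number and street name."""
    return normalize_address(a) == normalize_address(b)


if __name__ == "__main__":
    print(same_building("123 main street", "123 main street apt 2"))
    print(same_building("123 main street", "124 main street"))
```

With a normalized key like this on both sides, the comparison stays an exact match, so no distance calculation is needed for the apartment case.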