Solved: Pairing data from two CSV files with a shared key ...

daniel_rico · ‎08-15-2017

I have two CSV files-- one is an inventory of sorts and the other is supplemental data that only applies to certain rows in the first table. Let's say for the example's sake that it's Inventory.csv and Userdata.csv

I want to have the output populate a new table and when the Userdata.csv has data that matches the ID in the Inventory.csv

The main issue right now is that I have to split them into two separate dashboard panels. I can't seem to have them join conditionally, I end up with a multivalued mish-mosh and everything points to the data needing to be indexed. It changes on a weekly basis and both CSV files may have changes.

| inputlookup Inventory.csv 
| eval ID=upper(ID) 
| lookup Userdata.csv ID as ID
| table ID Field1 Field2 Field3 Field4

Both CSV files contain a key identifier, an ID field. Does the data need to be indexed for me to search it/put it into a dashboard where I can look up fields and have it populate a table? The tabled fields are from both CSV files. Right now, as stated above I'm getting a ton of mv fields; I'd much rather have this in a way that's easier for anyone to import and search on.

My research showed that the best way to get this done is to get the data indexed since it makes it like any other data to search. These inputlookup[s] are killing me, I've had to jump through hoops to build dashboards and would rather not have something immensely resource intensive.

DalJeanis · ‎08-15-2017

Let's see if we can simplify your life, shall we? As they say in Hitchhiker's Guide to the Galaxy, "DON'T PANIC!"

I'm going to make TWO assumptions -
(A) there is only one record per ID in each of your two files
(B) the files have only ONE field in common - ID - and all other field names are distinct.

Ready for this?

 | inputlookup append=t Inventory.csv 
 | inputlookup append=t Userdata.csv 
 | eval ID=upper(ID)
 | stats values(*) as * by ID

Now you have one record per ID, and all available fields are on it. You already did that, this just accomplishes roughly the same as what you were doing. However, I wanted you watching my left hand while I kept you from panicking. I'm tricky that way. Here's what the right hand was setting up...

Use the above query as a scheduled base search in the dash, and then use post processing in the panel to subset those records and filter the fields for each panel as appropriate.

Add this filter to get the ones that ARE present in the userdata csv...

 | where isnotnull(myuserdatafield)

Add this filter to get the ones that are NOT present in the userdata csv...

 | where isnull(myuserdatafield)

You can add a | table or | fields command to limit the fields for any given panel.

Now, we have to get back to perhaps panicking. What if there might be multiple records in one or the other file for a single ID. For instance, what if there's a record for the ID with lower case, and one for upper case?

Well, that could result in a multivalued mish-mosh. Sound familiar?

So, let's check that, shall we?

 | inputlookup append=t Inventory.csv 
 | eval ID=upper(ID)
 | stats list(*) as * by ID
 | where mvcount(someinventoryfield)>1


 | inputlookup append=t Userdata.csv 
 | eval ID=upper(ID)
 | stats list(*) as * by ID
 | where mvcount(someuserdatafield)>1

If either of the above returns any records, then we need to actually analyze what is in the relevant file, so we can do an appropriate massaging of the data. Maybe the inventory contains records over time, some of which are marked as expired, so we need to filter them out. Maybe the user data contains duplicate records, but they are effectively identical. We just have to figure it out and code appropriately.

Either way, the thing will be to create a single, valid search that combines the data appropriately, and use that as the basis for any further work or presentation that you do.

View solution in original post

DalJeanis · ‎08-15-2017

By the way, the verb to look at, if you wanted to take your join code and actually index it to be searchable in an index, is collect.

daniel_rico · ‎08-16-2017

@DalJeanis - Could you elaborate on this?

daniel_rico · ‎08-15-2017

I actually ended up testing this:

 | inputlookup Inventory.csv  
 | eval ID=upper(ID) 
 | join type=outer ID 
    [| inputlookup Userdata.csv 
     | rename "foo_string" as "new_foo_string"]  
 | table ID field1 field2 field3 field4 field5 field6 field7 
 | search field6="*" AND field1="*" 
 | search field1=$token$ AND field7=$token$ 
 | sort - field7

And it seems to work 🙂

Occam's Razor right?

daniel_rico · ‎08-15-2017

The wildcard isn't showing in the ' | search ... ' clause but it's essentially looking at both CSVs, determining that values exist in both, then tabling it. It's tied to two dropdown[s] that provide the value.

DalJeanis · ‎08-15-2017

@[email protected] - I marked your code for you. If you put at least four spaces before the line, the system will treat it as code. You can also use the 101 010 button, or put grave accents before and after small snippets. A grave accent looks like this-> (`) <- It's the one under the tilde (~) on an American keyboard.

I'm also going to move your comment to an answer, and you can accept it, so the question will show as closed.

daniel_rico · ‎08-16-2017

@DalJeanis - Thank you! I ended up testing out both methods and both appear to work. I'd like more people to review your code as well so I'm going to mark it as the "Answer". Can you upvote my own as I believe both apply dependent on the situation. I think this might actually the first post chain with correct syntax for application in multiple scenarios.

DalJeanis · ‎08-16-2017

Done. Take care.

DalJeanis · ‎08-15-2017