topic Re: How to correlate data from three CSV file sources with joining fields and run a stats count on the final result? in Getting Data In

How to correlate data from three CSV file sources with joining fields and run a stats count on the final result?

sandyelrick — Fri, 24 Apr 2015 15:27:39 GMT

Hello Everyone,

Here is the scenario. I have three source CSV files with joining fields:

file1 field1 = file2 field2
file2 field3 = file3 field4

I need to get a count of entries by association... a sale rep is listed in file1 and there are one or more customers in file3. File2 is the key to bring everything together and I cannot figure out how to match and count how many customers there are per sales rep.

Any thoughts?

Sandy

Re: How to correlate data from three CSV file sources with joining fields and run a stats count on the final result?

lguinn2 — Fri, 24 Apr 2015 15:52:48 GMT

Try this

source=file2
| stats count by field2
| join type=outer field2 [ search source=file1 | eval field2=field1 ]
| stats sum(count) as NumCustomers by SalesRep

assuming that file1 contains a field named "SalesRep" that identifies the sales rep. If there is one entry in file2 for each customer and sales rep combination, you don't actually need file3 if all you want is a count.

Re: How to correlate data from three CSV file sources with joining fields and run a stats count on the final result?

sandyelrick — Fri, 24 Apr 2015 16:15:33 GMT

Hello Lisa,

I can see where you are going but this only references two of the three sources, so I don't see how this can complete the count. Perhaps the correlation is too complex?

Re: How to correlate data from three CSV file sources with joining fields and run a stats count on the final result?

lguinn2 — Fri, 24 Apr 2015 19:25:35 GMT

It is definitely not too complex for Splunk; we can search as many sources as we want. But let me explain with data:

file1
SalesRep,field1
Jo Smith,1
Kerry Lee,2
S Rick,3

file2
field2,field3
1,100
1,101
2,102
3,103
3,104
3,105

file3
Customer,field4
ABC Co,100
XYZ Co,101
abc industries,102
xyz industries,103
myCo Inc,104
yourCo Inc,105

I've written these out as CSV files, but this will work exactly the same no matter what the file format as long as the fields exist. In file2, there is clearly a mapping between the sales rep and the customer. And, if all we want is the count of customers, we can count them up in file2 - there is no need to actually get the customer names from file3.

If I simply count the number of records for each sales rep (field2) in file2, I have the counts. All I need to do is to join file1 to the counts so that I can identify the sales rep.

But if you must join in file3, you can do this

source=file2
| join type=outer field2 [ search source=file1 | eval field2=field1 ]
| join type=outer field3 [ search source=file3 | eval field3=field4 ]
| stats count by SalesRep

You will get the same answer, but the search will be a lot less efficient.

Re: How to correlate data from three CSV file sources with joining fields and run a stats count on the final result?

sandyelrick — Fri, 24 Apr 2015 20:35:07 GMT

Hello Lisa,

I think I finally got this to work. I think my logic has been incorrect as the result just reflects a count of all customers of file1 for each salesrep as opposed to only customers in file3 for each salesrep.

Hmmm!

Re: How to correlate data from three CSV file sources with joining fields and run a stats count on the final result?

lguinn2 — Sun, 26 Apr 2015 20:27:09 GMT

Ah, so there could be entries in file2 that don't actually correspond to anything in file3!

Okay, then change to this:

 source=file2
 | join type=outer field2 [ search source=file1 | eval field2=field1 ]
 | join type=outer field3 [ search source=file3 | eval field3=field4 ]
 | stats dc(CustNo) by SalesRep

Assuming that file1 contains a field called SalesRep and file3 contains a field called CustNo

Re: How to correlate data from three CSV file sources with joining fields and run a stats count on the final result?

sandyelrick — Mon, 27 Apr 2015 14:39:12 GMT

Ok. Slight change in the results but still sorting the same way as before. I think the explanation might be better served in another way. The file numbers may not correspond to previous work.

File 1 - Accounts.csv - Key fields: "Account Name" "Sales Representative"
File 2 - Contacts.csv - Key fields: "Email" & "Company Name"
File 3 - Attendees.csv - Key fields: "E-mail", "Full Name"

Links: "Account Name" = "Company Name" and "Email" = "Email"

Objectives:

1) Assign "Sales Representative" from Accounts.csv to each attendee "Full Name" in Attendees.csv
2) Count how many attendees are participating in event by sales rep. e.g. Joe Smith (Rep1) has 15 attendees, Jane Laird (Rep2) has 20 attendees), etc.

It is a different way of processing it. Does this help?

Sandy

Re: How to correlate data from three CSV file sources with joining fields and run a stats count on the final result?

lguinn2 — Mon, 27 Apr 2015 19:51:59 GMT

  source=Contacts.csv
  | join "Company Name" [ search source=Accounts.csv 
                            | rename "Account Name" as "Company Name" ]
  | join  Email [ search source=Attendees.csv  ]
  | stats dc("Full Name") as "Attendee Count" by "Sales Representative"