Solved: How to correlate data from three CSV file sources ...

sandyelrick · ‎04-24-2015

Hello Everyone,

Here is the scenario. I have three source CSV files with joining fields:

file1 field1 = file2 field2
file2 field3 = file3 field4

I need to get a count of entries by association... a sale rep is listed in file1 and there are one or more customers in file3. File2 is the key to bring everything together and I cannot figure out how to match and count how many customers there are per sales rep.

Any thoughts?

Sandy

lguinn2 · ‎04-24-2015

Try this

source=file2
| stats count by field2
| join type=outer field2 [ search source=file1 | eval field2=field1 ]
| stats sum(count) as NumCustomers by SalesRep

assuming that file1 contains a field named "SalesRep" that identifies the sales rep. If there is one entry in file2 for each customer and sales rep combination, you don't actually need file3 if all you want is a count.

View solution in original post

lguinn2 · ‎04-24-2015

Try this

source=file2
| stats count by field2
| join type=outer field2 [ search source=file1 | eval field2=field1 ]
| stats sum(count) as NumCustomers by SalesRep

assuming that file1 contains a field named "SalesRep" that identifies the sales rep. If there is one entry in file2 for each customer and sales rep combination, you don't actually need file3 if all you want is a count.

sandyelrick · ‎04-24-2015

Hello Lisa,

I can see where you are going but this only references two of the three sources, so I don't see how this can complete the count. Perhaps the correlation is too complex?

lguinn2 · ‎04-24-2015

It is definitely not too complex for Splunk; we can search as many sources as we want. But let me explain with data:

file1
SalesRep,field1
Jo Smith,1
Kerry Lee,2
S Rick,3

file2
field2,field3
1,100
1,101
2,102
3,103
3,104
3,105

file3
Customer,field4
ABC Co,100
XYZ Co,101
abc industries,102
xyz industries,103
myCo Inc,104
yourCo Inc,105

I've written these out as CSV files, but this will work exactly the same no matter what the file format as long as the fields exist. In file2, there is clearly a mapping between the sales rep and the customer. And, if all we want is the count of customers, we can count them up in file2 - there is no need to actually get the customer names from file3.

If I simply count the number of records for each sales rep (field2) in file2, I have the counts. All I need to do is to join file1 to the counts so that I can identify the sales rep.

But if you must join in file3, you can do this

source=file2
| join type=outer field2 [ search source=file1 | eval field2=field1 ]
| join type=outer field3 [ search source=file3 | eval field3=field4 ]
| stats count by SalesRep

You will get the same answer, but the search will be a lot less efficient.

sandyelrick · ‎04-24-2015

Hello Lisa,

I think I finally got this to work. I think my logic has been incorrect as the result just reflects a count of all customers of file1 for each salesrep as opposed to only customers in file3 for each salesrep.

Hmmm!

lguinn2 · ‎04-26-2015

Ah, so there could be entries in file2 that don't actually correspond to anything in file3!

Okay, then change to this:

 source=file2
 | join type=outer field2 [ search source=file1 | eval field2=field1 ]
 | join type=outer field3 [ search source=file3 | eval field3=field4 ]
 | stats dc(CustNo) by SalesRep

Assuming that file1 contains a field called SalesRep and file3 contains a field called CustNo

sandyelrick · ‎04-27-2015

Ok. Slight change in the results but still sorting the same way as before. I think the explanation might be better served in another way. The file numbers may not correspond to previous work.

File 1 - Accounts.csv - Key fields: "Account Name" "Sales Representative"
File 2 - Contacts.csv - Key fields: "Email" & "Company Name"
File 3 - Attendees.csv - Key fields: "E-mail", "Full Name"

Links: "Account Name" = "Company Name" and "Email" = "Email"

Objectives:

1) Assign "Sales Representative" from Accounts.csv to each attendee "Full Name" in Attendees.csv
2) Count how many attendees are participating in event by sales rep. e.g. Joe Smith (Rep1) has 15 attendees, Jane Laird (Rep2) has 20 attendees), etc.

It is a different way of processing it. Does this help?

Sandy

lguinn2 · ‎04-27-2015

  source=Contacts.csv
  | join "Company Name" [ search source=Accounts.csv 
                            | rename "Account Name" as "Company Name" ]
  | join  Email [ search source=Attendees.csv  ]
  | stats dc("Full Name") as "Attendee Count" by "Sales Representative"

How to correlate data from three CSV file sources with joining fields and run a stats count on the final result?

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Quick connection discovery mode for forwarders

Build and Launch AI Agents from Your Splunk Workflows

Splunk Cloud Application Management in Terraform

Join the Conversation