Getting Data In

How to correlate data from three CSV file sources with joining fields and run a stats count on the final result?

Explorer

Hello Everyone,

Here is the scenario. I have three source CSV files with joining fields:

file1 field1 = file2 field2
file2 field3 = file3 field4

I need to get a count of entries by association... a sale rep is listed in file1 and there are one or more customers in file3. File2 is the key to bring everything together and I cannot figure out how to match and count how many customers there are per sales rep.

Any thoughts?

Sandy

Tags (5)
1 Solution
Legend

Try this

source=file2
| stats count by field2
| join type=outer field2 [ search source=file1 | eval field2=field1 ]
| stats sum(count) as NumCustomers by SalesRep

assuming that file1 contains a field named "SalesRep" that identifies the sales rep. If there is one entry in file2 for each customer and sales rep combination, you don't actually need file3 if all you want is a count.

Legend

Try this

source=file2
| stats count by field2
| join type=outer field2 [ search source=file1 | eval field2=field1 ]
| stats sum(count) as NumCustomers by SalesRep

assuming that file1 contains a field named "SalesRep" that identifies the sales rep. If there is one entry in file2 for each customer and sales rep combination, you don't actually need file3 if all you want is a count.

Explorer

Hello Lisa,

I can see where you are going but this only references two of the three sources, so I don't see how this can complete the count. Perhaps the correlation is too complex?

Legend

It is definitely not too complex for Splunk; we can search as many sources as we want. But let me explain with data:

file1
SalesRep,field1
Jo Smith,1
Kerry Lee,2
S Rick,3

file2
field2,field3
1,100
1,101
2,102
3,103
3,104
3,105

file3
Customer,field4
ABC Co,100
XYZ Co,101
abc industries,102
xyz industries,103
myCo Inc,104
yourCo Inc,105

I've written these out as CSV files, but this will work exactly the same no matter what the file format as long as the fields exist. In file2, there is clearly a mapping between the sales rep and the customer. And, if all we want is the count of customers, we can count them up in file2 - there is no need to actually get the customer names from file3.

If I simply count the number of records for each sales rep (field2) in file2, I have the counts. All I need to do is to join file1 to the counts so that I can identify the sales rep.

But if you must join in file3, you can do this

source=file2
| join type=outer field2 [ search source=file1 | eval field2=field1 ]
| join type=outer field3 [ search source=file3 | eval field3=field4 ]
| stats count by SalesRep

You will get the same answer, but the search will be a lot less efficient.

Explorer

Hello Lisa,

I think I finally got this to work. I think my logic has been incorrect as the result just reflects a count of all customers of file1 for each salesrep as opposed to only customers in file3 for each salesrep.

Hmmm!

Legend

Ah, so there could be entries in file2 that don't actually correspond to anything in file3!

Okay, then change to this:

source=file2
| join type=outer field2 [ search source=file1 | eval field2=field1 ]
| join type=outer field3 [ search source=file3 | eval field3=field4 ]
| stats dc(CustNo) by SalesRep

Assuming that file1 contains a field called SalesRep and file3 contains a field called CustNo

Explorer

Ok. Slight change in the results but still sorting the same way as before. I think the explanation might be better served in another way. The file numbers may not correspond to previous work.

File 1 - Accounts.csv - Key fields: "Account Name" "Sales Representative"
File 2 - Contacts.csv - Key fields: "Email" & "Company Name"
File 3 - Attendees.csv - Key fields: "E-mail", "Full Name"

Links: "Account Name" = "Company Name" and "Email" = "Email"

Objectives:

1) Assign "Sales Representative" from Accounts.csv to each attendee "Full Name" in Attendees.csv
2) Count how many attendees are participating in event by sales rep. e.g. Joe Smith (Rep1) has 15 attendees, Jane Laird (Rep2) has 20 attendees), etc.

It is a different way of processing it. Does this help?

Sandy

Legend
source=Contacts.csv
| join "Company Name" [ search source=Accounts.csv
| rename "Account Name" as "Company Name" ]
| join  Email [ search source=Attendees.csv  ]
| stats dc("Full Name") as "Attendee Count" by "Sales Representative"
Get Updates on the Splunk Community!

2024 Splunk Career Impact Survey | Earn a \$20 gift card for participating!

Hear ye, hear ye! The time has come again for Splunk's annual Career Impact Survey!  We need your help by ...

Optimize Cloud Monitoring

TECH TALKS Optimize Cloud Monitoring Tuesday, August 13, 2024  |  11:00AM–12:00PM PST   Register to ...

What's New in Splunk Cloud Platform 9.2.2403?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.2.2403! Analysts can ...