- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
How to combine data from two CSV files using a common field where this field has a different name in each file?

Hi,
I have 2 CSV files that have a unique key of sorts.... The of sorts is, in one CSV file the unique key is named PREMISE, but in the other file it is named PREMISE_ID. What would be the best way to "join" the two files together using Splunk?
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Not sure what you mean by join. You could try something like this.
source=file1.csv OR source=file2.csv | eval PREMISE=coalsce(PREMISE, PREMISE_ID) | stats count by PREMISE
This will give you a count of event grouped by PREMISE across both files.
Now, if you want to do a JOIN like a DB JOIN, then you could do something like this
source=file1.csv | JOIN PREMISE [ search source=file2.csv | rename PREMISE_ID as PREMISE | table PREMISE_ID ]
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hmmmm, I was certain that it would work but oddly the query isn't returning any results
Here is what I have
index=top10 source="/home/oracle/workdir/account_log.csv" |join PREMISE [search source="/home/oracle/workdir/overalllocationdata.csv" | rename PREMISE_ID as PREMISE|table PREMISE LAST_NAME]|table TICKET_CODE,CHANNEL
Where did I go wrong?
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Do you get results when you run the queries separately? Try:
index=top10 source="/home/oracle/workdir/account_log.csv" | table TICKET_CODE,CHANNEL
Do you get results you would expect? Then try
source="/home/oracle/workdir/overalllocationdata.csv" | rename PREMISE_ID as PREMISE|table PREMISE LAST_NAME]
Do you get results you would expect?
Are the field names correct (case sensitive)? Try adding a PREMISE=*
before the | join
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Ok, scratching head a bit,
If I do this it works
index=top10 source="/home/oracle/workdir/account_log.csv" | table TICKET_CODE,CHANNEL
and if I do this it works
index=top10 source="/home/oracle/workdir/overalllocationdata.csv" | rename PREMISE_ID as PREMISE|table PREMISE LAST_NAME
aside for the starting index=top10 the commands seem to work independently (nice trick by the way)
but combining them with the join and I get 0 results even though I've hacked at it trying several different combinations. So it is something I'm doing wrong with the join.
No joy with this one either
index=top10 source="/home/oracle/workdir/account_log.csv" PREMISE="*" |join PREMISE [search source="/home/oracle/workdir/overalllocationdata.csv" | rename PREMISE_ID as PREMISE|table PREMISE LAST_NAME]|table TICKET_CODE,CHANNEL
I double checked and all the field names are in UPPER case.
I'm at the point where my head hurts and I'm wondering why there is a bloody spot on the wall......
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

also just FYI here is the complete tables and fields schema
Account_Log.csv
CHANNEL
CID
CPE_CONVERTED
CREATION_DATE
DIFF
OBJECT_CPE_ID
OBJECT_TYPE
OPERATION_TYPE
PREMISE
PREMISE_RANK
TICKET_CODE
OverallLocationData.csv
PREMISE_ID
ADDRESS1
ADDRESS2
CITY
COUNTRY
FIRST_NAME
LAST_NAME
LATITUDE
LAT_LNG_TYPE
LONGITUDE
POSTAL_CODE
PROVINCE
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi Sundareshr,
I think you are on to something. Let me clarify a bit
file1.csv has fields:
PREMISE
TICKET_NUMBER
REASON_CODE
blah
file2.csv has fields
PREMISE_ID
LAST_NAME
FIRST_NAME
blah
I'm trying to join/link/reference/whatever you want to call it, the two files based off of PREMISE/PREMISE_ID. The two fields have slightly different names but the data is the same. Meaning PREMISE=1 in file1.csv equals PREMISE_ID=1 in file2.csv
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Do any of the fields repeat in both files? Try this
source=1.csv OR source=2.csv | eval premise=coalesce(premise, premise_id) | stats list(lastname) as lname list(firstname) as fname list(ticketnumber) tnbr etc by premise
If this doesn't give you what you're looking for, you could do the join
source=file1.csv | JOIN PREMISE [ search source=file2.csv | rename PREMISE_ID as PREMISE | table PREMISE_ID LAST_NAME blah ] | table TICKET_NUMBER REASON_CODE blah
