Splunk Search

How to combine data from two CSV files using a common field where this field has a different name in each file?

dbcase
Motivator

Hi,

I have 2 CSV files that have a unique key of sorts.... The of sorts is, in one CSV file the unique key is named PREMISE, but in the other file it is named PREMISE_ID. What would be the best way to "join" the two files together using Splunk?

0 Karma

sundareshr
Legend

Not sure what you mean by join. You could try something like this.

source=file1.csv OR source=file2.csv | eval PREMISE=coalsce(PREMISE, PREMISE_ID) | stats count by PREMISE

This will give you a count of event grouped by PREMISE across both files.

Now, if you want to do a JOIN like a DB JOIN, then you could do something like this

source=file1.csv | JOIN PREMISE [ search source=file2.csv | rename PREMISE_ID as PREMISE | table PREMISE_ID ] 
0 Karma

dbcase
Motivator

Hmmmm, I was certain that it would work but oddly the query isn't returning any results

Here is what I have

index=top10 source="/home/oracle/workdir/account_log.csv" |join PREMISE [search source="/home/oracle/workdir/overalllocationdata.csv" | rename PREMISE_ID as PREMISE|table PREMISE LAST_NAME]|table TICKET_CODE,CHANNEL

Where did I go wrong?

0 Karma

sundareshr
Legend

Do you get results when you run the queries separately? Try:

index=top10 source="/home/oracle/workdir/account_log.csv"  | table TICKET_CODE,CHANNEL

Do you get results you would expect? Then try

source="/home/oracle/workdir/overalllocationdata.csv" | rename PREMISE_ID as PREMISE|table PREMISE LAST_NAME]

Do you get results you would expect?

Are the field names correct (case sensitive)? Try adding a PREMISE=* before the | join

0 Karma

dbcase
Motivator

Ok, scratching head a bit,

If I do this it works

index=top10 source="/home/oracle/workdir/account_log.csv" | table TICKET_CODE,CHANNEL

and if I do this it works

index=top10 source="/home/oracle/workdir/overalllocationdata.csv" | rename PREMISE_ID as PREMISE|table PREMISE LAST_NAME

aside for the starting index=top10 the commands seem to work independently (nice trick by the way)

but combining them with the join and I get 0 results even though I've hacked at it trying several different combinations. So it is something I'm doing wrong with the join.

No joy with this one either

index=top10 source="/home/oracle/workdir/account_log.csv" PREMISE="*" |join PREMISE [search source="/home/oracle/workdir/overalllocationdata.csv" | rename PREMISE_ID as PREMISE|table PREMISE LAST_NAME]|table TICKET_CODE,CHANNEL

I double checked and all the field names are in UPPER case.

I'm at the point where my head hurts and I'm wondering why there is a bloody spot on the wall......

0 Karma

dbcase
Motivator

also just FYI here is the complete tables and fields schema

Account_Log.csv
CHANNEL
CID
CPE_CONVERTED
CREATION_DATE
DIFF
OBJECT_CPE_ID
OBJECT_TYPE
OPERATION_TYPE
PREMISE
PREMISE_RANK
TICKET_CODE

OverallLocationData.csv
PREMISE_ID
ADDRESS1
ADDRESS2
CITY
COUNTRY
FIRST_NAME
LAST_NAME
LATITUDE
LAT_LNG_TYPE
LONGITUDE
POSTAL_CODE
PROVINCE

0 Karma

dbcase
Motivator

Hi Sundareshr,

I think you are on to something. Let me clarify a bit

file1.csv has fields:

PREMISE
TICKET_NUMBER
REASON_CODE
blah

file2.csv has fields
PREMISE_ID
LAST_NAME
FIRST_NAME
blah

I'm trying to join/link/reference/whatever you want to call it, the two files based off of PREMISE/PREMISE_ID. The two fields have slightly different names but the data is the same. Meaning PREMISE=1 in file1.csv equals PREMISE_ID=1 in file2.csv

0 Karma

sundareshr
Legend

Do any of the fields repeat in both files? Try this

source=1.csv OR source=2.csv | eval premise=coalesce(premise, premise_id) | stats list(lastname) as lname list(firstname) as fname list(ticketnumber) tnbr etc by premise

If this doesn't give you what you're looking for, you could do the join

source=file1.csv | JOIN PREMISE [ search source=file2.csv | rename PREMISE_ID as PREMISE | table PREMISE_ID LAST_NAME blah ] | table TICKET_NUMBER REASON_CODE blah
0 Karma
Get Updates on the Splunk Community!

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...