Splunk Search

How to combine data from two CSV files using a common field where this field has a different name in each file?

dbcase
Motivator

Hi,

I have 2 CSV files that have a unique key of sorts.... The of sorts is, in one CSV file the unique key is named PREMISE, but in the other file it is named PREMISE_ID. What would be the best way to "join" the two files together using Splunk?

0 Karma

sundareshr
Legend

Not sure what you mean by join. You could try something like this.

source=file1.csv OR source=file2.csv | eval PREMISE=coalsce(PREMISE, PREMISE_ID) | stats count by PREMISE

This will give you a count of event grouped by PREMISE across both files.

Now, if you want to do a JOIN like a DB JOIN, then you could do something like this

source=file1.csv | JOIN PREMISE [ search source=file2.csv | rename PREMISE_ID as PREMISE | table PREMISE_ID ] 
0 Karma

dbcase
Motivator

Hmmmm, I was certain that it would work but oddly the query isn't returning any results

Here is what I have

index=top10 source="/home/oracle/workdir/account_log.csv" |join PREMISE [search source="/home/oracle/workdir/overalllocationdata.csv" | rename PREMISE_ID as PREMISE|table PREMISE LAST_NAME]|table TICKET_CODE,CHANNEL

Where did I go wrong?

0 Karma

sundareshr
Legend

Do you get results when you run the queries separately? Try:

index=top10 source="/home/oracle/workdir/account_log.csv"  | table TICKET_CODE,CHANNEL

Do you get results you would expect? Then try

source="/home/oracle/workdir/overalllocationdata.csv" | rename PREMISE_ID as PREMISE|table PREMISE LAST_NAME]

Do you get results you would expect?

Are the field names correct (case sensitive)? Try adding a PREMISE=* before the | join

0 Karma

dbcase
Motivator

Ok, scratching head a bit,

If I do this it works

index=top10 source="/home/oracle/workdir/account_log.csv" | table TICKET_CODE,CHANNEL

and if I do this it works

index=top10 source="/home/oracle/workdir/overalllocationdata.csv" | rename PREMISE_ID as PREMISE|table PREMISE LAST_NAME

aside for the starting index=top10 the commands seem to work independently (nice trick by the way)

but combining them with the join and I get 0 results even though I've hacked at it trying several different combinations. So it is something I'm doing wrong with the join.

No joy with this one either

index=top10 source="/home/oracle/workdir/account_log.csv" PREMISE="*" |join PREMISE [search source="/home/oracle/workdir/overalllocationdata.csv" | rename PREMISE_ID as PREMISE|table PREMISE LAST_NAME]|table TICKET_CODE,CHANNEL

I double checked and all the field names are in UPPER case.

I'm at the point where my head hurts and I'm wondering why there is a bloody spot on the wall......

0 Karma

dbcase
Motivator

also just FYI here is the complete tables and fields schema

Account_Log.csv
CHANNEL
CID
CPE_CONVERTED
CREATION_DATE
DIFF
OBJECT_CPE_ID
OBJECT_TYPE
OPERATION_TYPE
PREMISE
PREMISE_RANK
TICKET_CODE

OverallLocationData.csv
PREMISE_ID
ADDRESS1
ADDRESS2
CITY
COUNTRY
FIRST_NAME
LAST_NAME
LATITUDE
LAT_LNG_TYPE
LONGITUDE
POSTAL_CODE
PROVINCE

0 Karma

dbcase
Motivator

Hi Sundareshr,

I think you are on to something. Let me clarify a bit

file1.csv has fields:

PREMISE
TICKET_NUMBER
REASON_CODE
blah

file2.csv has fields
PREMISE_ID
LAST_NAME
FIRST_NAME
blah

I'm trying to join/link/reference/whatever you want to call it, the two files based off of PREMISE/PREMISE_ID. The two fields have slightly different names but the data is the same. Meaning PREMISE=1 in file1.csv equals PREMISE_ID=1 in file2.csv

0 Karma

sundareshr
Legend

Do any of the fields repeat in both files? Try this

source=1.csv OR source=2.csv | eval premise=coalesce(premise, premise_id) | stats list(lastname) as lname list(firstname) as fname list(ticketnumber) tnbr etc by premise

If this doesn't give you what you're looking for, you could do the join

source=file1.csv | JOIN PREMISE [ search source=file2.csv | rename PREMISE_ID as PREMISE | table PREMISE_ID LAST_NAME blah ] | table TICKET_NUMBER REASON_CODE blah
0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...