
Comparing two huge csv files

salpaysog
Explorer

I have two CSV files of email addresses that I want to compare by listing the addresses that appear in only one of the files (in either direction). What I want to do is similar to a "minus" operation in SQL.

This has already been solved in several threads, such as:
- https://answers.splunk.com/answers/56586/list-difference-between-two-csv-files.html
- https://answers.splunk.com/answers/386822/how-to-compare-search-and-csv-file.html

However, my CSV files are huge (300,000+ entries each), and most of the email addresses are common to both. I just need to extract the few outliers.

Subsearches and joins are limited (the subsearch maxout limit is 10,000 in my Enterprise edition).

Does anyone have an idea how to use Splunk to solve this?

I have tried Excel and even wrote a Python script, but it takes ages and my computer can't handle the calculations...


gcusello
SplunkTrust

Hi salpaysog,
Load both files into an index (for example with a scheduled nightly search), then run something like the following:

index=my_csv_index
| stats values(source) AS source dc(source) AS source_count BY email
| where source_count=1

This returns only the email addresses that appear in exactly one CSV file, together with the source file each one came from.
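If you would rather not index the files, a similar comparison can be sketched directly against lookup files. This is a minimal sketch, assuming both CSVs have been uploaded as lookup files with the hypothetical names `emails_a.csv` and `emails_b.csv`, each with an `email` column. It uses `inputlookup append=true`, which appends the second file's rows to the current results rather than running a subsearch, so it should not be bound by the subsearch limit mentioned in the question (worth confirming on your Splunk version).

```spl
| inputlookup emails_a.csv
| eval file="emails_a.csv"
| inputlookup append=true emails_b.csv
| eval file=coalesce(file, "emails_b.csv")
| eval email=lower(trim(email))
| stats values(file) AS file dc(file) AS file_count BY email
| where file_count=1
```

Rows from the second file arrive without a `file` value, so `coalesce` labels them. The `lower(trim(email))` step is an optional normalization so that differences in case or surrounding whitespace are not reported as mismatches; drop it if those differences matter to you.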

Bye.
Giuseppe

salpaysog
Explorer

This is brilliant, thank you Giuseppe!
It works well and is very fast.
