<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Comparing two huge csv files in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Comparing-two-huge-csv-files/m-p/381482#M68802</link>
    <description>&lt;P&gt;I have two CSV files of email addresses and want to compare them by listing the addresses that appear in only one file (and, conversely, only in the other). This is similar to a "minus" operation in SQL.&lt;/P&gt;

&lt;P&gt;This has already been solved in several threads, such as:&lt;BR /&gt;
-&lt;A href="https://answers.splunk.com/answers/56586/list-difference-between-two-csv-files.html"&gt;https://answers.splunk.com/answers/56586/list-difference-between-two-csv-files.html&lt;/A&gt;&lt;BR /&gt;
-&lt;A href="https://answers.splunk.com/answers/386822/how-to-compare-search-and-csv-file.html"&gt;https://answers.splunk.com/answers/386822/how-to-compare-search-and-csv-file.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;However, my CSV files are huge (300,000+ entries), and most of the email addresses are common to both. I only need to extract the few that differ.&lt;/P&gt;

&lt;P&gt;Subsearches and joins are limited (the subsearch maxout limit is 10000 in my Enterprise edition).&lt;/P&gt;

&lt;P&gt;Does anyone have an idea how to use Splunk to solve this?&lt;/P&gt;

&lt;P&gt;I have tried Excel and even wrote a Python script, but both take a very long time and my computer cannot handle the computation.&lt;/P&gt;</description>
    <pubDate>Thu, 27 Dec 2018 13:09:28 GMT</pubDate>
    <dc:creator>salpaysog</dc:creator>
    <dc:date>2018-12-27T13:09:28Z</dc:date>
    <item>
      <title>Comparing two huge csv files</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Comparing-two-huge-csv-files/m-p/381482#M68802</link>
      <description>&lt;P&gt;I have two CSV files of email addresses and want to compare them by listing the addresses that appear in only one file (and, conversely, only in the other). This is similar to a "minus" operation in SQL.&lt;/P&gt;

&lt;P&gt;This has already been solved in several threads, such as:&lt;BR /&gt;
-&lt;A href="https://answers.splunk.com/answers/56586/list-difference-between-two-csv-files.html"&gt;https://answers.splunk.com/answers/56586/list-difference-between-two-csv-files.html&lt;/A&gt;&lt;BR /&gt;
-&lt;A href="https://answers.splunk.com/answers/386822/how-to-compare-search-and-csv-file.html"&gt;https://answers.splunk.com/answers/386822/how-to-compare-search-and-csv-file.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;However, my CSV files are huge (300,000+ entries), and most of the email addresses are common to both. I only need to extract the few that differ.&lt;/P&gt;

&lt;P&gt;Subsearches and joins are limited (the subsearch maxout limit is 10000 in my Enterprise edition).&lt;/P&gt;

&lt;P&gt;Does anyone have an idea how to use Splunk to solve this?&lt;/P&gt;

&lt;P&gt;I have tried Excel and even wrote a Python script, but both take a very long time and my computer cannot handle the computation.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Dec 2018 13:09:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Comparing-two-huge-csv-files/m-p/381482#M68802</guid>
      <dc:creator>salpaysog</dc:creator>
      <dc:date>2018-12-27T13:09:28Z</dc:date>
    </item>
    <item>
      <title>Re: Comparing two huge csv files</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Comparing-two-huge-csv-files/m-p/381483#M68803</link>
      <description>&lt;P&gt;Hi salpaysog,&lt;BR /&gt;
load both files into an index (perhaps with a scheduled search at night) and then run something like the following:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=my_csv_index
| stats values(source) AS source dc(source) AS count BY email
| where count=1
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This leaves only the emails that appear in exactly one of the CSV files, and the source field shows which file each one came from.&lt;/P&gt;
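
&lt;P&gt;If indexing the files is not convenient, a variation that might work is to upload both as lookup files and read them with inputlookup. This is only a sketch: list_a.csv and list_b.csv are hypothetical lookup names, and it assumes both files have a column named email. As far as I know, inputlookup with append=true does not run as a subsearch, so the 10000 subsearch limit should not apply, but please verify this on your version.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup list_a.csv
| eval source="list_a"
| inputlookup append=true list_b.csv
| eval source=coalesce(source, "list_b")
| stats values(source) AS source dc(source) AS count BY email
| where count=1
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Rows from the first file are tagged before the second file is appended, so the coalesce only fills in the tag for the appended rows. The final two lines are the same comparison as in the indexed version.&lt;/P&gt;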

&lt;P&gt;Bye.&lt;BR /&gt;
Giuseppe&lt;/P&gt;</description>
      <pubDate>Thu, 27 Dec 2018 15:24:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Comparing-two-huge-csv-files/m-p/381483#M68803</guid>
      <dc:creator>gcusello</dc:creator>
      <dc:date>2018-12-27T15:24:01Z</dc:date>
    </item>
    <item>
      <title>Re: Comparing two huge csv files</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Comparing-two-huge-csv-files/m-p/381484#M68804</link>
      <description>&lt;P&gt;This is brilliant, thank you Giuseppe!&lt;BR /&gt;
It works well and is very fast.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Dec 2018 08:27:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Comparing-two-huge-csv-files/m-p/381484#M68804</guid>
      <dc:creator>salpaysog</dc:creator>
      <dc:date>2018-12-28T08:27:50Z</dc:date>
    </item>
  </channel>
</rss>

