<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to correlate multiple CSV files using different columns? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/How-to-correlate-multiple-CSV-files-using-different-columns/m-p/152762#M31097</link>
    <description>&lt;P&gt;Hi all.&lt;/P&gt;

&lt;P&gt;I have about 6 CSV files extracted from a running system where I can't access the backend to install a forwarder, so my best option is to process the CSV output files.&lt;/P&gt;

&lt;P&gt;The files look like this:&lt;/P&gt;

&lt;P&gt;File1.csv = NumberID and about 30 more columns.&lt;BR /&gt;
File2.csv = NumberID, RegID and about 20 more columns.&lt;BR /&gt;
File3.csv = RegID and about 40 more columns.&lt;BR /&gt;
File4.csv = RegID and about 20 more columns.&lt;BR /&gt;
File5.csv = NumberID and about 5 more columns.&lt;BR /&gt;
File6.csv = RegID and about 8 more columns.&lt;/P&gt;

&lt;P&gt;I need to correlate all the files to build one big file with the relevant information from each (I choose which value columns to keep), joining only on NumberID and RegID. Each of these fields is present in only some of the files, so I need to switch the key column as I work through the files.&lt;/P&gt;

&lt;P&gt;Based on this, I have some questions:&lt;/P&gt;

&lt;P&gt;1.) If my CSV files change about once per week, what is the best way to have them ingested by Splunk? I only need to analyze my latest files, not the full history of the records.&lt;BR /&gt;
2.) How can I do the correlation? I checked other answers like:&lt;/P&gt;

&lt;P&gt;&lt;A href="http://answers.splunk.com/answers/232031/how-to-correlate-data-from-three-csv-file-sources.html" target="test_blank"&gt;http://answers.splunk.com/answers/232031/how-to-correlate-data-from-three-csv-file-sources.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;But I don't know which is the best option.&lt;/P&gt;

&lt;P&gt;Thank you so much for your help.&lt;/P&gt;</description>
    <pubDate>Wed, 10 Jun 2015 12:51:43 GMT</pubDate>
    <dc:creator>changux</dc:creator>
    <dc:date>2015-06-10T12:51:43Z</dc:date>
    <item>
      <title>How to correlate multiple CSV files using different columns?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-correlate-multiple-CSV-files-using-different-columns/m-p/152762#M31097</link>
      <description>&lt;P&gt;Hi all.&lt;/P&gt;

&lt;P&gt;I have about 6 CSV files extracted from a running system where I can't access the backend to install a forwarder, so my best option is to process the CSV output files.&lt;/P&gt;

&lt;P&gt;The files look like this:&lt;/P&gt;

&lt;P&gt;File1.csv = NumberID and about 30 more columns.&lt;BR /&gt;
File2.csv = NumberID, RegID and about 20 more columns.&lt;BR /&gt;
File3.csv = RegID and about 40 more columns.&lt;BR /&gt;
File4.csv = RegID and about 20 more columns.&lt;BR /&gt;
File5.csv = NumberID and about 5 more columns.&lt;BR /&gt;
File6.csv = RegID and about 8 more columns.&lt;/P&gt;

&lt;P&gt;I need to correlate all the files to build one big file with the relevant information from each (I choose which value columns to keep), joining only on NumberID and RegID. Each of these fields is present in only some of the files, so I need to switch the key column as I work through the files.&lt;/P&gt;

&lt;P&gt;Based on this, I have some questions:&lt;/P&gt;

&lt;P&gt;1.) If my CSV files change about once per week, what is the best way to have them ingested by Splunk? I only need to analyze my latest files, not the full history of the records.&lt;BR /&gt;
2.) How can I do the correlation? I checked other answers like:&lt;/P&gt;

&lt;P&gt;&lt;A href="http://answers.splunk.com/answers/232031/how-to-correlate-data-from-three-csv-file-sources.html" target="test_blank"&gt;http://answers.splunk.com/answers/232031/how-to-correlate-data-from-three-csv-file-sources.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;But I don't know which is the best option.&lt;/P&gt;

&lt;P&gt;Thank you so much for your help.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jun 2015 12:51:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-correlate-multiple-CSV-files-using-different-columns/m-p/152762#M31097</guid>
      <dc:creator>changux</dc:creator>
      <dc:date>2015-06-10T12:51:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to correlate multiple CSV files using different columns?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-correlate-multiple-CSV-files-using-different-columns/m-p/152763#M31098</link>
      <description>&lt;P&gt;First: why do you need to "correlate all files to build a big file with relevant information of each file"? And what do you mean by that? In Splunk, you can search across multiple inputs and combine them as you search - you don't normally do this as you ingest the data. Also, you could do it differently for different searches/reports, depending on what you need for each one.&lt;/P&gt;

&lt;P&gt;How can you tell past data from current data? Is there a timestamp? All events in Splunk must have a timestamp - if no other timestamp is provided, Splunk uses the time when the data was indexed. So you can probably just search recent data. You can also decide how to age-out data from your indexes, but that's a topic for another post, when you know more about Splunk.&lt;/P&gt;
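
&lt;P&gt;As a rough sketch of searching only recent data, assuming the files are indexed into a hypothetical index named &lt;CODE&gt;csv_data&lt;/CODE&gt; and that a 7-day window matches the weekly refresh (both are assumptions, not details from the question):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=csv_data source="*File2.csv" earliest=-7d
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;If the rows carry no timestamp of their own, the event time is the index time, so a window like this would return only the most recently loaded copy of the file.&lt;/P&gt;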

&lt;P&gt;Also, if the data is static and not time-based - and you don't care about past values - you could create lookup files instead of indexing the data. Or you might index some of the data and put the rest in lookup files.&lt;/P&gt;
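
&lt;P&gt;A minimal sketch of the lookup approach, assuming the CSV files have been uploaded as lookup table files under their original names. &lt;CODE&gt;ColA&lt;/CODE&gt; and &lt;CODE&gt;ColB&lt;/CODE&gt; are hypothetical placeholders for whichever value columns you want to keep:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup File2.csv
| lookup File1.csv NumberID OUTPUT ColA
| lookup File3.csv RegID OUTPUT ColB
| table NumberID RegID ColA ColB
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;File2.csv is the starting point here because it is the only file listed with both NumberID and RegID, so each of its rows can be enriched from the NumberID files and the RegID files in turn.&lt;/P&gt;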

&lt;P&gt;The best option for correlation depends on the searches/reports that you want, and how you have chosen to ingest the data. The community needs a lot more information to answer this.&lt;/P&gt;

&lt;P&gt;Finally, I think that you would benefit greatly from going through the &lt;A href="http://docs.splunk.com/Documentation/Splunk/6.2.3/SearchTutorial/WelcometotheSearchTutorial"&gt;Splunk Tutorial&lt;/A&gt;. You can even get a free &lt;A href="https://www.splunk.com/page/sign_up/cloudtrial?redirecturl=/getsplunk/onlinesandbox"&gt;Splunk Sandbox&lt;/A&gt; to play with, which has the tutorial data in it already. The sandbox is good for 14 days.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Jun 2015 00:50:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-correlate-multiple-CSV-files-using-different-columns/m-p/152763#M31098</guid>
      <dc:creator>lguinn2</dc:creator>
      <dc:date>2015-06-11T00:50:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to correlate multiple CSV files using different columns?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-correlate-multiple-CSV-files-using-different-columns/m-p/152764#M31099</link>
      <description>&lt;P&gt;If you don't have very many events, you can use &lt;CODE&gt;inputcsv&lt;/CODE&gt; and &lt;CODE&gt;append&lt;/CODE&gt; (has upper limit 10K-50K) and &lt;CODE&gt;transaction&lt;/CODE&gt; (slows down terribly on large datasets) like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputcsv File1.csv | append [| inputcsv File2.csv] | append [| inputcsv File3.csv] | append [| inputcsv File4.csv] | append [| inputcsv File5.csv] | append [| inputcsv File6.csv] | transaction NumberID RegID
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;You pretty much have to use &lt;CODE&gt;transaction&lt;/CODE&gt; because it is the only practical way to do a transitive key relationship like you have described.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Jun 2015 02:20:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-correlate-multiple-CSV-files-using-different-columns/m-p/152764#M31099</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2015-06-11T02:20:20Z</dc:date>
    </item>
    <item>
      <title>Re: How to correlate multiple CSV files using different columns?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-correlate-multiple-CSV-files-using-different-columns/m-p/152765#M31100</link>
      <description>&lt;P&gt;Thanks so much! Good recommendation.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Jun 2015 12:09:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-correlate-multiple-CSV-files-using-different-columns/m-p/152765#M31100</guid>
      <dc:creator>changux</dc:creator>
      <dc:date>2015-06-11T12:09:35Z</dc:date>
    </item>
  </channel>
</rss>

