Getting Data In

How to correlate multiple CSV files using different columns?

changux
Builder

Hi all.

I have six CSV files extracted from a running system where I can't access the backend to install a forwarder, so my best option is to process the CSV output files.

The files look like this:

File1.csv = NumberID plus about 30 more columns.
File2.csv = NumberID, RegID plus about 20 more columns.
File3.csv = RegID plus about 40 more columns.
File4.csv = RegID plus about 20 more columns.
File5.csv = NumberID plus about 5 more columns.
File6.csv = RegID plus about 8 more columns.

I need to correlate all the files to build one big file with the relevant information from each (I will choose the useful columns), joining only on NumberID and RegID. But these fields are only present in certain files, so I need to switch the join column as I go.
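In other words, since File2.csv is the only file that contains both NumberID and RegID, it has to act as the bridge between the two keys. Something like this chained lookup is roughly what I am trying to build (assuming the files are uploaded as lookup table files; ValueColumn is just a placeholder for the columns I would pick):

| inputlookup File1.csv | lookup File2.csv NumberID OUTPUT RegID | lookup File3.csv RegID OUTPUT ValueColumn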

Based on this, I have some questions:

1.) If my CSV files change about once per week, what is the best way to get them ingested by Splunk? I mean, I need to analyze only my latest files, not the whole history of records.
2.) How can I do the correlation? I checked other answers like:

http://answers.splunk.com/answers/232031/how-to-correlate-data-from-three-csv-file-sources.html

But I don't know which is the best option.

Thank you so much for your help.


woodcock
Esteemed Legend

If you don't have very many events, you can use inputcsv and append (which has an upper limit of around 10K-50K results) and transaction (which slows down terribly on large datasets), like this:

| inputcsv File1.csv | append [| inputcsv File2.csv] | append [| inputcsv File3.csv] | append [| inputcsv File4.csv] | append [| inputcsv File5.csv] | append [| inputcsv File6.csv] | transaction NumberID RegID

You pretty much have to use transaction because it is the only practical way to handle a transitive key relationship like the one you have described.
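If you only want the relevant columns from each file in the final result, you could trim the merged events after the transaction, something like this (ColumnA and ColumnB are placeholders for whichever columns you pick):

| inputcsv File1.csv | append [| inputcsv File2.csv] | transaction NumberID RegID | table NumberID RegID ColumnA ColumnB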


changux
Builder

Thanks so much! Good recommendation.


lguinn2
Legend

First: why do you need to "correlate all files to build a big file with relevant information of each file"? And what do you mean by that? In Splunk, you can search across multiple inputs and combine them as you search - you don't normally do this as you ingest the data. Also, you could do it differently for different searches/reports, depending on what you need for each one.

How can you tell past data from current data? Is there a timestamp? All events in Splunk must have a timestamp - if no other timestamp is provided, Splunk uses the time when the data was indexed. So you can probably just search recent data. You can also decide how to age-out data from your indexes, but that's a topic for another post, when you know more about Splunk.
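For example, once your events have a usable timestamp (even if it is just the index time), restricting a search to recent data is as simple as adding a time modifier; the index name here is hypothetical:

index=csv_data earliest=-7d@d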

Also, if the data is static and not time-based - and you don't care about past values - you could create lookup files instead of indexing the data. Or you might index some of the data and put the rest in lookup files.
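As a sketch of that mixed approach (the sourcetype and column name are hypothetical, and this assumes File3.csv has been uploaded as a lookup table file): index the rows of File2 and enrich them at search time from a File3 lookup:

index=csv_data sourcetype=file2 | lookup File3.csv RegID OUTPUT StatusColumn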

The best option for correlation depends on the searches/reports that you want, and how you have chosen to ingest the data. The community needs a lot more information to answer this.

Finally, I think that you would benefit greatly from going through the Splunk Tutorial. You can even get a free Splunk Sandbox to play with, which has the tutorial data in it already. The sandbox is good for 14 days.
