Hi all.
I have six CSV files extracted from a running system where I can't access the backend to install a forwarder, so my best option is to process the CSV output files.
The files look like this:
File1.csv = NumberID plus about 30 more columns.
File2.csv = NumberID, RegID, plus about 20 more columns.
File3.csv = RegID plus about 40 more columns.
File4.csv = RegID plus about 20 more columns.
File5.csv = NumberID plus about 5 more columns.
File6.csv = RegID plus about 8 more columns.
I need to correlate all the files to build one big file with the relevant information from each (I choose the value columns), joining only on NumberID and RegID. But those fields are only present in certain files, so I need to switch the "pivot column" as I go.
Based on this, I have some questions:
1.) If my CSVs change about once per week, what is the best way to have them ingested by Splunk? I mean, I only need to analyze my latest files, not the whole history of records.
2.) How can I do the correlation? I checked other answers like:
http://answers.splunk.com/answers/232031/how-to-correlate-data-from-three-csv-file-sources.html
But I don't know which option is best.
Thank you so much for your help.
If you don't have very many events, you can use inputcsv and append (which has an upper limit of roughly 10K-50K results) and transaction (which slows down terribly on large datasets), like this:
inputcsv File1.csv | append [inputcsv File2.csv] | append [inputcsv File3.csv] | append [inputcsv File4.csv] | append [inputcsv File5.csv] | append [inputcsv File6.csv] | transaction NumberID RegID
You pretty much have to use transaction, because it is the only practical way to handle a transitive key relationship like the one you have described.
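To end up with only the value columns you care about, the same search can finish with a table command. A minimal sketch with just three of the files, where valueA and valueB are placeholders for whichever columns you actually pick:

```spl
| inputcsv File1.csv
| append [| inputcsv File2.csv]
| append [| inputcsv File3.csv]
| transaction NumberID RegID
| table NumberID RegID valueA valueB
```

transaction groups events that share either NumberID or RegID, so File1 (NumberID only) and File3 (RegID only) get stitched together through File2, which carries both keys.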
Thanks so much! Good recommendation.
First: why do you need to "correlate all files to build a big file with relevant information of each file"? And what do you mean by that? In Splunk, you can search across multiple inputs and combine them as you search - you don't normally do this as you ingest the data. Also, you could do it differently for different searches/reports, depending on what you need for each one.
How can you tell past data from current data? Is there a timestamp? All events in Splunk must have a timestamp - if no other timestamp is provided, Splunk uses the time when the data was indexed. So you can probably just search recent data. You can also decide how to age-out data from your indexes, but that's a topic for another post, when you know more about Splunk.
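For example, once the files are indexed, a search can be restricted to recent data so older copies of the files are ignored. A sketch, where the index and source names are assumptions:

```spl
index=main source="*File1.csv" earliest=-7d
```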
Also, if the data is static and not time-based - and you don't care about past values - you could create lookup files instead of indexing the data. Or you might index some of the data and put the rest in lookup files.
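As a sketch of the lookup approach, assuming the CSVs have been uploaded as lookup table files and the field names match the file headers (valueA and valueB are placeholders):

```spl
| inputlookup File2.csv
| lookup File1.csv NumberID OUTPUT valueA
| lookup File3.csv RegID OUTPUT valueB
```

Starting from File2.csv is deliberate here: it is the only file that carries both NumberID and RegID, so it can enrich from both the NumberID-keyed files and the RegID-keyed files.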
The best option for correlation depends on the searches/reports that you want, and how you have chosen to ingest the data. The community needs a lot more information to answer this.
Finally, I think that you would benefit greatly from going through the Splunk Tutorial. You can even get a free Splunk Sandbox to play with, which has the tutorial data in it already. The sandbox is good for 14 days.