Solved: Indexing multiple .CSV files 4 questions.

ocallender · ‎12-01-2012

Here's my situation.

I have automated a SQL lookup on a database that outputs a .csv file every 10 minutes, containing the field names and the events from the last 10 minutes, e.g. eventlog-22112012-1410.csv. Each file is copied to a folder on my Splunk server, resulting in an ever-growing list of files that Splunk indexes. The first line of each file contains the field names and the remaining lines contain the values.

I have been able to get Splunk to index the files as they grow and it automatically extracts the fields. I have built a nice dashboard based on this data.

I have 4 things I really need to ask:

1. Splunk creates a new source for each file, resulting in a large number of sources on the summary page. Is there a way to have Splunk treat all of the files in the folder as one source?
2. Splunk is indexing each file header as an event. As a result, my results contain events with values such as Customer_Name=Customer_Name, IP_Address=IP_Address, where the field name and value are identical, once for each file. How can I stop this from happening?
3. Can I delete older files without losing the indexed data? Will Splunk continue to index new files without losing track and re-indexing all the files again?
4. I want to add new fields to the .csv files going forward. Will Splunk be able to automatically detect the new fields even though they don't exist in the older .csv files that have already been indexed?

Questions 3 and 4 are most important for me. I don't want to delete files or add fields and break the dashboards that I've already created. Please help if you can.

Rob · ‎12-02-2012

Let's see if this helps:

1. You can rewrite the source field by editing the props.conf and transforms.conf files and setting the DEST_KEY = MetaData:Source parameter. You might want to take a look at this answer: http://splunk-base.splunk.com/answers/33009/how-to-replace-meta-information
2. This should be a lot easier with Splunk 5.0. However, you may want to take a look at the docs here: http://docs.splunk.com/Documentation/Splunk/5.0/Data/Extractfieldsfromfileheadersatindextime as they contain several options for checking the CSV header or possibly altering your field extractions.
3. Yes, you can. As soon as the data is indexed, Splunk no longer needs the original file. To determine whether it has already seen a file, Splunk computes a CRC over a fixed number of bytes of each file that it indexes; as of 5.0 that number of bytes is configurable. The tricky part is that if two files are identical at the beginning and end for those bytes (for example, because every file starts with the same header line), Splunk may think that it's the same file and the second file will not be indexed. This is why you may want to change the CRC settings when indexing very similar files.
4. Yes, Splunk will be able to detect new fields even if they don't exist in the older files. You can also look into manually extracting additional fields, using either the 'rex' command or by editing your config files.
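
For points 1 and 2, a rough sketch of what the props.conf and transforms.conf entries could look like is below. Treat it as a starting point rather than a tested config: the folder path /opt/csv_drop, the stanza names and the replacement source value eventlog_csv are all made up for illustration, and the header regex assumes Customer_Name is the first column in your header line. Note that sending the header line to the nullQueue is a different technique from the header options in the linked docs; it simply discards the matching line before it is indexed.

```ini
# props.conf
# Hypothetical path; point this at the folder your .csv files are copied to.
[source::/opt/csv_drop/eventlog-*.csv]
# Transforms run left to right: drop the header line first, then rename the source.
TRANSFORMS-csv_cleanup = drop_csv_header, set_common_source

# transforms.conf
# Discard the header line of each file so it is never indexed as an event.
# Assumption: the header starts with Customer_Name (optionally quoted).
[drop_csv_header]
REGEX = ^"?Customer_Name"?,
DEST_KEY = queue
FORMAT = nullQueue

# Replace the per-file source with one fixed value for every event.
# eventlog_csv is an illustrative name, not something Splunk defines.
[set_common_source]
REGEX = .
DEST_KEY = MetaData:Source
FORMAT = source::eventlog_csv
```

These are index-time settings, so they only affect data indexed after the change (and after a restart); files that are already indexed keep their original source and header events. If any of your dashboard searches filter on the per-file source names, they would need to be updated to match the new value.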

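For point 3, the CRC behaviour is controlled per input in inputs.conf. The sketch below shows the two settings I would look at; the monitored path is again a placeholder, and you would normally pick one of the two options rather than both.

```ini
# inputs.conf
# Hypothetical path; use the folder your .csv files are copied to.
[monitor:///opt/csv_drop]
whitelist = \.csv$

# Option A: add the full file path to the CRC calculation, so files with
# different names are treated as different files even when they start with
# the same bytes. The value is the literal string <SOURCE>.
crcSalt = <SOURCE>

# Option B (Splunk 5.0 and later): hash more bytes than the default 256 so
# the CRC reaches past the shared header line. 1024 is an arbitrary example.
# initCrcLength = 1024
```

Since your file names already contain a timestamp, Option A should be safe. The usual caveat is that with crcSalt = <SOURCE> a file that gets renamed or moved looks like a new file and is indexed again.
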
ocallender · ‎12-02-2012

Thank you very much. I'll try these suggestions and see what happens.
