Getting Data In

Search time field extractions of structured data in csv format

Path Finder

Hello,

I am indexing data which arrives to the index in csv format.
I am using a search time filed extraction method. I have specified a list of the fields in the transforms.conf
What will happen in a new column gets added to a csv file or the order of columns changes? I can change a transforms.conf file by modifying the fields list, but the new transform would not work for the csv data before column order has changed.

What is the best method for csv files fields extraction assuming the order of columns can change in the future?

Thank you.

0 Karma

Esteemed Legend

The best that you can do is WATCH for it, then fix it. Here is what you do. In every CSV RegEx, Add (?:,(?<FIXME_EXPANSION>[^,]+))?. Then have a search with FIXME_EXPANSION=* that runs all the time and emails you if the results are ever non-zero.

SplunkTrust
SplunkTrust

For CSV-like data, DELIMS work pretty well. Take a look at this for example:
https://www.splunk.com/blog/2013/03/11/quick-n-dirty-delimited-data-sourcetypes-and-you.html

However, if your data changes its format, that might be problematic. If the new column gets appended last, it might work just defining more fields in your transforms.
Basically, when your data changes its format, you should ingest it with a different custom sourcetype that fits your data. 😉

0 Karma

Path Finder

Thank you for your answer. If I modify the sourcetype to fit the new data format then i won't be able to search the data in previous format properly. Unless i can apply multiple sourcetypes depending on the time range the data is stored for.

0 Karma