I have a CSV file where the line breaks are signified by a semicolon (;). I am wondering how one would parse this CSV when the "line break" is a different character. Example:
Number, Score; 1 , 44.5678690273 ;11 , 60.0795233081 ; 14 , 13.6359924845 ; 16 , 44.6169376811 ; 17 , 47.6782506507 ;
I tried using:
HEADER_FIELD_LINE_NUMBER=1
FIELD_NAMES=Number, Score
BREAK_ONLY_BEFORE=;
CHARSET=AUTO
INDEXED_EXTRACTIONS=csv
KV_MODE=none
LINE_BREAK=;
MUST_BREAK_AFTER=;
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
pulldown_type=true
However, it does not break the events at the semicolons.
CHARSET=AUTO
INDEXED_EXTRACTIONS=csv
KV_MODE=none
SHOULD_LINEMERGE=false
LINE_BREAKER=(;)
NO_BINARY_CHECK=true
pulldown_type=true
Make sure this props.conf is deployed where the data is first read, such as on the forwarder.
Without specifying FIELD_NAMES, I get:
No results found. Please change source type, adjust source type settings, or check your source file.
However, specifying FIELD_NAMES still does not split the events at the semicolons. It treats
60.0795233081 ; 14
as a field value.
Adding SEDCMD-semicolon="s/;/\n\r/g" did not change the result. Do I need to do anything special to enable SEDCMD?
I've added the line to my props.conf file. I've also tried SEDCMD-replace=s/;/\r\n/g and the results are the same.
I believe INDEXED_EXTRACTIONS is overriding both SEDCMD and LINE_BREAKER.
You will probably have to disable INDEXED_EXTRACTIONS, use EXTRACT-<name> to extract the values into named fields, and then discard the header with SEDCMD:
LINE_BREAKER=(;)
SEDCMD-headerRemoval = s/Number\s*,\s*Score//g
# matches whole integers and fractions
EXTRACT-fields = ^\s*(?<Number>\d+)\s*,\s*(?<Score>\d+\.\d+|\d+)
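To sanity-check the regexes outside Splunk, here is a rough Python sketch of the same idea: split the raw text on semicolons, strip the header, then pull out Number and Score. It is only an approximation of what Splunk does at index and search time, not a reproduction of it. The names SAMPLE, HEADER, FIELDS and parse_events are made up for this example, the patterns are loosened to tolerate optional whitespace around the comma, and Python needs the (?P<name>...) form for named groups instead of (?<name>...).

```python
import re

# Sample data copied from the question.
SAMPLE = ("Number, Score; 1 , 44.5678690273 ;11 , 60.0795233081 ; "
          "14 , 13.6359924845 ; 16 , 44.6169376811 ; 17 , 47.6782506507 ;")

# Stand-in for the SEDCMD header removal (assumed pattern, whitespace optional).
HEADER = re.compile(r"Number\s*,\s*Score")

# Stand-in for the EXTRACT-fields regex, in Python's named-group syntax.
FIELDS = re.compile(r"^\s*(?P<Number>\d+)\s*,\s*(?P<Score>\d+\.\d+|\d+)")


def parse_events(raw):
    """Split on ';' (like a semicolon line breaker) and extract the two fields."""
    events = []
    for chunk in raw.split(";"):
        chunk = HEADER.sub("", chunk)   # drop the header text if present
        match = FIELDS.search(chunk)
        if match:                       # skip empty or header-only chunks
            events.append((int(match.group("Number")),
                           float(match.group("Score"))))
    return events


rows = parse_events(SAMPLE)
for number, score in rows:
    print(number, score)
```

If the patterns behave as intended, each semicolon-delimited record becomes one (Number, Score) pair and the header chunk yields nothing.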