Hi everyone,
I have a CSV file where line breaks are signified by a semicolon (;). How would I parse this CSV when the "line break" is a different character? Example:
Number, Score; 1 , 44.5678690273 ;11 , 60.0795233081 ; 14 , 13.6359924845 ; 16 , 44.6169376811 ; 17 , 47.6782506507 ;
I tried using:
HEADER_FIELD_LINE_NUMBER=1
FIELD_NAMES=Number, Score
BREAK_ONLY_BEFORE=;
CHARSET=AUTO
INDEXED_EXTRACTIONS=csv
KV_MODE=none
LINE_BREAK=;
MUST_BREAK_AFTER=;
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
pulldown_type=true
However, it does not break the events at the semicolons.
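For reference, the underlying parsing problem looks like this outside Splunk. This is a minimal Python sketch, not Splunk configuration: it splits the sample on ";" first and only then treats each piece as a comma-separated row. The function name parse_semicolon_records is hypothetical. The split happens by hand because Python's csv.reader always ends rows at newlines.

```python
import csv

# Sample data from the question: records end with ";" instead of a newline.
raw = ("Number, Score; 1 , 44.5678690273 ;11 , 60.0795233081 ; 14 , 13.6359924845 ;"
       " 16 , 44.6169376811 ; 17 , 47.6782506507 ;")


def parse_semicolon_records(text):
    """Hypothetical helper: split on ';', then parse each piece as one CSV row."""
    # Strip padding and drop the empty piece left after the trailing ';'.
    records = [r.strip() for r in text.split(";") if r.strip()]
    # csv.reader accepts any iterable of strings and treats each item as a row.
    rows = [[field.strip() for field in row] for row in csv.reader(records)]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


rows = parse_semicolon_records(raw)
print(rows[0])  # {'Number': '1', 'Score': '44.5678690273'}
```

The first record becomes the header, so this yields five rows for the sample data.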
Try this:
CHARSET=AUTO
INDEXED_EXTRACTIONS=csv
KV_MODE=none
SHOULD_LINEMERGE=false
LINE_BREAKER=(\s*;\s*)
NO_BINARY_CHECK=true
pulldown_type=true
Make sure this props.conf is deployed at the source of the data, such as on the forwarder. With INDEXED_EXTRACTIONS, the forwarder is where the file gets parsed.
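Splunk's documentation describes LINE_BREAKER as a regex that must contain a capturing group. The text matched by the first group marks the boundary between events and is discarded. The Python sketch below roughly emulates that rule so a candidate pattern can be checked offline. The function name break_events is hypothetical, and this is an approximation rather than Splunk's actual implementation. The example pattern (\s*;\s*) is an assumption chosen to also absorb the spaces around each semicolon.

```python
import re

raw = ("Number, Score; 1 , 44.5678690273 ;11 , 60.0795233081 ; 14 , 13.6359924845 ;"
       " 16 , 44.6169376811 ; 17 , 47.6782506507 ;")


def break_events(stream, line_breaker):
    """Hypothetical emulation of LINE_BREAKER: the first capturing group's
    match ends the previous event, starts the next, and is thrown away."""
    pattern = re.compile(line_breaker)
    if pattern.groups < 1:
        # Mirrors the documented requirement for a capturing group.
        raise ValueError("LINE_BREAKER needs a capturing group")
    events, start = [], 0
    for m in pattern.finditer(stream):
        events.append(stream[start:m.start(1)])
        start = m.end(1)
    events.append(stream[start:])
    # Drop empty pieces, e.g. the one after the trailing ';'.
    return [e for e in events if e.strip()]


events = break_events(raw, r"(\s*;\s*)")
print(events[:2])  # ['Number, Score', '1 , 44.5678690273']
```

A plain `;` with no parentheses raises the ValueError here. That is a reminder that the pattern needs a group.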
Without specifying FIELD_NAMES, I get "No results found. Please change source type, adjust source type settings, or check your source file."
However, even with FIELD_NAMES specified, the semicolons are still not parsed properly. It treats values such as

Score; 1
44.5678690273 ;11
60.0795233081 ; 14

as field values.
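The symptom above is what you would expect if the whole sample were read as one newline-terminated CSV row. In that case the only delimiter is the comma, so each "Score; next Number" pair ends up in a single field. The sketch below is a rough analogy using Python's csv module. It is an assumption about the cause, not Splunk's actual structured-data parser.

```python
import csv

# The question's sample contains no newlines, so it is one physical line.
raw = ("Number, Score; 1 , 44.5678690273 ;11 , 60.0795233081 ; 14 , 13.6359924845 ;"
       " 16 , 44.6169376811 ; 17 , 47.6782506507 ;")

# Read as a single CSV row: fields are split on commas only, so the
# semicolons stay embedded inside the field values.
row = next(csv.reader([raw]))
print(row[:3])  # ['Number', ' Score; 1 ', ' 44.5678690273 ;11 ']
```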
Did you change LINE_BREAK to LINE_BREAKER?
Maybe try a SEDCMD:
SEDCMD-semicolon=s/;/\n\r/g
Adding SEDCMD-semicolon="s/;/\n\r/g" did not change the result. Do I need to do anything special to enable SEDCMD?
I've added the line to my props.conf file. I've also tried SEDCMD-replace=s/;/\r\n/g and the results are the same.
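I believe INDEXED_EXTRACTIONS is overriding SEDCMD and LINE_BREAKER. SEDCMD also rewrites events after they have already been broken, so on its own it cannot create new event boundaries.
You will probably have to disable INDEXED_EXTRACTIONS, use an EXTRACT-<name> setting to extract the values into named fields, and then discard the header with SEDCMD: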
LINE_BREAKER = (\s*;\s*)
SEDCMD-headerRemoval = s/Number\s*,\s*Score//g
# Captures Score as either a whole integer or a decimal fraction.
EXTRACT-fields = ^(?<Number>\d+)\s*,\s*(?<Score>\d+\.\d+|\d+)
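Before deploying, the header-removal and extraction regexes can be sanity-checked in Python against a few sample events. This is a sketch that assumes the events have already been broken on ";" with the surrounding spaces removed. Python's re spells named groups (?P<name>...), and PCRE accepts that form too. Note that \s+ before the comma would require whitespace that the sample header "Number, Score" does not have, so the sketch uses \s*.

```python
import re

# Assumed events after breaking on ";" (surrounding whitespace already removed).
events = ["Number, Score", "1 , 44.5678690273", "11 , 60.0795233081"]

header_sed = re.compile(r"Number\s*,\s*Score")                         # like the SEDCMD
extract = re.compile(r"^(?P<Number>\d+)\s*,\s*(?P<Score>\d+\.\d+|\d+)")  # like EXTRACT-fields

results = []
for event in events:
    event = header_sed.sub("", event)  # header event becomes empty
    m = extract.search(event)
    results.append(m.groupdict() if m else None)

# A \s+ before the comma does not match the sample header at all.
strict_header_matches = re.search(r"Number\s+,\s+Score", "Number, Score") is not None
print(results)
```

The header event yields no fields. The other events yield Number and Score.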