How to configure line breaking for a CSV file where line breaks are signified by a semicolon?

dablackgoku1234
New Member

Hi everyone,

I have a CSV file where the records are separated by a semicolon (;) instead of a newline. How would one parse this CSV when the "line break" is a different character? Example:

Number, Score;  1 , 44.5678690273 ;11 , 60.0795233081 ;  14 , 13.6359924845 ;  16 , 44.6169376811 ;  17 , 47.6782506507 ; 

I tried using:

HEADER_FIELD_LINE_NUMBER=1
FIELD_NAMES=Number, Score
BREAK_ONLY_BEFORE=;
CHARSET=AUTO
INDEXED_EXTRACTIONS=csv
KV_MODE=none
LINE_BREAK=;
MUST_BREAK_AFTER=;
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=true
pulldown_type=true

However, it does not break the events at the semicolons.


jkat54
SplunkTrust

Try this:

 CHARSET=AUTO
 INDEXED_EXTRACTIONS=csv
 KV_MODE=none
 SHOULD_LINEMERGE=false
 LINE_BREAKER=(;)
 NO_BINARY_CHECK=true
 pulldown_type=true

Make sure this props.conf is deployed where the file is first read, such as the forwarder, because INDEXED_EXTRACTIONS is applied there.


dablackgoku1234
New Member

Without specifying FIELD_NAMES, I get the message "No results found. Please change source type, adjust source type settings, or check your source file."

However, specifying FIELD_NAMES still does not split on the semicolons properly. Splunk treats the following as field values:

Score; 1
44.5678690273 ;11
60.0795233081 ; 14


jkat54
SplunkTrust

Did you change LINE_BREAK to LINE_BREAKER?


jkat54
SplunkTrust

Maybe try a SEDCMD:
SEDCMD-semicolon="s/;/\n\r/g"


dablackgoku1234
New Member

Adding SEDCMD-semicolon="s/;/\n\r/g" did not change the result. Do I need to do anything special to enable SEDCMD?

I've added the line to my props.conf file. I've also tried SEDCMD-replace=s/;/\r\n/g and the results are the same.


jkat54
SplunkTrust

I believe INDEXED_EXTRACTIONS is overriding both the SEDCMD and the LINE_BREAKER settings.

You will probably have to disable INDEXED_EXTRACTIONS, use an EXTRACT-<name> setting to extract the values into fields, and then discard the header with SEDCMD:

LINE_BREAKER=(;)
SEDCMD-headerRemoval = s/Number\s*,\s*Score//g
# Captures whole integers and fractions
EXTRACT-fields = ^\s*(?<Number>\d+)\s*,\s*(?<Score>\d+(?:\.\d+)?)
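
Put together, the whole stanza might look something like the sketch below. The sourcetype name semicolon_csv is made up for illustration, and this has not been tested against your file, so treat it as a starting point. Two things to watch: LINE_BREAKER takes a regex whose first capturing group marks the event boundary (the captured text is discarded), and SHOULD_LINEMERGE should be false so the pieces are not merged back together. A trailing comment on the same line as a setting would become part of its value, so comments go on their own lines.

```ini
# props.conf (the sourcetype name below is hypothetical)
[semicolon_csv]
# Index-time settings: these take effect where the data is parsed
# (indexer or heavy forwarder) and only for newly indexed data.
SHOULD_LINEMERGE = false
# The first capturing group is the event boundary and is discarded.
LINE_BREAKER = (;)
# Blank out the header text so it is not indexed as data.
SEDCMD-headerRemoval = s/Number\s*,\s*Score//g

# Search-time setting: this one belongs on the search head.
# Matches whole integers and decimal fractions.
EXTRACT-fields = ^\s*(?<Number>\d+)\s*,\s*(?<Score>\d+(?:\.\d+)?)
```

Because the first three settings are applied at index time, you would need to re-index the file (or test with a fresh copy) to see any change in how events are broken.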