Why can't Splunk parse a CSV file with a semicolon as separator?

Marta88 · 09-05-2023

Hi,

I am importing a CSV file into Splunk Enterprise that uses a semicolon as the field separator, but Splunk does not parse it correctly. For instance, this value is treated as a single field and is not split: SARL "LE RELAIS DU GEVAUDAN";;;"1
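A minimal Python sketch of the symptom, using Python's `csv` module rather than Splunk's own parser: the same hypothetical two-line sample is read once with a comma delimiter and once with a semicolon delimiter. The sample content and column names are invented for illustration, since the post does not show the file's header.

```python
import csv
import io

# Hypothetical sample: column names and values are invented; only the
# semicolon separator matches the file described in the question.
SAMPLE = 'name;town;code\n"SARL LE RELAIS";Mende;1\n'


def parse(text, delimiter):
    """Split each line of `text` into fields using `delimiter`."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


if __name__ == "__main__":
    # Comma delimiter: nothing is split, each line stays one field.
    print(parse(SAMPLE, ","))
    # → [['name;town;code'], ['SARL LE RELAIS;Mende;1']]
    # Semicolon delimiter: each column becomes its own field.
    print(parse(SAMPLE, ";"))
    # → [['name', 'town', 'code'], ['SARL LE RELAIS', 'Mende', '1']]
```

A parser that expects commas keeps each line as one field, which resembles the symptom in the question; only the delimiter setting changes the result.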

Do you know which settings I should configure in the file import wizard in order to import it?

Thank you

Kind regards

Marta

isoutamo · 09-05-2023

Hi

At search time, you could use DELIMS with props.conf and transforms.conf:

DELIMS = <quoted string list>
* NOTE: This setting is only valid for search-time field extractions.
* IMPORTANT: If a value may contain an embedded unescaped double quote
  character, such as "foo"bar", use REGEX, not DELIMS. An escaped double
  quote (\") is ok. Non-ASCII delimiters also require the use of REGEX.
* Optional. Use DELIMS in place of REGEX when you are working with ASCII-only
  delimiter-based field extractions, where field values (or field/value pairs)
  are separated by delimiters such as colons, spaces, line breaks, and so on.
* Sets delimiter characters, first to separate data into field/value pairs,
  and then to separate field from value.
* Each individual ASCII character in the delimiter string is used as a
  delimiter to split the event.
* Delimiters must be specified within double quotes (eg. DELIMS="|,;").
  Special escape sequences are \t (tab), \n (newline), \r (carriage return),
  \\ (backslash) and \" (double quotes).
* When the event contains full delimiter-separated field/value pairs, you
  enter two sets of quoted characters for DELIMS:
* The first set of quoted delimiters extracts the field/value pairs.
* The second set of quoted delimiters separates the field name from its
  corresponding value.
* When the event only contains delimiter-separated values (no field names),
  use just one set of quoted delimiters to separate the field values. Then use
  the FIELDS setting to apply field names to the extracted values.
  * Alternately, Splunk software reads even tokens as field names and odd
    tokens as field values.
* Splunk software consumes consecutive delimiter characters unless you
  specify a list of field names.
* The following example of DELIMS usage applies to an event where
  field/value pairs are separated by '|' symbols and the field names are
  separated from their corresponding values by '=' symbols:
    [pipe_eq]
    DELIMS = "|", "="
* Default: ""
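To make the DELIMS rules above concrete, here is a rough Python emulation of the documented behaviour (not Splunk's implementation): every character of the delimiter string splits the event, consecutive delimiters are collapsed unless field names are supplied, and with two delimiter sets each pair is split into name and value. The helper names, the column names, the reading of "consumes" as "drops empty tokens", and the choice to split a pair at its first name/value delimiter are all assumptions. It also ignores double-quote handling entirely, which is exactly the case the IMPORTANT note warns about.

```python
def split_on_any(text, delims, keep_empty):
    """Split `text` on every character in `delims` (each char is a delimiter)."""
    tokens, current = [], []
    for ch in text:
        if ch in delims:
            tokens.append("".join(current))
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current))
    if not keep_empty:
        # Assumption: "consumes consecutive delimiter characters" is read
        # here as dropping the empty tokens between adjacent delimiters.
        tokens = [t for t in tokens if t]
    return tokens


def delims_values(event, delims, fields):
    """One set of DELIMS plus FIELDS: name the values by position."""
    # With field names given, empty values are kept so positions stay aligned.
    values = split_on_any(event, delims, keep_empty=True)
    return dict(zip(fields, values))


def delims_pairs(event, pair_delims, kv_delims):
    """Two sets of DELIMS: split into pairs, then split each pair once."""
    result = {}
    for pair in split_on_any(event, pair_delims, keep_empty=False):
        # Assumption: a pair is split at its first name/value delimiter.
        for i, ch in enumerate(pair):
            if ch in kv_delims:
                result[pair[:i]] = pair[i + 1:]
                break
    return result


if __name__ == "__main__":
    row = 'SARL "LE RELAIS DU GEVAUDAN";;;"1'
    # Without field names the empty columns disappear.
    print(split_on_any(row, ";", keep_empty=False))
    # → ['SARL "LE RELAIS DU GEVAUDAN"', '"1']
    # Hypothetical field names keep every column in place.
    print(delims_values(row, ";", ["company", "col2", "col3", "col4"]))
    # → {'company': 'SARL "LE RELAIS DU GEVAUDAN"', 'col2': '', 'col3': '', 'col4': '"1'}
    # The [pipe_eq] example: DELIMS = "|", "="
    print(delims_pairs("a=1|b=2", "|", "="))
    # → {'a': '1', 'b': '2'}
```

In this emulation, the fragment from the question shows why field names matter for `;;;`: without them the two empty columns vanish and later values shift position. The double quotes also stay inside the values, since quote handling is not modelled.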

At ingest time, however, you must use REGEX to split those fields if needed. Are you sure that you need this at ingest time, and that search time is not enough?
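REGEX in transforms.conf is a PCRE pattern. As a rough, non-authoritative sketch, the Python pattern below shows the kind of expression that could split a semicolon-separated line while keeping empty columns and leaving embedded double quotes inside the value. The column names and the assumption of exactly four columns are hypothetical; in Splunk the captured groups would still have to be mapped to fields in transforms.conf (for example with FORMAT), which is not shown here.

```python
import re

# Hypothetical column names and column count; the real header is not shown.
# [^;]* matches empty columns (;;) and keeps embedded quotes in the value.
ROW_RE = re.compile(
    r"^(?P<company>[^;]*);(?P<col2>[^;]*);(?P<col3>[^;]*);(?P<col4>.*)$"
)


def extract(line):
    """Return the named columns of one semicolon-separated line, or None."""
    match = ROW_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(extract('SARL "LE RELAIS DU GEVAUDAN";;;"1'))
    # → {'company': 'SARL "LE RELAIS DU GEVAUDAN"', 'col2': '', 'col3': '', 'col4': '"1'}
    print(extract("no separators here"))
    # → None
```

Unlike a delimiter list, a regex states the column layout explicitly, so empty columns and stray quotes do not shift the remaining values.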

r. Ismo

Marta88 · 09-13-2023

Thank you for your answer. How can I specify a regular expression at ingestion time in the "Add Data" wizard?

isoutamo · 09-13-2023

This depends on your use case and your environment. If you are using Splunk Cloud, you can try Splunk Edge Processor; that is probably the easiest way to do it. Without Splunk Cloud, you can try ingest eval or the "old way" with props.conf and transforms.conf.


Are you absolutely sure that you want to extract those fields at index time rather than at search time?
