Getting Data In
Highlighted

Importing CSV files with numbers formatted in German localization (ex: 1.3400,39), how to configured SEDCMD in props.conf to switch commas and decimal points?

Explorer

Hey everybody,

We recently got the request to import CSV files into Splunk. However, the files include some number formatted with German localizations, such that decimal points and commas are exchanged. This is a common number then: 1.340.000,39. Such a number would now be interpreted as String in Splunk.

I saw some guys having similar problems, but not a solution without fixing the problems in search.
http://answers.splunk.com/answers/52965/splunk-and-localization-of-numbers.html
http://answers.splunk.com/answers/42814/decimal-how-to-use-vs.html

I tried my best and implemented some SEDCMD to remove points in numbers and replace commas with points, but it seems like they are getting executed on _raw after the CSV-fields were already excluded.

Is there a way to move the SEDCMDs before CSV-extraction or to make Splunk aware of different-localized numbers?

My props.conf is:

[csv-test]
SEDCMD-1-removePoints = s/([0-9])\.([0-9])/\1\2/g
SEDCMD-2-decimalToPoint = s/([0-9])\,([0-9])/\1.\2/g
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
CHARSET = Latin-1

Thanks for your kind help,
Benjamin

Highlighted

Re: Importing CSV files with numbers formatted in German localization (ex: 1.3400,39), how to configured SEDCMD in props.conf to switch commas and decimal points?

Communicator

Hi Benjamin,

I have the same probleme. At the moment a have to do a rex on every field I want to use and then convert it:
| rex field="FieldA" mode=sed "s/,/./"
| rex field="FieldB" mode=sed "s/,/./"
| convert num(Field*) | eval SumFields=FieldA+FieldB

Perhaps there is at least a way to reduce number of regex' needed, because I have many fields and they vary...

Best regards,
Thomas

Highlighted

Re: Importing CSV files with numbers formatted in German localization (ex: 1.3400,39), how to configured SEDCMD in props.conf to switch commas and decimal points?

Legend

Try removing INDEXED_EXTRACTIONS from your props.conf and field extraction rules to props & transforms

0 Karma
Highlighted

Re: Importing CSV files with numbers formatted in German localization (ex: 1.3400,39), how to configured SEDCMD in props.conf to switch commas and decimal points?

Explorer

That is worth a try.

In that case, however, I need to provide the column headings of my csv for Splunk do recognize the fields, right?
That was my intention to use INDEXED_EXTRACTIONS.

0 Karma
Highlighted

Re: Importing CSV files with numbers formatted in German localization (ex: 1.3400,39), how to configured SEDCMD in props.conf to switch commas and decimal points?

Communicator

Thats a good point. If so, it would not work for me because my field vary much...

0 Karma
Highlighted

Re: Importing CSV files with numbers formatted in German localization (ex: 1.3400,39), how to configured SEDCMD in props.conf to switch commas and decimal points?

Legend

If you choose to removed INDEXED_EXTRACTIONS you will also have to remove the "header" row from _raw and do `multiple field extraction. See here for details/ideas

http://docs.splunk.com/Documentation/Splunk/6.2.0/Knowledge/Createandmaintainsearch-timefieldextract...

0 Karma
Highlighted

Re: Importing CSV files with numbers formatted in German localization (ex: 1.3400,39), how to configured SEDCMD in props.conf to switch commas and decimal points?

Explorer

Thanks for the Link, sundareshr. My approach would be something like the following:

props.conf:
[someSourcetype]
TRANSFORMS-first = NoHeader
TRANSFORMS-second= csv-fieldextraction

transforms.conf:
[NoHeader]
REGEX = "Time","IOPS","Latency(ms)","BW (MBps)"
DEST_KEY = queue
FORMAT = nullQueue

[csv-fieldextraction]
DELIMS=","
FIELDS="Time","IOPS","Latency","BW (MBps)"

Still, the configuration is bond to a specific set of fields. Did you mean that or do you have something different in mind?

0 Karma