Getting Data In

Importing CSV files with numbers formatted in German localization (ex: 1.3400,39), how to configured SEDCMD in props.conf to switch commas and decimal points?

benjaminruland
Explorer

Hey everybody,

We recently got the request to import CSV files into Splunk. However, the files include some number formatted with German localizations, such that decimal points and commas are exchanged. This is a common number then: 1.340.000,39. Such a number would now be interpreted as String in Splunk.

I saw some guys having similar problems, but not a solution without fixing the problems in search.
http://answers.splunk.com/answers/52965/splunk-and-localization-of-numbers.html
http://answers.splunk.com/answers/42814/decimal-how-to-use-vs.html

I tried my best and implemented some SEDCMD to remove points in numbers and replace commas with points, but it seems like they are getting executed on _raw after the CSV-fields were already excluded.

Is there a way to move the SEDCMDs before CSV-extraction or to make Splunk aware of different-localized numbers?

My props.conf is:

[csv-test]
SEDCMD-1-removePoints = s/([0-9])\.([0-9])/\1\2/g
SEDCMD-2-decimalToPoint = s/([0-9])\,([0-9])/\1.\2/g
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
CHARSET = Latin-1

Thanks for your kind help,
Benjamin

sundareshr
Legend

Try removing INDEXED_EXTRACTIONS from your props.conf and field extraction rules to props & transforms

0 Karma

benjaminruland
Explorer

That is worth a try.

In that case, however, I need to provide the column headings of my csv for Splunk do recognize the fields, right?
That was my intention to use INDEXED_EXTRACTIONS.

0 Karma

sundareshr
Legend

If you choose to removed INDEXED_EXTRACTIONS you will also have to remove the "header" row from _raw and do `multiple field extraction. See here for details/ideas

http://docs.splunk.com/Documentation/Splunk/6.2.0/Knowledge/Createandmaintainsearch-timefieldextract...

0 Karma

benjaminruland
Explorer

Thanks for the Link, sundareshr. My approach would be something like the following:

props.conf:
[someSourcetype]
TRANSFORMS-first = NoHeader
TRANSFORMS-second= csv-fieldextraction

transforms.conf:
[NoHeader]
REGEX = "Time","IOPS","Latency(ms)","BW (MBps)"
DEST_KEY = queue
FORMAT = nullQueue

[csv-fieldextraction]
DELIMS=","
FIELDS="Time","IOPS","Latency","BW (MBps)"

Still, the configuration is bond to a specific set of fields. Did you mean that or do you have something different in mind?

0 Karma

goelli
Communicator

Thats a good point. If so, it would not work for me because my field vary much...

0 Karma

goelli
Communicator

Hi Benjamin,

I have the same probleme. At the moment a have to do a rex on every field I want to use and then convert it:
| rex field="FieldA" mode=sed "s/,/./"
| rex field="FieldB" mode=sed "s/,/./"
| convert num(Field*) | eval SumFields=FieldA+FieldB

Perhaps there is at least a way to reduce number of regex' needed, because I have many fields and they vary...

Best regards,
Thomas

Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!