Hey everybody,
We recently got a request to import CSV files into Splunk. However, the files include numbers formatted with German localization, where the roles of decimal points and commas are swapped. A typical value looks like 1.340.000,39. Such a number is interpreted as a string in Splunk.
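For reference, converting such a value at search time is straightforward with nested replace() calls; a quick sketch against a hard-coded sample (the field names value/value_num are made up):

```spl
| makeresults
| eval value="1.340.000,39"
| eval value_num=tonumber(replace(replace(value, "\.", ""), ",", "."))
```

The inner replace() strips the thousands separators, the outer one turns the decimal comma into a point, and tonumber() makes it numeric.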
I saw others with similar problems, but no solution that does not involve fixing the values in search:
http://answers.splunk.com/answers/52965/splunk-and-localization-of-numbers.html
http://answers.splunk.com/answers/42814/decimal-how-to-use-vs.html
I tried my best and implemented some SEDCMDs to remove the points in numbers and replace the commas with points, but it seems they are executed on _raw after the CSV fields were already extracted.
Is there a way to run the SEDCMDs before the CSV extraction, or to make Splunk aware of differently localized numbers?
My props.conf is:
[csv-test]
SEDCMD-1-removePoints = s/([0-9])\.([0-9])/\1\2/g
SEDCMD-2-decimalToPoint = s/([0-9])\,([0-9])/\1.\2/g
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
CHARSET = Latin-1
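The two SEDCMD expressions themselves look correct; they can be verified in isolation with rex in sed mode (a sketch against a sample value):

```spl
| makeresults
| eval raw="1.340.000,39"
| rex field=raw mode=sed "s/([0-9])\.([0-9])/\1\2/g"
| rex field=raw mode=sed "s/([0-9]),([0-9])/\1.\2/g"
```

If raw comes out as 1340000.39 here, the regexes are fine and the remaining problem is only that INDEXED_EXTRACTIONS parses the CSV fields before the SEDCMDs touch _raw.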
Thanks for your kind help,
Benjamin
Try removing INDEXED_EXTRACTIONS from your props.conf and moving the field extraction rules to props & transforms. That is worth a try.
In that case, however, I need to provide the column headings of my CSV for Splunk to recognize the fields, right?
That was my intention to use INDEXED_EXTRACTIONS.
If you choose to remove INDEXED_EXTRACTIONS, you will also have to remove the "header" row from _raw and do multiple field extractions. See here for details/ideas.
Thanks for the link, sundareshr. My approach would be something like the following:
props.conf:
[someSourcetype]
TRANSFORMS-first = NoHeader
REPORT-second = csv-fieldextraction
transforms.conf:
[NoHeader]
REGEX = "Time","IOPS","Latency\(ms\)","BW \(MBps\)"
DEST_KEY = queue
FORMAT = nullQueue
[csv-fieldextraction]
DELIMS=","
FIELDS="Time","IOPS","Latency","BW (MBps)"
Still, the configuration is bound to a specific set of fields. Did you mean that, or do you have something different in mind?
That's a good point. If so, it would not work for me because my fields vary a lot...
Hi Benjamin,
I have the same problem. At the moment I have to do a rex on every field I want to use and then convert it:
| rex field="FieldA" mode=sed "s/,/./"
| rex field="FieldB" mode=sed "s/,/./"
| convert num(Field*) | eval SumFields=FieldA+FieldB
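One way to cut down the repetition is foreach with a wildcard, which applies the same conversion to every matching field in one step (a sketch; adjust the Field* wildcard to your actual field names):

```spl
| foreach Field* [ eval <<FIELD>> = tonumber(replace(replace('<<FIELD>>', "\.", ""), ",", ".")) ]
```

The inner replace() also strips thousands separators, so values like 1.340.000,39 come out as numbers as well, and the separate convert step is no longer needed.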
Perhaps there is at least a way to reduce the number of regexes needed, because I have many fields and they vary...
Best regards,
Thomas