We recently got a request to import CSV files into Splunk. However, the files contain numbers formatted with the German locale, where the roles of decimal point and comma are swapped. A typical number looks like this:
1.340.000,39. Such a number is interpreted as a string by Splunk.
I have seen others with similar problems, but no solution that avoids fixing the values at search time.
I tried my best and implemented some SEDCMD rules to remove the thousands separators (points) and replace the decimal commas with points, but they seem to be executed on _raw after the CSV fields have already been extracted.
Is there a way to run the SEDCMDs before the CSV extraction, or to make Splunk aware of differently localized numbers?
My props.conf is:
[csv-test]
SEDCMD-1-removePoints = s/([0-9])\.([0-9])/\1\2/g
SEDCMD-2-decimalToPoint = s/([0-9])\,([0-9])/\1.\2/g
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
CHARSET = Latin-1
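For what it's worth, the two sed expressions themselves do produce the intended result when applied to raw text. A minimal Python sketch that mimics the two SEDCMD substitutions (the function name is mine, just for illustration):

```python
import re

def normalize_german_number(raw: str) -> str:
    """Apply the same two substitutions as the SEDCMD rules above."""
    # SEDCMD-1-removePoints: drop points between digits (thousands separators)
    s = re.sub(r"([0-9])\.([0-9])", r"\1\2", raw)
    # SEDCMD-2-decimalToPoint: turn the decimal comma between digits into a point
    s = re.sub(r"([0-9]),([0-9])", r"\1.\2", s)
    return s

print(normalize_german_number("1.340.000,39"))  # -> 1340000.39
```

So the regexes are fine; the issue is only *when* Splunk applies them relative to the indexed CSV extraction.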
Thanks for your kind help,
I have the same problem. At the moment I have to do a rex on every field I want to use and then convert it:
| rex field="FieldA" mode=sed "s/,/./"
| rex field="FieldB" mode=sed "s/,/./"
| convert num(Field*) | eval SumFields=FieldA+FieldB
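The idea behind that pipeline can also be applied generically, without naming each field: test every field value against a German number pattern and convert only the ones that match. A Python sketch of that approach (the event dict and field names are made-up examples, not from the actual data):

```python
import re

# Hypothetical event: field names vary from file to file.
event = {"FieldA": "12,5", "FieldB": "7,25", "Host": "server01"}

# Matches German-formatted numbers: optional "." thousands groups, "," decimal.
GERMAN_NUMBER = re.compile(r"^\d{1,3}(?:\.\d{3})*,\d+$")

converted = {}
for name, value in event.items():
    if GERMAN_NUMBER.match(value):
        # Drop thousands separators, switch the decimal comma to a point.
        converted[name] = float(value.replace(".", "").replace(",", "."))
    else:
        converted[name] = value

print(converted["FieldA"] + converted["FieldB"])  # -> 19.75
```

In SPL terms, this corresponds to looping over a field wildcard instead of writing one rex per field.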
Perhaps there is at least a way to reduce the number of regexes needed, because I have many fields and they vary...
Try removing INDEXED_EXTRACTIONS from your props.conf and using search-time field extraction rules instead.
That is worth a try.
In that case, however, I need to provide the column headings of my CSV for Splunk to recognize the fields, right?
That was my reason for using INDEXED_EXTRACTIONS in the first place.
That's a good point. If so, it would not work for me because my fields vary a lot...
If you choose to remove INDEXED_EXTRACTIONS, you will also have to remove the "header" row from _raw and do a multiple-field extraction. See here for details/ideas
Thanks for the link, sundareshr. My approach would be something like the following:
props.conf:

[someSourcetype]
TRANSFORMS-first = NoHeader
TRANSFORMS-second = csv-fieldextraction

transforms.conf:

[NoHeader]
REGEX = "Time","IOPS","Latency(ms)","BW (MBps)"
DEST_KEY = queue
FORMAT = nullQueue

[csv-fieldextraction]
DELIMS = ","
FIELDS = "Time","IOPS","Latency","BW (MBps)"
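Just to illustrate what that pair of stanzas is meant to achieve, here is a small Python sketch (header line, delimiter, and field names taken from the config above; the sample data row is invented) that drops the header line and splits the remaining lines into the named fields. It is only an approximation of the transforms, not how Splunk processes them internally:

```python
import csv
import io

HEADER = '"Time","IOPS","Latency(ms)","BW (MBps)"'
FIELDS = ["Time", "IOPS", "Latency", "BW (MBps)"]

raw = '''"Time","IOPS","Latency(ms)","BW (MBps)"
"12:00","1500","0,8","120,5"
'''

events = []
for line in raw.splitlines():
    if line == HEADER:
        # The NoHeader transform: the header row is routed to the nullQueue.
        continue
    # The csv-fieldextraction transform: split on "," and name the fields.
    values = next(csv.reader(io.StringIO(line)))
    events.append(dict(zip(FIELDS, values)))

print(events[0]["Latency"])  # -> 0,8
```

Note the extracted values still carry German decimal commas, so the comma-to-point conversion would still have to happen somewhere.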
Still, the configuration is bound to a specific set of fields. Did you mean that, or do you have something different in mind?