Hey everybody,
We recently got a request to import CSV files into Splunk. However, the files include numbers formatted with German localization, where the roles of decimal points and commas are swapped. A typical value looks like 1.340.000,39. Such a number is interpreted as a string in Splunk.
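For reference, converting such a value at search time is straightforward with nested replace() calls; a quick sketch against a hard-coded sample (the field names value/value_num are made up):

```spl
| makeresults
| eval value="1.340.000,39"
| eval value_num=tonumber(replace(replace(value, "\.", ""), ",", "."))
```

The inner replace() strips the thousands separators, the outer one turns the decimal comma into a point, and tonumber() makes it numeric.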
I saw others with similar problems, but no solution that does not involve fixing the values in search:
http://answers.splunk.com/answers/52965/splunk-and-localization-of-numbers.html
http://answers.splunk.com/answers/42814/decimal-how-to-use-vs.html
I tried my best and implemented some SEDCMDs to remove the points in numbers and replace the commas with points, but it seems they are executed on _raw after the CSV fields were already extracted.
Is there a way to run the SEDCMDs before the CSV extraction, or to make Splunk aware of differently localized numbers?
My props.conf is:
[csv-test]
SEDCMD-1-removePoints = s/([0-9])\.([0-9])/\1\2/g
SEDCMD-2-decimalToPoint = s/([0-9])\,([0-9])/\1.\2/g
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
CHARSET = Latin-1
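The two SEDCMD expressions themselves look correct; they can be verified in isolation with rex in sed mode (a sketch against a sample value):

```spl
| makeresults
| eval raw="1.340.000,39"
| rex field=raw mode=sed "s/([0-9])\.([0-9])/\1\2/g"
| rex field=raw mode=sed "s/([0-9]),([0-9])/\1.\2/g"
```

If raw comes out as 1340000.39 here, the regexes are fine and the remaining problem is only that INDEXED_EXTRACTIONS parses the CSV fields before the SEDCMDs touch _raw.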
Thanks for your kind help,
Benjamin
Try removing INDEXED_EXTRACTIONS from your props.conf and moving the field extraction rules to props & transforms. That is worth a try.
In that case, however, I need to provide the column headings of my CSV for Splunk to recognize the fields, right?
That was my intention to use INDEXED_EXTRACTIONS.
If you choose to remove INDEXED_EXTRACTIONS, you will also have to remove the "header" row from _raw and do multiple field extractions. See here for details/ideas.
Thanks for the link, sundareshr. My approach would be something like the following:
props.conf:
[someSourcetype]
TRANSFORMS-first = NoHeader
REPORT-second = csv-fieldextraction
transforms.conf:
[NoHeader]
REGEX = "Time","IOPS","Latency\(ms\)","BW \(MBps\)"
DEST_KEY = queue
FORMAT = nullQueue
[csv-fieldextraction]
DELIMS=","
FIELDS="Time","IOPS","Latency","BW (MBps)"
Still, the configuration is bound to a specific set of fields. Did you mean that, or do you have something different in mind?
That's a good point. If so, it would not work for me because my fields vary a lot...
Hi Benjamin,
I have the same problem. At the moment I have to do a rex on every field I want to use and then convert it:
| rex field="FieldA" mode=sed "s/,/./"
| rex field="FieldB" mode=sed "s/,/./"
| convert num(Field*) | eval SumFields=FieldA+FieldB
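One way to cut down the repetition is foreach with a wildcard, which applies the same conversion to every matching field in one step (a sketch; adjust the Field* wildcard to your actual field names):

```spl
| foreach Field* [ eval <<FIELD>> = tonumber(replace(replace('<<FIELD>>', "\.", ""), ",", ".")) ]
```

The inner replace() also strips thousands separators, so values like 1.340.000,39 come out as numbers as well, and the separate convert step is no longer needed.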
Perhaps there is at least a way to reduce the number of regexes needed, because I have many fields and they vary...
Best regards,
Thomas