Example Line:
"Stuff to be, together as one item",nextvalue,andanother,andso-on
When using Splunk's auto header detection and DELIMS = ",", this blows up: the quoted field is split and the non-quoted fields shift back, giving the values:
Field1: "Stuff to be
Field2: together as one item"
Field3: nextvalue
Field4: andanother
Field5: andso-on
Any way around this other than a massive regex? And if you did make a massive regex (we actually did), how would you make it cope with the following:
LINE1: "Stuff to be, together as one item",nextvalue,andanother,andso-on
LINE2: "Stuff to be, together as one item, and this stuff too, and there might be 10 of them...",nextvalue,andanother,andso-on
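For what it's worth, the regex doesn't have to enumerate the commas: a single alternation that matches either a whole quoted field or a bare run of non-comma characters copes with any number of embedded commas. A sketch in plain Python (not Splunk's transforms.conf syntax, and not handling escaped quotes inside fields):

```python
import re

# Each match is either a double-quoted field (any commas inside are
# swallowed by [^"]*) or an unquoted run of non-comma characters.
FIELD = re.compile(r'(?:^|,)(?:"([^"]*)"|([^,]*))')

def split_csv_line(line):
    return [quoted or bare for quoted, bare in FIELD.findall(line)]

line1 = '"Stuff to be, together as one item",nextvalue,andanother,andso-on'
line2 = ('"Stuff to be, together as one item, and this stuff too, '
         'and there might be 10 of them...",nextvalue,andanother,andso-on')
```

Both lines come out as four fields, regardless of how many commas sit inside the quotes.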
If you don't use the auto header detection (and presumably you stopped using it to try your regex solution) then, as far as I can see, CSV with double-quoted fields works fine.
I've got a setup that has the following in props.conf:
[source::prelertanomaly]
CHARSET = UTF-8
MAX_TIMESTAMP_LOOKAHEAD = 10
TIME_FORMAT = %s
REPORT-prelert = prelertanomaly
and the following in transforms.conf:
[prelertanomaly]
DELIMS = ","
FIELDS = _time, prelertreferencetime, prelertkey, prelertbucketsize, prelertfieldname, prelertfieldvalue, prelertbaselinerate, prelertcurrentrate, prelertbaselinemean, prelertcurrentmean, sparkline, prelertanomalyfactor, prelertprobability, prelertanomalysearchterms, prelertanomalywhereclause, prelertexamples, prelertmetricfield
and it quite happily handles records of this form:
1362673800,1362673800,unixcpu,300,count,count,3.995050,10,3.995050,10.000000,"##__SPARKLINE__##,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0",0.000000,100.000000,,,,
Notice that the sparkline there has lots of commas in it inside the surrounding double quotes, and Splunk doesn't break it up. With my config and the specific event above Splunk extracted these fields:
host=davids-macbook-pro
sourcetype=stash
source=prelertanomaly
prelertanomalyfactor=0.000000
prelertbaselinerate=3.995050
prelertbucketsize=300
prelertcurrentrate=10
prelertfieldname=count
prelertfieldvalue=count
prelertprobability=100.000000
prelertreferencetime=1362673800
sparkline=##__SPARKLINE__##,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0
time=1362673800
prelertkey=unixcpu
prelertcurrentmean=10.000000
prelertbaselinemean=3.995050
This works with Splunk 4.3.2 and 5.0.1 at least.
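As a sanity check outside Splunk, any RFC 4180-style CSV parser splits that event the same way: 17 fields (matching the FIELDS list in transforms.conf above), with the quoted sparkline kept whole. A quick sketch with Python's csv module:

```python
import csv

record = ('1362673800,1362673800,unixcpu,300,count,count,3.995050,10,'
          '3.995050,10.000000,"##__SPARKLINE__##,1,0,0,1,0,0,1,0,0,1,0,0,'
          '1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0",0.000000,100.000000,,,,')

row = next(csv.reader([record]))
# 17 fields; field 11 (index 10) is the sparkline with all its commas intact,
# and the trailing ",,,," yields four empty fields.
```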
One annoyance I have seen with the way Splunk parses CSV input data is that it expects double quotes that are to remain part of a field's value to be escaped with backslashes, whereas in the CSV that custom search commands take as input and emit as output, double quotes are escaped by doubling them (as Excel does it).
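The difference between the two conventions is easy to demonstrate outside Splunk; in Python csv terms it is the default doubled-quote dialect versus escapechar='\\' (a sketch, not Splunk code):

```python
import csv

# Excel / search-command convention: embedded quotes are doubled.
doubled = '"say ""hi"" there",next'
row_doubled = next(csv.reader([doubled]))  # doublequote=True is the default

# Backslash convention (what the CSV input parsing described above expects).
# The raw text here is: "say \"hi\" there",next
backslashed = '"say \\"hi\\" there",next'
row_backslashed = next(csv.reader([backslashed],
                                  doublequote=False, escapechar='\\'))

# Both decode to the same field values: ['say "hi" there', 'next']
```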
Oddly, the inputs and the methods we were using were identical to yours, and it still didn't work in 4.3.2.
That said, we were able to change the commas inside the double quotes to something else (a pipe symbol) and that worked just fine. Most aggravating. I will likely re-attempt this in 5.x at a later date.
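For anyone wanting to script that workaround, rewriting the commas inside quoted spans before the data reaches Splunk is a one-liner. A hypothetical pre-processing sketch (pick a replacement character that can't occur in your data):

```python
import re

def pipe_quoted_commas(line):
    # Swap commas for pipes only inside double-quoted spans,
    # leaving the real field delimiters untouched.
    return re.sub(r'"[^"]*"',
                  lambda m: m.group(0).replace(',', '|'),
                  line)
```

After this, DELIMS = "," splits the line cleanly, since no field value contains a comma anymore.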