Getting Data In

Splunk unable to handle double-quoted CSV?

vectorsc
Explorer

Example Line:

"Stuff to be, together as one item",nextvalue,andanother,andso-on

When using Splunk's auto header detection with DELIMS=",", this breaks: the quoted field is split at its embedded comma, which shifts the remaining fields back and gives the values:

Field1: "Stuff to be

Field2: together as one item"

Field3: nextvalue

Field4: andanother

Field5: andso-on

Any way around this other than a massive regex? And if you did make a massive regex (we did, actually), how would you make it cope with the following:

LINE1: "Stuff to be, together as one item",nextvalue,andanother,andso-on

LINE2: "Stuff to be, together as one item, and this stuff too, and there might be 10 of them...",nextvalue,andanother,andso-on
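For reference, this is how a standards-compliant CSV parser keeps the quoted field intact (Python's csv module here, purely to illustrate the expected split; Splunk is not involved):

```python
import csv
import io

# First example line from the question: a quoted field containing a comma.
line = '"Stuff to be, together as one item",nextvalue,andanother,andso-on\n'

# A real CSV reader treats the comma inside the quotes as data, not a delimiter.
row = next(csv.reader(io.StringIO(line)))
print(row)
# → ['Stuff to be, together as one item', 'nextvalue', 'andanother', 'andso-on']
```

The same parse works no matter how many embedded commas the quoted field contains, which is exactly the LINE2 case.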


dmr195
Communicator

If you don't use the auto header detection (and presumably you stopped using it to try your regex solution), then, as far as I can see, CSV with double-quoted fields works fine.

I've got a setup that has the following in props.conf:


[source::prelertanomaly]
CHARSET = UTF-8
MAX_TIMESTAMP_LOOKAHEAD = 10
TIME_FORMAT = %s
REPORT-prelert = prelertanomaly

the following in transforms.conf:


[prelertanomaly]
DELIMS = ","
FIELDS = _time, prelertreferencetime, prelertkey, prelertbucketsize, prelertfieldname, prelertfieldvalue, prelertbaselinerate, prelertcurrentrate, prelertbaselinemean, prelertcurrentmean, sparkline, prelertanomalyfactor, prelertprobability, prelertanomalysearchterms, prelertanomalywhereclause, prelertexamples, prelertmetricfield

and it quite happily handles records of this form:

1362673800,1362673800,unixcpu,300,count,count,3.995050,10,3.995050,10.000000,"##__SPARKLINE__##,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0",0.000000,100.000000,,,,

Notice that the sparkline there has lots of commas in it inside the surrounding double quotes, and Splunk doesn't break it up. With my config and the specific event above Splunk extracted these fields:

host=davids-macbook-pro
sourcetype=stash
source=prelertanomaly
prelertanomalyfactor=0.000000
prelertbaselinerate=3.995050
prelertbucketsize=300
prelertcurrentrate=10
prelertfieldname=count
prelertfieldvalue=count
prelertprobability=100.000000
prelertreferencetime=1362673800
sparkline=##__SPARKLINE__##,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0
time=1362673800
prelertkey=unixcpu
prelertcurrentmean=10.000000
prelertbaselinemean=3.995050

This works with Splunk 4.3.2 and 5.0.1 at least.

One annoyance I have seen with the way Splunk parses CSV input data: it expects double quotes that are to remain part of a field value to be escaped with backslashes (\"), whereas in the CSV that custom search commands take as input and emit as output, double quotes are escaped by doubling them (""), as Excel does.
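If that mismatch bites, one workaround is to re-escape the data before indexing. This is only a sketch, assuming you can pre-process the file; the function name is hypothetical and not part of any Splunk API:

```python
import csv
import io

def excel_to_backslash(line: str) -> str:
    """Re-escape one Excel-style CSV line ("" for a literal quote) into the
    backslash-escaped form (\\") that Splunk's input parser expects.
    Illustrative only: every field is re-emitted quoted, which a real
    pipeline might not want."""
    fields = next(csv.reader(io.StringIO(line)))
    out = []
    for f in fields:
        # Escape backslashes first, then quotes, then re-wrap the field.
        out.append('"' + f.replace('\\', '\\\\').replace('"', '\\"') + '"')
    return ','.join(out)
```

For example, `excel_to_backslash('"He said ""hi"", twice",x')` yields `"He said \"hi\", twice","x"`.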


vectorsc
Explorer

Oddly, the inputs and methods we used were identical to yours, and it still didn't work in 4.3.2.

That said, we were able to change the commas inside the double-quoted fields to something else (a pipe symbol), and that worked just fine. Most aggravating. I will likely re-attempt this at a later date on 5.x.
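The substitution described above can be sketched as a pre-processing step (illustrative Python, run before the data reaches Splunk; the function name is hypothetical):

```python
import csv
import io

def pipe_quoted_commas(line: str) -> str:
    """Replace commas inside quoted CSV fields with a pipe so that every
    remaining comma is unambiguously a delimiter. A real CSV reader does the
    splitting, so any number of embedded commas (the LINE2 case) is handled.
    The quotes are dropped in the output, since DELIMS="," no longer needs
    them once the embedded commas are gone."""
    fields = next(csv.reader(io.StringIO(line)))
    return ','.join(f.replace(',', '|') for f in fields)
```

For example, the first example line becomes `Stuff to be| together as one item,nextvalue,andanother,andso-on`, which Splunk's DELIMS-based extraction splits correctly.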
