I have data coming in from an F5 in the format "data1","data2","data3".
However, some events contain " and some contain , inside a field, so the usual
DELIMS = ","
FIELDS = "field1", "field2", "field3"
doesn't work 100% of the time.
If I put
DELIMS = "\",\""
does it:
- force Splunk to look for the three-character combination "," to split fields, or
- make a field split every time it finds a " or a ,?
Update: "\",\"" does not work, nor did a few other ideas we tried. I guess this question has become: can Splunk use a multi-character string as a delimiter?
Here is a line of data. It comes from an F5 ASM:
Jun 18 20:04:34 f5name.client.com ASM:"HTTP protocol compliance failed","f5name.client.com","10.10.10.10","Client_security_policy_1","2010-07-04 12:18:19","","8000003409000000072","","0","Unknown method","HTTP","/cgi-bin/">alert(12769017.87967)/consumer/homearticle.jsp","","10.10.8.8","ConsumerSite","GET /cgi-bin/%22%3E%3Cscript%3Ealert(12769017.87967)%3C/script%3E/consumer/homearticle.jsp?pageid=Page_ID' onError=alert(12769017.97637) ' HTTP/1.1\r\nHost: host1.client.com\r\nUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/20080630 Firefox/3.0\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip,deflate\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nKeep-Alive: 15\r\nConnection: keep-alive\r\nReferer: https://host1.client.com/consumer/site/registration\r\nCookie: IMNAME=/cgi-bin/"">alert(12769017.87967); Partner=; MS_CN=; IDSS=6qjob0U1A/3SCCBYXiwQ6T5WE/EVg==; TS58d302=fb35699ac4c1c0946; MHS_INFO=ObsId=\r\nPragma: no-cache\r\nCache-Control: no-cache\r\n\r\n"
The error occurs right after the HTTP field, because the next field starts with /cgi-bin/"> and so contains an unescaped quote. Splunk takes /cgi-bin/>...Accept: text/html as a single field: it drops the quotes and grabs everything up to the next unquoted comma.

Listing multiple characters in DELIMS does not specify a delimiter sequence; it specifies a set of possible single-character delimiters. Using a double quote as a delimiter is also difficult and a bad idea, because the delimiters are treated like the commas in a CSV file, while double quotes usually keep the meaning they have in CSV.
If your data isn't conventional CSV or has unescaped characters, it is not well defined how it should be treated. In that case, you might consider using a regex instead to define and split your fields.
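To make the distinction concrete, here is a minimal Python sketch (not Splunk code) contrasting the two readings of a delimiter string such as `","`. The function names and the sample line are made up for illustration, and the "sequence" variant shows the behavior the question was hoping for, not something DELIMS is known to do.

```python
import re

# Hypothetical sample: the second field contains a comma,
# the third an unescaped double quote.
sample = '"alpha","be,ta","gam"ma"'

def split_on_char_set(text, delims):
    """Treat every character in `delims` as its own delimiter
    (the 'set of single characters' reading). Empty pieces are dropped."""
    pattern = "[" + re.escape(delims) + "]"
    return [piece for piece in re.split(pattern, text) if piece]

def split_on_sequence(text, sep='","'):
    """Treat `sep` as one multi-character delimiter, after trimming
    the outermost pair of quotes."""
    body = text
    if body.startswith('"'):
        body = body[1:]
    if body.endswith('"'):
        body = body[:-1]
    return body.split(sep)

print(split_on_char_set(sample, '",'))  # five fragments: fields are broken apart
print(split_on_sequence(sample))        # three fields, inner comma and quote kept
```

With the character-set reading, the comma and the quote inside the fields both cause extra splits. The sequence reading keeps the three fields intact, but it would still break if a field value itself contained the text `","`.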
Posted above; it wouldn't let me post all that code as a comment.
Can you post a sample event? As gkanapathy mentioned, you can use a custom field extraction, which can be painful for CSV-like files, especially with quotes. Another possibility is to use a SEDCMD entry to "fix" your events as they are being indexed, which could work if you have a well-defined misuse of double quotes.
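As a rough illustration of the "fix the events first" idea, the Python sketch below doubles any double quote that is not sitting next to a field boundary, then parses the result as ordinary CSV. The regex, the function name, and the shortened sample line are assumptions made for this example. It is not a tested SEDCMD expression, and whether the same pattern can be used in a Splunk sed-style substitution would need to be checked.

```python
import csv
import re

# A quote is considered "stray" when it has a non-comma character on both
# sides, i.e. it neither opens nor closes a field. Assumption: field values
# never contain the exact text "," themselves.
STRAY_QUOTE = re.compile(r'(?<=[^,])"(?=[^,])')

def double_stray_quotes(line):
    """Rewrite each stray quote as "" so a CSV parser sees it as escaped."""
    return STRAY_QUOTE.sub('""', line)

# Shortened, hypothetical version of the event payload from the question.
raw = '"HTTP","/cgi-bin/">alert(1)","10.10.8.8"'
fixed = double_stray_quotes(raw)
fields = next(csv.reader([fixed]))

print(fixed)
print(fields)
```

One known limitation of this sketch: a quote that is already escaped as "" inside a field would be doubled again, so it only suits data where quotes are never escaped at the source.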
Just to be clear: what does Splunk consider escape characters within the CSV data itself?
We tried "\",\"" and "","" and neither works as intended. We need to know whether this is possible! Otherwise this is going in as a Splunk bug...
We have determined that the cause is an unescaped " in one of the data fields. Splunk picks up that entire field and ALL fields after it (ignoring commas, because they are quoted?) up until the next unquoted comma. The field shows up in Splunk with no embedded "s at all. Bug?
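The observed behavior is consistent with a parser that simply toggles an "inside quotes" flag on every double quote. The Python sketch below is a toy model of that idea, written only to reproduce the symptom. It is an assumption about the mechanism, not Splunk's actual implementation, and the function name and sample line are invented for the example.

```python
def naive_quote_split(line):
    """Split on commas that are outside quotes, where every double quote
    flips the quoted/unquoted state and is itself discarded."""
    fields = []
    current = []
    in_quotes = False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes      # quote is dropped, state flips
        elif ch == ',' and not in_quotes:
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current))
    return fields

# Shortened, hypothetical event: the second field holds a stray quote and
# the last field holds a comma that should have been protected by quotes.
event = '"HTTP","/cgi-bin/">alert(1)","10.10.8.8","GET /x, HTTP/1.1"'
print(naive_quote_split(event))
```

In this model the stray quote inverts the state for the rest of the line: the real separators are now treated as quoted, and the first comma inside what used to be a quoted value ends the field. That mirrors the report above, where the field runs on until the comma after "Accept: text/html" and no quotes survive in the value.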
I think the character sequence \" can be used to escape a closing quote. But the CSV "standard" uses "" to escape an inline double-quote. Unfortunately, I don't think this behavior is user definable, which has been a pain to me in the past. (Great question, I'm glad you brought it up. I'm hoping there is a better answer in more recent versions.)
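For comparison, the two escaping conventions mentioned here can be seen side by side in Python's standard csv module, where the choice is configurable. This sketch only shows how that module behaves and says nothing about what Splunk accepts. The helper names are made up for the example.

```python
import csv

def parse_doubled(line):
    """CSV-style escaping: a literal quote inside a field is written as ""."""
    return next(csv.reader([line], doublequote=True))

def parse_backslash(line):
    """Backslash-style escaping: a literal quote is written as \\"."""
    return next(csv.reader([line], doublequote=False, escapechar="\\"))

print(parse_doubled('"say ""hi""","x"'))
print(parse_backslash(r'"say \"hi\"","x"'))
```

Both calls yield the same two fields, the first being say "hi". Feeding either parser the other convention's input is where mis-splits like the ones in this thread would start to appear.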
