I have data coming in from an F5 in the format "data1","data2","data3".
However, some events contain " and some contain , inside the field values, so the usual
DELIMS = ","
FIELDS = "field1", "field2", "field3"
doesn't seem to work 100% of the time.
If I put
DELIMS = "\",\""
does it treat "," as one multi-character delimiter, or just as a set of single-character delimiters?
Update: "\",\"" does not work, nor do a few other ideas we tried. I guess this question has become: can Splunk use a multiple-character string as a delimiter?
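For reference, here is the full shape of what we're using (the sourcetype and stanza names here are placeholders; the rest mirrors our real config):

props.conf:
[f5:asm]
REPORT-asm_fields = f5_asm_delims

transforms.conf:
[f5_asm_delims]
# Works fine as long as every value is cleanly quoted CSV
DELIMS = ","
FIELDS = "field1", "field2", "field3"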
Here is a line of data. This is coming from an F5 ASM:
Jun 18 20:04:34 f5name.client.com ASM:"HTTP protocol compliance failed","f5name.client.com","10.10.10.10","Client_security_policy_1","2010-07-04 12:18:19","","8000003409000000072","","0","Unknown method","HTTP","/cgi-bin/">alert(12769017.87967)/consumer/homearticle.jsp","","10.10.8.8","ConsumerSite","GET /cgi-bin/%22%3E%3Cscript%3Ealert(12769017.87967)%3C/script%3E/consumer/homearticle.jsp?pageid=Page_ID' onError=alert(12769017.97637) ' HTTP/1.1\r\nHost: host1.client.com\r\nUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/20080630 Firefox/3.0\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip,deflate\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nKeep-Alive: 15\r\nConnection: keep-alive\r\nReferer: https://host1.client.com/consumer/site/registration\r\nCookie: IMNAME=/cgi-bin/"">alert(12769017.87967); Partner=; MS_CN=; IDSS=6qjob0U1A/3SCCBYXiwQ6T5WE/EVg==; TS58d302=fb35699ac4c1c0946; MHS_INFO=ObsId=\r\nPragma: no-cache\r\nCache-Control: no-cache\r\n\r\n"
Listing multiple characters in DELIMS does not specify a delimiter sequence; it specifies a set of possible single-character delimiters. Using a double-quote as a delimiter is also difficult and a bad idea, since the delimiters are really treated like the commas in a CSV file, while double-quotes keep their usual CSV meaning.
If your data isn't conventional CSV, or has unescaped characters, it isn't really well defined how it should be treated. In that case, you might consider using a regex instead to define and split your fields.
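For example, a regex-based extraction could look roughly like this. It's only a sketch: the sourcetype, stanza name, and field names are placeholders, it only covers the first three values of the event, and [^"]* still assumes no stray quotes inside a value, so you would need to extend and harden the pattern for your real field list.

props.conf:
[f5:asm]
REPORT-asm_fields = f5_asm_regex

transforms.conf:
[f5_asm_regex]
# Skip past the syslog header, then capture each quoted value explicitly
REGEX = ^[^"]*"([^"]*)","([^"]*)","([^"]*)"
FORMAT = violation::$1 device::$2 src_ip::$3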
Posted above; it wouldn't let me post all that code as a comment.
Can you post a sample event? As gkanapathy mentioned, you can use a custom field extraction, which can be painful for CSV-like files, especially with quotes. Another possibility is to use a SEDCMD entry to "fix" your events as they are being indexed, which could work if you have a well-defined misuse of double quotes.
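For instance, something along these lines in props.conf could rewrite a known-bad quote pattern at index time. This is just a sketch: the sourcetype is a placeholder, and the pattern assumes the stray quotes always show up as the "" before > seen in the sample's Cookie value, so adjust it to whatever misuse you actually see.

[f5:asm]
# Replace the unbalanced "" before '>' with a single quote so the field's quoting stays balanced
SEDCMD-fix_quotes = s/"">/'>/g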
Just to be clear: what does Splunk consider an escape character within the CSV data itself?
We tried "\",\"" and "",""; neither works as intended. We need to know whether this is possible! Otherwise this is going in as a Splunk bug...
We have determined the cause: an unescaped " in one of the data fields. Splunk picks up that entire field and ALL fields after it (ignoring the commas, presumably because it thinks they are quoted) up until the next unquoted comma. The merged field shows up in Splunk with no embedded "s at all. Bug?
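For example (a made-up miniature line, not our real data): given
"aaa","b"bb","ccc","ddd"
the stray quote in the second value seems to flip the parser's idea of what is quoted, so field2 comes back as roughly bbb,ccc with all the quotes stripped, and extraction only resumes at the comma before "ddd", the next comma the parser believes is outside quotes.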
I think the character sequence \" can be used to escape a closing quote, but the CSV "standard" uses "" to escape an inline double-quote. Unfortunately, I don't think this behavior is user-definable, which has been a pain for me in the past. (Great question, I'm glad you brought it up. I'm hoping there is a better answer in more recent versions.)
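For illustration only, the two conventions look like this inside a single value:
"she said ""hi"" to us"   (CSV-style doubling)
"she said \"hi\" to us"   (backslash escaping)
Which of these, if either, Splunk honors during DELIMS-based extraction is exactly the open question here.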