Archive

Multi-character delimiters?

Motivator

I have data coming in from an F5 in the format "data1","data2","data3".

However, some events contain " and some contain , inside the field values, so the usual

DELIMS = ","
FIELDS = "field1", "field2", "field3"

doesn't work 100% of the time.

If I put

DELIMS = "\",\""

does it:

  • force Splunk to look for the three-character sequence "," to split fields, or
  • split a field every time it finds a " or a ,

?

Update: "\",\"" does not work, nor do a few other ideas we tried. I guess this question has become: can Splunk use a multiple-character string as a delimiter?

Here is a line of data. This is coming from an F5 ASM:


Jun 18 20:04:34 f5name.client.com ASM:"HTTP protocol compliance failed","f5name.client.com","10.10.10.10","Client_security_policy_1","2010-07-04 12:18:19","","8000003409000000072","","0","Unknown method","HTTP","/cgi-bin/">alert(12769017.87967)/consumer/homearticle.jsp","","10.10.8.8","ConsumerSite","GET /cgi-bin/%22%3E%3Cscript%3Ealert(12769017.87967)%3C/script%3E/consumer/homearticle.jsp?pageid=Page_ID' onError=alert(12769017.97637) ' HTTP/1.1\r\nHost: host1.client.com\r\nUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/20080630 Firefox/3.0\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip,deflate\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nKeep-Alive: 15\r\nConnection: keep-alive\r\nReferer: https://host1.client.com/consumer/site/registration\r\nCookie: IMNAME=/cgi-bin/"">alert(12769017.87967); Partner=; MS_CN=; IDSS=6qjob0U1A/3SCCBYXiwQ6T5WE/EVg==; TS58d302=fb35699ac4c1c0946; MHS_INFO=ObsId=\r\nPragma: no-cache\r\nCache-Control: no-cache\r\n\r\n"


The error comes after the HTTP field, as the next field starts with /cgi-bin/">. Splunk takes /cgi-bin/">...Accept: text/html as the field: it drops the quotes and grabs everything up to the next unescaped comma.

1 Solution

Splunk Employee

Listing multiple characters in DELIMS does not specify a delimiter sequence; it specifies a set of possible single-character delimiters. Using a double quote as a delimiter is also difficult, and a bad idea, since the delimiters are really treated like commas in a CSV file, while double quotes usually keep their CSV meaning.

If your data isn't conventional CSV, or has unescaped characters, how it should be treated is not well defined. In that case, you might consider using a regex instead to define and split your fields.
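As a sketch of that regex approach (the stanza name, sourcetype, and field names below are placeholders, not from this thread), a search-time extraction in transforms.conf can anchor on the three-character "," sequence instead of on single characters, at least for the leading fields:

```
# transforms.conf -- hypothetical stanza; adjust the capture count and
# field names to your data. [^"]* assumes these fields contain no quotes.
[f5_asm_extract]
REGEX = ^[^"]*"([^"]*)","([^"]*)","([^"]*)"
FORMAT = field1::$1 field2::$2 field3::$3

# props.conf -- tie the report to the sourcetype (name is hypothetical)
[f5:asm]
REPORT-f5asm = f5_asm_extract
```

This sidesteps DELIMS entirely, though fields that themselves contain an unescaped " (as in the event above) will still need a more forgiving pattern.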


Motivator

Posted above; it wouldn't let me post all that code as a comment.

Super Champion

Can you post a sample event? As gkanapathy mentioned, you can use a custom field extraction, though that can be painful for CSV-like files, especially with quotes. Another possibility is to use a SEDCMD entry to "fix" your events as they are being indexed, which could work if you have a well-defined misuse of double quotes.
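As a rough sketch of that SEDCMD idea (the pattern here is a guess at one specific misuse, not a general fix, and would need testing against real data): if the only problem were a stray " immediately after /cgi-bin/ inside a field, something like this in props.conf could backslash-escape it at index time:

```
# props.conf -- sourcetype name and pattern are hypothetical
[f5:asm]
# Turn the unescaped quote in /cgi-bin/"> into an escaped one: /cgi-bin/\">
SEDCMD-fixquotes = s/\/">/\/\\">/g
```

The general case (any unescaped quote anywhere in a field) is much harder to express safely as a sed substitution, which is why a well-defined misuse matters.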

Super Champion

Just to be clear: what does Splunk consider escape characters within the CSV data itself?

Motivator

We tried "\",\"" and "","" - neither works as intended. We need to know if this is possible! Otherwise this is getting filed as a Splunk bug...

Motivator

We have determined that the cause of this is an unescaped " in one of the data fields. Splunk picks up the entire field and ALL fields after it (ignoring commas, because they appear to be quoted?) up until the next unquoted comma. The field shows up in Splunk with no embedded quotes at all. Bug?

Super Champion

I think the character sequence \" can be used to escape a closing quote, but the CSV "standard" uses "" to escape an inline double quote. Unfortunately, I don't think this behavior is user-definable, which has been a pain for me in the past. (Great question; I'm glad you brought it up. I'm hoping there is a better answer in more recent versions.)
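For reference, the two escaping conventions look like this in raw data (RFC 4180, the closest thing to a CSV standard, specifies the doubled-quote form; the backslash form is common in log output but not part of that spec):

```
"a field with a "" doubled quote"       <- RFC 4180 / CSV-standard escaping
"a field with a \" backslashed quote"   <- backslash escaping
```

A parser expecting one convention will misread data written in the other, which matches the symptom described above.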
