<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Multi-character delimiters? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17799#M88482</link>
    <description>&lt;P&gt;Can you post a sample event?  As gkanapathy mentioned, you can use a custom field extraction, which can be painful for CSV-like files, especially with quotes.  Another possibility is to use a &lt;CODE&gt;SEDCMD&lt;/CODE&gt; entry to "fix" your events as they are being indexed, which could work if your data misuses double quotes in a well-defined way.&lt;/P&gt;</description>
    <pubDate>Thu, 22 Jul 2010 02:37:36 GMT</pubDate>
    <dc:creator>Lowell</dc:creator>
    <dc:date>2010-07-22T02:37:36Z</dc:date>
    <item>
      <title>Multi-character delimiters?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17793#M88476</link>
      <description>&lt;P&gt;I have data coming in from F5 in the format "data1","data2","data3".&lt;/P&gt;

&lt;P&gt;However, some field values contain a " and some contain a , so the usual&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;DELIMS = ","
FIELDS = "field1", "field2", "field3"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;doesn't work 100% of the time.&lt;/P&gt;

&lt;P&gt;If I instead set&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;DELIMS = "\",\""
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;does it:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;&lt;S&gt;force Splunk to look for a "," three character combination to split fields, or &lt;/S&gt;&lt;/LI&gt;
&lt;LI&gt;make a field split every time it finds a " or ,&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;?&lt;/P&gt;

&lt;P&gt;&lt;B&gt;Update: "\",\"" does not work, nor do a few other ideas we tried. I guess this question has become: can Splunk use a multiple-character string as a delimiter?&lt;/B&gt;&lt;/P&gt;

&lt;P&gt;Here is a line of data. This is coming from an F5 ASM:&lt;/P&gt;

&lt;HR /&gt;

&lt;PRE&gt;Jun 18 20:04:34 f5name.client.com ASM:"HTTP protocol compliance failed","f5name.client.com","10.10.10.10","Client_security_policy_1","2010-07-04 12:18:19","","8000003409000000072","","0","Unknown method","HTTP","/cgi-bin/"&amp;gt;alert(12769017.87967)/consumer/homearticle.jsp","","10.10.8.8","ConsumerSite","GET /cgi-bin/%22%3E%3Cscript%3Ealert(12769017.87967)%3C/script%3E/consumer/homearticle.jsp?pageid=Page_ID' onError=alert(12769017.97637) ' HTTP/1.1\r\nHost: host1.client.com\r\nUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/20080630 Firefox/3.0\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip,deflate\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nKeep-Alive: 15\r\nConnection: keep-alive\r\nReferer: https://host1.client.com/consumer/site/registration\r\nCookie: IMNAME=/cgi-bin/""&amp;gt;alert(12769017.87967); Partner=; MS_CN=; IDSS=6qjob0U1A/3SCCBYXiwQ6T5WE/EVg==; TS58d302=fb35699ac4c1c0946; MHS_INFO=ObsId=\r\nPragma: no-cache\r\nCache-Control: no-cache\r\n\r\n"&lt;/PRE&gt;

&lt;HR /&gt;

&lt;P&gt;The error occurs right after the HTTP field, because the next field starts with /cgi-bin/"&amp;gt;. Splunk extracts /cgi-bin/&amp;gt;...Accept: text/html as a single field: it drops the quotes and grabs everything up to the next unescaped comma.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Jul 2010 06:24:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17793#M88476</guid>
      <dc:creator>Jason</dc:creator>
      <dc:date>2010-07-21T06:24:58Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-character delimiters?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17794#M88477</link>
      <description>&lt;P&gt;I think the character sequence &lt;CODE&gt;\"&lt;/CODE&gt; can be used to escape a closing quote.  But the CSV "standard" uses &lt;CODE&gt;""&lt;/CODE&gt; to escape an inline double quote.  Unfortunately, I don't think this behavior is user-definable, which has been a pain for me in the past.  (Great question, I'm glad you brought it up.  I'm hoping there is a better answer in more recent versions.)&lt;/P&gt;</description>
      <pubDate>Wed, 21 Jul 2010 06:34:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17794#M88477</guid>
      <dc:creator>Lowell</dc:creator>
      <dc:date>2010-07-21T06:34:41Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-character delimiters?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17795#M88478</link>
      <description>&lt;P&gt;We have determined that the cause is an unescaped " in one of the data fields. Splunk picks up that entire field and ALL fields after it (ignoring commas, because they are quoted?) up until the next unquoted comma. The field shows up in Splunk with no embedded "s at all. Bug?&lt;/P&gt;</description>
      <pubDate>Wed, 21 Jul 2010 07:12:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17795#M88478</guid>
      <dc:creator>Jason</dc:creator>
      <dc:date>2010-07-21T07:12:10Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-character delimiters?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17796#M88479</link>
      <description>&lt;P&gt;We tried "\",\"" and "","" and neither works as intended. We need to know if this is possible! Otherwise this is getting filed as a Splunk bug...&lt;/P&gt;</description>
      <pubDate>Wed, 21 Jul 2010 21:32:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17796#M88479</guid>
      <dc:creator>Jason</dc:creator>
      <dc:date>2010-07-21T21:32:45Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-character delimiters?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17797#M88480</link>
      <description>&lt;P&gt;Listing multiple characters in DELIMS does not specify a delimiter sequence; it specifies a set of possible single-character delimiters. Using a double quote as a delimiter is also difficult and a bad idea, since delimiters are treated like the commas in a CSV file, while double quotes usually keep the meaning they have in CSV.&lt;/P&gt;
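
&lt;P&gt;To make the set-versus-sequence distinction concrete, here is a minimal Python sketch. It is not Splunk's parser and does not model Splunk's quote handling; the sample line and both helper functions are hypothetical and only show how the two splitting rules differ.&lt;/P&gt;

```python
import re

# Hypothetical sample line: the second field contains both a quote and a comma.
line = '"data1","da"ta,2","data3"'


def split_on_char_set(text, chars):
    """Split wherever ANY single character from `chars` occurs (a set of delimiters)."""
    pattern = "[" + re.escape(chars) + "]"
    # Adjacent delimiters produce empty strings; drop them for readability.
    return [part for part in re.split(pattern, text) if part]


def split_on_sequence(text, sequence):
    """Split only where the whole multi-character `sequence` occurs."""
    # Remove the line's opening and closing quote before splitting.
    return text[1:-1].split(sequence)


print(split_on_char_set(line, '",'))   # ['data1', 'da', 'ta', '2', 'data3']
print(split_on_sequence(line, '","'))  # ['data1', 'da"ta,2', 'data3']
```

&lt;P&gt;The character-set rule breaks the second field apart at its embedded quote and comma, while the sequence rule keeps it whole, which is the behavior the question was hoping for.&lt;/P&gt;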

&lt;P&gt;If your data isn't conventional CSV or has unescaped characters, how it should be parsed is not well defined. In that case, you might consider using a regex instead to define and split your fields.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jul 2010 01:18:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17797#M88480</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2010-07-22T01:18:47Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-character delimiters?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17798#M88481</link>
      <description>&lt;P&gt;Just to be clear: what does Splunk consider to be escape characters within the CSV data itself?&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jul 2010 02:35:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17798#M88481</guid>
      <dc:creator>Lowell</dc:creator>
      <dc:date>2010-07-22T02:35:57Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-character delimiters?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17799#M88482</link>
      <description>&lt;P&gt;Can you post a sample event?  As gkanapathy mentioned, you can use a custom field extraction, which can be painful for CSV-like files, especially with quotes.  Another possibility is to use a &lt;CODE&gt;SEDCMD&lt;/CODE&gt; entry to "fix" your events as they are being indexed, which could work if your data misuses double quotes in a well-defined way.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jul 2010 02:37:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17799#M88482</guid>
      <dc:creator>Lowell</dc:creator>
      <dc:date>2010-07-22T02:37:36Z</dc:date>
    </item>
    <item>
      <title>Re: Multi-character delimiters?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17800#M88483</link>
      <description>&lt;P&gt;Posted above; it wouldn't let me post all that code as a comment.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jul 2010 05:12:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Multi-character-delimiters/m-p/17800#M88483</guid>
      <dc:creator>Jason</dc:creator>
      <dc:date>2010-07-22T05:12:22Z</dc:date>
    </item>
  </channel>
</rss>

