<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do I edit my regex to parse fields correctly if a field delimiter appears within a field? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-edit-my-regex-to-parse-fields-correctly-if-a-field/m-p/238093#M70749</link>
    <description>&lt;P&gt;Try the following:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;^"(?P&amp;lt;dest_ip&amp;gt;[^"]+)","(?P&amp;lt;dest_port&amp;gt;[^"]+)","(?P&amp;lt;uri&amp;gt;[^"]+)","(?P&amp;lt;request&amp;gt;[^"][^,]+)","(?P&amp;lt;response&amp;gt;[^\n]+)"$
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;You can test it here: &lt;A href="https://regex101.com/r/nD3sL1/2"&gt;https://regex101.com/r/nD3sL1/2&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 19 Jan 2016 09:21:50 GMT</pubDate>
    <dc:creator>javiergn</dc:creator>
    <dc:date>2016-01-19T09:21:50Z</dc:date>
    <item>
      <title>How do I edit my regex to parse fields correctly if a field delimiter appears within a field?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-edit-my-regex-to-parse-fields-correctly-if-a-field/m-p/238091#M70747</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;Another regex problem I'm afraid.....&lt;/P&gt;

&lt;P&gt;I've got a very long event with 37 fields, where all the fields are quoted and separated by commas. There are no &lt;CODE&gt;key=value&lt;/CODE&gt; pairs. &lt;BR /&gt;
For the most part my regex works nicely with the event data, but occasionally a quote also appears in the actual field data, which breaks the separator my regex relies on.&lt;/P&gt;

&lt;P&gt;Working example (extremely simplified regex and event):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;^"(?P&amp;lt;dest_ip&amp;gt;[^"]+)","(?P&amp;lt;dest_port&amp;gt;[^"]+)","(?P&amp;lt;uri&amp;gt;[^"]+)","(?P&amp;lt;request&amp;gt;[^"]+)","(?P&amp;lt;response&amp;gt;[^\n]+)"$
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Data:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;"192.0.0.20","80","fl=city,name,code,group=true&amp;amp;group.field=city","GET /solr/lpbm/select?fl=city","Logging rate limit reached"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;No problem with this; all the fields parse out OK. However, this next event fails. Note the additional &lt;CODE&gt;"&lt;/CODE&gt; characters in the fourth field:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;"192.0.0.20","80","fl=city,name,code,group=true&amp;amp;group.field=city","GET /solr/"lpbm"/select?fl=city","Logging rate limit reached"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This now breaks the &lt;CODE&gt;[^"]+)","&lt;/CODE&gt; part of my regex and distorts the field extractions.&lt;/P&gt;

&lt;P&gt;Is there a way to do the equivalent of:- &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;......","(?P&amp;lt;request&amp;gt;[^","]+)",".......
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I know that this is invalid, but I don't know what the alternative looks like 😞&lt;/P&gt;

&lt;P&gt;Thanks for any help,&lt;BR /&gt;
Mark.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jan 2016 07:08:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-edit-my-regex-to-parse-fields-correctly-if-a-field/m-p/238091#M70747</guid>
      <dc:creator>markwymer</dc:creator>
      <dc:date>2016-01-19T07:08:46Z</dc:date>
    </item>
    <item>
      <title>Re: How do I edit my regex to parse fields correctly if a field delimiter appears within a field?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-edit-my-regex-to-parse-fields-correctly-if-a-field/m-p/238092#M70748</link>
      <description>&lt;P&gt;Your problem should be solvable by using non-greedy (or lazy) quantifiers instead of the &lt;CODE&gt;[^"]&lt;/CODE&gt; syntax. The advantage is that you can use the whole pattern &lt;CODE&gt;","&lt;/CODE&gt; as the separator instead of just &lt;CODE&gt;"&lt;/CODE&gt;. However, I'm not sure whether the Splunk regex engine behaves the way I expect, so try (something like) this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;^"(?P&amp;lt;dest_ip&amp;gt;.+?)","(?P&amp;lt;dest_port&amp;gt;.+?)","(?P&amp;lt;uri&amp;gt;.+?)","(?P&amp;lt;request&amp;gt;.+?)","(?P&amp;lt;response&amp;gt;[^\n]+)"$
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;What's the difference:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;I'd say the &lt;CODE&gt;[^"]&lt;/CODE&gt; syntax is "old school". The parser simply consumes everything until a &lt;CODE&gt;"&lt;/CODE&gt; is found.&lt;/LI&gt;
&lt;LI&gt;Lazy quantifiers, however, consume as little as they can. And "as little" means: as little as possible while still letting the whole pattern match. In theory (I can't test it right now) the group should therefore consume a single &lt;CODE&gt;"&lt;/CODE&gt;, because the literal &lt;CODE&gt;","&lt;/CODE&gt; that follows the group cannot match at that position, but stop at the first &lt;CODE&gt;","&lt;/CODE&gt;. (It should also be a little slower, again in theory.)&lt;/LI&gt;
&lt;/UL&gt;
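
&lt;P&gt;A minimal sketch of that difference, assuming Python's &lt;CODE&gt;re&lt;/CODE&gt; module as a stand-in for the Splunk engine (behaviour there may differ). The named groups are replaced by plain groups plus a hypothetical &lt;CODE&gt;FIELDS&lt;/CODE&gt; tuple, and the event is a shortened copy of the failing one from the question:&lt;/P&gt;

```python
import re

# Assumption: Python's re is used only to illustrate the idea; Splunk's own
# engine is not exercised here.
# FIELDS is a hypothetical helper that names the five plain capture groups.
FIELDS = ("dest_ip", "dest_port", "uri", "request", "response")

# Original style: a field may not contain any double quote at all.
STRICT = re.compile(r'^"([^"]+)","([^"]+)","([^"]+)","([^"]+)","([^\n]+)"$')

# Lazy style: each .+? grows only until the literal "," separator can match.
LAZY = re.compile(r'^"(.+?)","(.+?)","(.+?)","(.+?)","([^\n]+)"$')

# Shortened copy of the failing event (stray quotes inside the fourth field).
EVENT = '"192.0.0.20","80","fl=city,name,code","GET /solr/"lpbm"/select?fl=city","Logging rate limit reached"'


def extract(pattern, event):
    """Return a dict of field name to value, or None when nothing matches."""
    match = pattern.match(event)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


strict_result = extract(STRICT, EVENT)
lazy_result = extract(LAZY, EVENT)

print(strict_result)           # the strict pattern finds no match
print(lazy_result["request"])  # the lazy pattern keeps the inner quotes
```

&lt;P&gt;One caveat with this approach: a field that itself contains the three characters &lt;CODE&gt;","&lt;/CODE&gt; would still be split in the wrong place.&lt;/P&gt;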

&lt;P&gt;/edit &amp;amp; just as info: a &lt;CODE&gt;?&lt;/CODE&gt; after a quantifier makes it lazy (here &lt;CODE&gt;.+?&lt;/CODE&gt;: "lazily consume at least one character").&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jan 2016 07:52:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-edit-my-regex-to-parse-fields-correctly-if-a-field/m-p/238092#M70748</guid>
      <dc:creator>Sebastian2</dc:creator>
      <dc:date>2016-01-19T07:52:08Z</dc:date>
    </item>
    <item>
      <title>Re: How do I edit my regex to parse fields correctly if a field delimiter appears within a field?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-edit-my-regex-to-parse-fields-correctly-if-a-field/m-p/238093#M70749</link>
      <description>&lt;P&gt;Try the following:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;^"(?P&amp;lt;dest_ip&amp;gt;[^"]+)","(?P&amp;lt;dest_port&amp;gt;[^"]+)","(?P&amp;lt;uri&amp;gt;[^"]+)","(?P&amp;lt;request&amp;gt;[^"][^,]+)","(?P&amp;lt;response&amp;gt;[^\n]+)"$
&lt;/CODE&gt;&lt;/PRE&gt;
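
&lt;P&gt;A quick way to sanity-check this pattern offline, sketched with Python's &lt;CODE&gt;re&lt;/CODE&gt; module as a stand-in for the Splunk engine (an assumption). The named groups are written as plain groups, and both events are made up for illustration. Since &lt;CODE&gt;[^,]+&lt;/CODE&gt; excludes commas, the second event is there to show what happens if the request field ever contains one:&lt;/P&gt;

```python
import re

# Assumption: Python's re stands in for Splunk's engine. Groups are unnamed,
# in the order dest_ip, dest_port, uri, request, response.
PATTERN = re.compile(r'^"([^"]+)","([^"]+)","([^"]+)","([^"][^,]+)","([^\n]+)"$')

# Illustrative events: stray quotes only, then stray quotes plus a comma.
QUOTED = '"192.0.0.20","80","fl=city","GET /solr/"lpbm"/select?fl=city","Logging rate limit reached"'
WITH_COMMA = '"192.0.0.20","80","fl=city","GET /solr/"lpbm"/select?fl=city,name","Logging rate limit reached"'


def request_field(event):
    """Return the fourth capture group, or None when the pattern fails."""
    match = PATTERN.match(event)
    return match.group(4) if match else None


quoted_request = request_field(QUOTED)
comma_request = request_field(WITH_COMMA)

print(quoted_request)  # stray quotes are tolerated
print(comma_request)   # a comma in the request field prevents any match
```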

&lt;P&gt;You can test it here: &lt;A href="https://regex101.com/r/nD3sL1/2"&gt;https://regex101.com/r/nD3sL1/2&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jan 2016 09:21:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-edit-my-regex-to-parse-fields-correctly-if-a-field/m-p/238093#M70749</guid>
      <dc:creator>javiergn</dc:creator>
      <dc:date>2016-01-19T09:21:50Z</dc:date>
    </item>
  </channel>
</rss>

