<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: [Regex/Extraction] Need help finding the correct method of parsing a specific log type in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Regex-Extraction-Need-help-finding-the-correct-method-of-parsing/m-p/420049#M120766</link>
    <description>&lt;P&gt;Like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... | rex field=_raw max_match=0 "\s?(?P&amp;lt;test&amp;gt;[A-Za-z\-\_]+\=.*?)(?=\s+[^\s=]+=|$)"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Tue, 23 Apr 2019 05:18:42 GMT</pubDate>
    <dc:creator>woodcock</dc:creator>
    <dc:date>2019-04-23T05:18:42Z</dc:date>
    <item>
      <title>[Regex/Extraction] Need help finding the correct method of parsing a specific log type</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Regex-Extraction-Need-help-finding-the-correct-method-of-parsing/m-p/420048#M120765</link>
      <description>&lt;P&gt;Instead of trying to explain, it would be easier to show you the problem I am having. The Splunk search below generates two anonymized example logs that I am trying to parse completely and correctly:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults count=2 
| streamstats count 
| eval _raw = if(count=1,"f=?q?&amp;lt;bounces+51612-7668-random=53user=3Dqilsadkjerwqs.com@email.eb-notifications.com&amp;gt;: t=&amp;lt;random_user@idwgdzfctcbgmzk.com&amp;gt; Rule=?q?Globally_Allowed_Senders type=Providencia b=ok action=deliver scot=242 PROBLEM-FIELD=HELP extract this field entirely(1) don_data=?q?255.255.255.255;bounces+321200-4020-hob=1Adagu=2Rzoipxoantxhnonw.com@email.eb-notifications.com;q2.email.eb-notifications.com p=0.025 S=?q?COY_REPORTS_Has_Created_a_New_Item_in_HvsulQjoc fur=255.255.255.255 r=255.255.255.255 pz=4.20 a=a/art", "t=&amp;lt;random_user@rigjgaxwiaizady.com&amp;gt; PROBLEM-FIELD=HELP extract this field entirely(2) Rule=?q?Arnita_Sargita_Sender_IP S=Oj: fur=255.255.255.255") 
| fields - _time count 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I am using the following regex to try to extract the fields:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; | rex field=_raw max_match=0 "\s?(?P&amp;lt;test&amp;gt;[A-Za-z\-\_]+\=[^\s]+)" 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The problem I am having is specifically with the "PROBLEM-FIELD" in both logs. Extracted fully, the PROBLEM-FIELD/value pair should be:&lt;/P&gt;

&lt;P&gt;PROBLEM-FIELD=&lt;STRONG&gt;HELP extract this field entirely(1)&lt;/STRONG&gt;&lt;BR /&gt;
but it is showing up as:&lt;BR /&gt;
PROBLEM-FIELD=&lt;STRONG&gt;HELP&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;because there are spaces in the PROBLEM-FIELD value, unlike the other fields in the data.&lt;/P&gt;
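
&lt;P&gt;The truncation can be reproduced outside Splunk. Below is a minimal Python sketch, assuming Python's &lt;CODE&gt;re&lt;/CODE&gt; module behaves like Splunk's PCRE for this pattern. It uses an unnamed capture group in place of &lt;CODE&gt;test&lt;/CODE&gt;, a simplified but equivalent character class for the field name, and a shortened copy of the second sample log:&lt;/P&gt;

```python
import re

# Shortened copy of the second sample log (the t= address is omitted).
log = ("PROBLEM-FIELD=HELP extract this field entirely(2) "
       "Rule=?q?Arnita_Sargita_Sender_IP S=Oj: fur=255.255.255.255")

# Same shape as the rex pattern above, with an unnamed group instead of "test".
# [^\s]+ can only consume non-whitespace, so every value ends at its first space.
pattern = re.compile(r"\s?([A-Za-z_-]+=[^\s]+)")

pairs = pattern.findall(log)
for pair in pairs:
    print(pair)
# The first pair printed is PROBLEM-FIELD=HELP; "extract this field entirely(2)" is lost.
```

&lt;P&gt;The words after &lt;CODE&gt;HELP&lt;/CODE&gt; contain no equals sign, so they never start a new match and are silently skipped.&lt;/P&gt;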

&lt;P&gt;Originally I tried to use the &lt;CODE&gt;extract&lt;/CODE&gt; command with &lt;CODE&gt;kvdelim="=" pairdelim=" "&lt;/CODE&gt;, but because there are equals signs (=) within some of the fields' values, it doesn't work. If anyone has any ideas on how to parse this log with &lt;STRONG&gt;any method&lt;/STRONG&gt;, without losing any data, please help!&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Non-essential bonus question:&lt;/STRONG&gt; Is there a way to use the &lt;CODE&gt;extract&lt;/CODE&gt; command with this data, without using &lt;CODE&gt;mvexpand&lt;/CODE&gt;? The method below will work if a regex is found that will extract the PROBLEM-FIELD correctly, but I lose all the other fields I'm working with when I have to use stats to join the fields back together (not to mention it is terribly inefficient and ugly):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults count=2 
| streamstats count 
| eval _raw = if(count=1,"f=?q?&amp;lt;bounces+51612-7668-random=53user=3Dqilsadkjerwqs.com@email.eb-notifications.com&amp;gt;: t=&amp;lt;random_user@idwgdzfctcbgmzk.com&amp;gt; Rule=?q?Globally_Allowed_Senders type=Providencia b=ok action=deliver scot=242 PROBLEM-FIELD=HELP extract this field entirely(1) don_data=?q?255.255.255.255;bounces+321200-4020-hob=1Adagu=2Rzoipxoantxhnonw.com@email.eb-notifications.com;q2.email.eb-notifications.com p=0.025 S=?q?COY_REPORTS_Has_Created_a_New_Item_in_HvsulQjoc fur=255.255.255.255 r=255.255.255.255 pz=4.20 a=a/art", "t=&amp;lt;random_user@rigjgaxwiaizady.com&amp;gt; PROBLEM-FIELD=HELP extract this field entirely(2) Rule=?q?Arnita_Sargita_Sender_IP S=Oj: fur=255.255.255.255") 
| fields - _time count 
| streamstats count AS log_recompiler 
| rex field=_raw max_match=0 "\s?(?P&amp;lt;test&amp;gt;[A-Za-z\-\_]+\=[^\s]+)" 
| mvexpand test 
| rex field=test "(?P&amp;lt;field&amp;gt;[^\=]+\=)(?P&amp;lt;value&amp;gt;.*)" 
| rex mode=sed field=field "s/=/~/g" 
| eval newfield = mvzip(field,value) 
| stats list(newfield) AS _raw by log_recompiler 
| eval _raw = toString(_raw) 
| rex field=_raw mode=sed "s/=/|||/g" 
| extract kvdelim="~," pairdelim=" " 
| foreach * 
    [ rex field=&amp;lt;&amp;lt;FIELD&amp;gt;&amp;gt; mode=sed "s/\|\|\|/=/g"] 
| fields - _raw
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 18 Apr 2019 23:02:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Regex-Extraction-Need-help-finding-the-correct-method-of-parsing/m-p/420048#M120765</guid>
      <dc:creator>rbechtold</dc:creator>
      <dc:date>2019-04-18T23:02:53Z</dc:date>
    </item>
    <item>
      <title>Re: [Regex/Extraction] Need help finding the correct method of parsing a specific log type</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Regex-Extraction-Need-help-finding-the-correct-method-of-parsing/m-p/420049#M120766</link>
      <description>&lt;P&gt;Like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... | rex field=_raw max_match=0 "\s?(?P&amp;lt;test&amp;gt;[A-Za-z\-\_]+\=.*?)(?=\s+[^\s=]+=|$)"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 23 Apr 2019 05:18:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Regex-Extraction-Need-help-finding-the-correct-method-of-parsing/m-p/420049#M120766</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2019-04-23T05:18:42Z</dc:date>
    </item>
    <item>
      <title>Re: [Regex/Extraction] Need help finding the correct method of parsing a specific log type</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Regex-Extraction-Need-help-finding-the-correct-method-of-parsing/m-p/420050#M120767</link>
      <description>&lt;P&gt;You're incredible! It took me a few minutes to wrap my mind around how the extraction works, but you translated my problem perfectly into regex. I have a lot to learn when it comes to lookaheads and lookbehinds. Thank you so much!&lt;/P&gt;
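
&lt;P&gt;For anyone else working through it, here is how I understand the pattern, as a minimal Python sketch rather than anything official. It assumes Python's &lt;CODE&gt;re&lt;/CODE&gt; module treats the lazy quantifier and lookahead the same way Splunk's PCRE does, swaps the named group for an unnamed one, and runs on a shortened piece of the first sample log:&lt;/P&gt;

```python
import re

# Shortened piece of the first sample log (no angle-bracket addresses).
log = ("type=Providencia b=ok PROBLEM-FIELD=HELP extract this field entirely(1) "
       "don_data=?q?255.255.255.255;bounces+321200-4020-hob=1Adagu=2Rzoipxoantxhnonw.com"
       "@email.eb-notifications.com;q2.email.eb-notifications.com p=0.025")

# .*? grows the value one character at a time and stops as soon as the
# lookahead succeeds: either whitespace followed by the next "name=" token,
# or the end of the string. The lookahead consumes nothing, so the next
# match can start at that whitespace.
pattern = re.compile(r"\s?([A-Za-z_-]+=.*?)(?=\s+[^\s=]+=|$)")

pairs = pattern.findall(log)
for pair in pairs:
    print(pair)
```

&lt;P&gt;The value of PROBLEM-FIELD keeps its spaces because none of the words inside it are followed by an equals sign, and the equals signs inside &lt;CODE&gt;don_data&lt;/CODE&gt; do not split it because they are not preceded by whitespace.&lt;/P&gt;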

&lt;P&gt;In the event anyone runs across this in the future and is curious about the second part of my question, I've been able to figure it out using this method:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults count=2 
| streamstats count 
| eval _raw = if(count=1,"f=?q?&amp;lt;bounces+51612-7668-random=53user=3Dqilsadkjerwqs.com@email.eb-notifications.com&amp;gt;: t=&amp;lt;random_user@idwgdzfctcbgmzk.com&amp;gt; Rule=?q?Globally_Allowed_Senders type=Providencia b=ok action=deliver scot=242 PROBLEM-FIELD=HELP extract this field entirely(1) don_data=?q?255.255.255.255;bounces+321200-4020-hob=1Adagu=2Rzoipxoantxhnonw.com@email.eb-notifications.com;q2.email.eb-notifications.com p=0.025 S=?q?COY_REPORTS_Has_Created_a_New_Item_in_HvsulQjoc fur=255.255.255.255 r=255.255.255.255 pz=4.20 a=a/art", "t=&amp;lt;random_user@rigjgaxwiaizady.com&amp;gt; PROBLEM-FIELD=HELP extract this field entirely(2) Rule=?q?Arnita_Sargita_Sender_IP S=Oj: fur=255.255.255.255") 
| fields - _time count 
| rex field=_raw max_match=0 "\s?(?P&amp;lt;test&amp;gt;[A-Za-z\-\_]+\=.*?)(?=\s+[^\s=]+=|$)" 
| rex field=test max_match=0 "(?P&amp;lt;field1&amp;gt;[^\=]+)\=(?P&amp;lt;field2&amp;gt;.*)" 
| eval field1 = mvjoin(field1,","), field2 = mvjoin(field2,"~,") 
| eval field1 = split(field1, ","), field2 = split(field2, ",") 
| rename _raw AS tempraw 
| eval _raw = mvzip(field1, field2) 
| rex field=_raw mode=sed "s/=/|||/g" 
| extract kvdelim="," pairdelim="~" mv_add=t 
| foreach * 
    [ rename &amp;lt;&amp;lt;FIELD&amp;gt;&amp;gt; AS &amp;lt;&amp;lt;FIELD&amp;gt;&amp;gt;_temp 
    | rex field=&amp;lt;&amp;lt;FIELD&amp;gt;&amp;gt;_temp mode=sed "s/\|\|\|/=/g" 
    | rename &amp;lt;&amp;lt;FIELD&amp;gt;&amp;gt;_temp AS &amp;lt;&amp;lt;FIELD&amp;gt;&amp;gt;] 
| fields - field1 field2 test 
| rename tempraw AS _raw
&lt;/CODE&gt;&lt;/PRE&gt;
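
&lt;P&gt;The core of the second &lt;CODE&gt;rex&lt;/CODE&gt; is splitting each pair on its first equals sign only. A rough Python sketch of just that step, assuming the pairs have already been extracted (the list below is a hand-picked, shortened sample, and the &lt;CODE&gt;fields&lt;/CODE&gt; dictionary is only for illustration):&lt;/P&gt;

```python
import re

# Hand-picked, shortened pairs like the ones the first rex produces.
pairs = [
    "b=ok",
    "PROBLEM-FIELD=HELP extract this field entirely(2)",
    "don_data=?q?255.255.255.255;bounces+321200-4020-hob=1Adagu=2R",
]

# Everything before the first "=" is the field name; everything after it,
# including any further "=" characters, is the value.
splitter = re.compile(r"([^=]+)=(.*)")

fields = {}
for pair in pairs:
    match = splitter.fullmatch(pair)
    if match:
        name, value = match.groups()
        fields[name] = value

for name, value in fields.items():
    print(name, "is", value)
```

&lt;P&gt;In the SPL above, the equals signs are temporarily swapped for &lt;CODE&gt;|||&lt;/CODE&gt; before &lt;CODE&gt;extract&lt;/CODE&gt; runs and restored in the &lt;CODE&gt;foreach&lt;/CODE&gt;; the sketch does not need that step because it never hands the data to a delimiter-based parser.&lt;/P&gt;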

&lt;P&gt;Thank you again, woodcock.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Apr 2019 17:56:21 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Regex-Extraction-Need-help-finding-the-correct-method-of-parsing/m-p/420050#M120767</guid>
      <dc:creator>rbechtold</dc:creator>
      <dc:date>2019-04-25T17:56:21Z</dc:date>
    </item>
  </channel>
</rss>

