<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Extracting XML value with &amp;lt;&amp;gt; using regex in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402567#M116501</link>
    <description>&lt;P&gt;Do you want to extract at index time or at search time?&lt;BR /&gt;
Could you post a complete event as an example?&lt;/P&gt;</description>
    <pubDate>Tue, 23 Jul 2019 07:22:11 GMT</pubDate>
    <dc:creator>thomasroulet</dc:creator>
    <dc:date>2019-07-23T07:22:11Z</dc:date>
    <item>
      <title>Extracting XML value with &lt;&gt; using regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402564#M116498</link>
      <description>&lt;P&gt;I need a regex whose first capture group matches the field name and whose second capture group matches the field value.&lt;/P&gt;

&lt;P&gt;The regex I created fails when the field value itself contains "&amp;lt;" and "&amp;gt;", as in the example below.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;Recommendation&amp;gt;&amp;lt;![CDATA[&amp;lt;p&amp;gt;&amp;lt;ul&amp;gt;&amp;lt;li&amp;gt;Remove all backup files, binary archives, alternate versions of files, and test files from the web document root of production servers.&amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;Amend your deployment policy to include the removal of these file types by an administrator.&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&amp;lt;/p&amp;gt;]]&amp;gt;&amp;lt;/Recommendation&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The regex and the sample data I am working with are at &lt;A href="https://regex101.com/r/Pr0Xag/2"&gt;https://regex101.com/r/Pr0Xag/2&lt;/A&gt;.&lt;/P&gt;

&lt;P&gt;This is the regex I am currently using:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;([^&amp;gt;]+)&amp;gt;([^&amp;lt;]*)&amp;lt;\/\1&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;transforms.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[xml-extr11]
REGEX = &amp;lt;([^&amp;gt;]+)&amp;gt;([^&amp;lt;]*)&amp;lt;\/\1&amp;gt;
FORMAT = $1::$2
MV_ADD = true
REPEAT_MATCH = true

[setnull]
REGEX = &amp;lt;VulnSummary&amp;gt;
DEST_KEY = queue
FORMAT = nullQueue
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;props.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[nexpose_appspider]
TRANSFORMS-null= setnull
BREAK_ONLY_BEFORE = &amp;lt;Vuln&amp;gt;
NO_BINARY_CHECK = true
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TIME_PREFIX = &amp;lt;ScanDate&amp;gt;
MAX_TIMESTAMP_LOOKAHEAD = 19
TRUNCATE = 0
disabled = false
pulldown_type = true
REPORT-xmlext11 = xml-extr11
KV_MODE = none
MAX_EVENTS = 400
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 22 Jul 2019 06:15:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402564#M116498</guid>
      <dc:creator>michaelrosello</dc:creator>
      <dc:date>2019-07-22T06:15:27Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting XML value with &lt;&gt; using regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402565#M116499</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;You can use this regex, which I tested against your sample string on regex101:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;CODE&gt;&amp;lt;([^&amp;gt;]+)&amp;gt;(?|(?=&amp;lt;!\[CDATA\[.*\]\]&amp;gt;)(?|&amp;lt;!\[CDATA\[(?|(.*))\]\]&amp;gt;)|(?|(.*)))&amp;lt;\/\1&amp;gt;&lt;/CODE&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
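
&lt;P&gt;For readers following along, here is an annotated sketch of the same pattern in PCRE extended mode ((?x)), which ignores whitespace and treats # as the start of a comment. It is meant for reading and for experimenting on regex101, not as a tested transforms.conf value, and it assumes the nested single-alternative (?|...) wrappers of the one-line version can be dropped without changing which text lands in group 2.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;(?x)
&amp;lt;([^&amp;gt;]+)&amp;gt;                      # group 1: the tag name
(?|                            # branch reset: the capture in either alternative is numbered as group 2
    (?=&amp;lt;!\[CDATA\[.*\]\]&amp;gt;)     # lookahead: the value is wrapped in a CDATA section
    &amp;lt;!\[CDATA\[ (.*) \]\]&amp;gt;     # group 2: text inside the CDATA wrapper, which may contain &amp;lt; and &amp;gt;
  |
    (.*)                       # group 2: a plain value without a CDATA wrapper
)
&amp;lt;\/\1&amp;gt;                         # the closing tag must repeat the name captured in group 1
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The branch reset is what lets FORMAT = $1::$2 work for both the CDATA and the plain case, because both alternatives write their value to the same group number.&lt;/P&gt;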

&lt;P&gt;If you perform the extraction at search time, you could use this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| union
    [| makeresults | eval xml="&amp;lt;RECOMMENDATION&amp;gt;&amp;lt;![CDATA[&amp;lt;P&amp;gt;&amp;lt;UL&amp;gt;&amp;lt;LI&amp;gt;Remove all backup files, binary archives, alternate versions of files, and test files from the web document root of production servers.&amp;lt;/LI&amp;gt;&amp;lt;LI&amp;gt;Amend your deployment policy to include the removal of these file types by an administrator.&amp;lt;/LI&amp;gt;&amp;lt;/UL&amp;gt;&amp;lt;/P&amp;gt;]]&amp;gt;&amp;lt;/RECOMMENDATION&amp;gt;"],
    [| makeresults | eval xml="&amp;lt;FINDINGDBID&amp;gt;D9327888CC8545948C8D62D4FF515BDE&amp;lt;/FINDINGDBID&amp;gt;"],
    [| makeresults | eval xml="&amp;lt;DESCRIPTION&amp;gt;&amp;lt;![CDATA[&amp;lt;P&amp;gt;A backup file was discovered. Binary archives or application files with an alternate file extension may expose source code and application logic to an attacker. If a script's file extension does not match an application extension (such as .asp, .jsp, or .php), then the server usually considers the file equivalent to plain text. When this happens, the server presents the user with the raw source code of the file instead of executing the script and providing interpreted output.&amp;lt;BR /&amp;gt;Depending on the content of the script file, the exposure of data varies between simple function calls to database connection credentials to administration passwords.&amp;lt;/P&amp;gt;&amp;lt;BR /&amp;gt;File archives such as .tgz, .tar.gz, or .zip files should never be stored within the web application's document root. If these files contain an archive of the application's source code, then it will be trivial for an attacker to download and examine the code.]]&amp;gt;&amp;lt;/DESCRIPTION&amp;gt;"]
| rex field=xml "&amp;lt;(?&amp;lt;key&amp;gt;[^&amp;gt;]+)&amp;gt;(?|(?=&amp;lt;!\[CDATA\[.*\]\]&amp;gt;)(?|&amp;lt;!\[CDATA\[(?&amp;lt;value&amp;gt;|(.*))\]\]&amp;gt;)|(?&amp;lt;value&amp;gt;|(.*)))&amp;lt;\/\1&amp;gt;"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The important part is the last line, the rex command.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2019 13:40:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402565#M116499</guid>
      <dc:creator>thomasroulet</dc:creator>
      <dc:date>2019-07-22T13:40:04Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting XML value with &lt;&gt; using regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402566#M116500</link>
      <description>&lt;P&gt;I've tried this, and it works in regex101 but not in Splunk. I suspect the regex takes too many steps?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2019 07:11:19 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402566#M116500</guid>
      <dc:creator>michaelrosello</dc:creator>
      <dc:date>2019-07-23T07:11:19Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting XML value with &lt;&gt; using regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402567#M116501</link>
      <description>&lt;P&gt;Do you want to extract at index time or at search time?&lt;BR /&gt;
Could you post a complete event as an example?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2019 07:22:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402567#M116501</guid>
      <dc:creator>thomasroulet</dc:creator>
      <dc:date>2019-07-23T07:22:11Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting XML value with &lt;&gt; using regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402568#M116502</link>
      <description>&lt;P&gt;In transforms.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[testxml]
SOURCE_KEY = _raw
REGEX = &amp;lt;([^&amp;gt;]+)&amp;gt;(?|(?=&amp;lt;!\[CDATA\[.*\]\]&amp;gt;)(?|&amp;lt;!\[CDATA\[(?|(.*))\]\]&amp;gt;)|(?|(.*)))&amp;lt;\/\1&amp;gt;
FORMAT = $1::$2
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;At search time, assuming the data is in _raw:&lt;BR /&gt;
| extract testxml&lt;/P&gt;

&lt;P&gt;At index time (index-time extractions generally also need WRITE_META = true in the transforms.conf stanza),&lt;BR /&gt;
in props.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[yoursourcetype]
TRANSFORMS-testxml = testxml
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 23 Jul 2019 07:52:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402568#M116502</guid>
      <dc:creator>thomasroulet</dc:creator>
      <dc:date>2019-07-23T07:52:40Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting XML value with &lt;&gt; using regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402569#M116503</link>
      <description>&lt;P&gt;Here is the complete event. I have also updated the question with my props and transforms.&lt;BR /&gt;
&lt;A href="https://regex101.com/r/Pr0Xag/6"&gt;https://regex101.com/r/Pr0Xag/6&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2019 10:32:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402569#M116503</guid>
      <dc:creator>michaelrosello</dc:creator>
      <dc:date>2019-07-23T10:32:23Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting XML value with &lt;&gt; using regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402570#M116504</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;Update your transforms.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[xml-extr11]
REGEX = &amp;lt;([^&amp;gt;]+)&amp;gt;(?|(?=&amp;lt;!\[CDATA\[.*\]\]&amp;gt;)(?|&amp;lt;!\[CDATA\[(?|(.*))\]\]&amp;gt;)|(?|([^&amp;lt;]*)))&amp;lt;\/\1&amp;gt;
FORMAT = $1::$2
MV_ADD = true
REPEAT_MATCH = true
&lt;/CODE&gt;&lt;/PRE&gt;
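
&lt;P&gt;One way to sanity-check the stanza at search time, sketched here under some assumptions: the xml-extr11 stanza above is deployed on the search head, the sample value is a shortened, made-up version of the Recommendation element from the question, and | extract is given the stanza name as in the earlier reply in this thread.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults
| eval _raw="&amp;lt;Recommendation&amp;gt;&amp;lt;![CDATA[&amp;lt;p&amp;gt;Remove all backup files.&amp;lt;/p&amp;gt;]]&amp;gt;&amp;lt;/Recommendation&amp;gt;"
| extract xml-extr11
| table Recommendation
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;If the transform applies, the Recommendation field should hold the text between the CDATA markers, including the inner tags.&lt;/P&gt;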

&lt;P&gt;The updated transform will extract the desired fields.&lt;BR /&gt;
I corrected my previous REGEX.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2019 12:20:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402570#M116504</guid>
      <dc:creator>thomasroulet</dc:creator>
      <dc:date>2019-07-23T12:20:10Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting XML value with &lt;&gt; using regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402571#M116505</link>
      <description>&lt;P&gt;@michaelrosello &lt;BR /&gt;
Did you solve your extraction problem?&lt;BR /&gt;
Did the answers help? If so, don't forget to accept an answer and vote.&lt;/P&gt;</description>
      <pubDate>Thu, 05 Sep 2019 13:01:21 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/402571#M116505</guid>
      <dc:creator>thomasroulet</dc:creator>
      <dc:date>2019-09-05T13:01:21Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting XML value with &lt;&gt; using regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/540137#M152781</link>
      <description>&lt;P&gt;Hi Thomas,&lt;/P&gt;&lt;P&gt;This regex works for my data too, but it does not work when the closing tag is not on the same line as the opening tag.&lt;/P&gt;&lt;P&gt;For example:&lt;/P&gt;&lt;P&gt;&amp;lt;NETWORK_ID&amp;gt;2020&amp;lt;/NETWORK_ID&amp;gt;&lt;BR /&gt;&amp;lt;NETBIOS&amp;gt;&lt;BR /&gt;&amp;lt;![CDATA[WWWW107]]&amp;gt;&lt;BR /&gt;&amp;lt;/NETBIOS&amp;gt;&lt;BR /&gt;&amp;lt;OS&amp;gt;&lt;BR /&gt;&amp;lt;![CDATA[Windows 2003 R2]]&amp;gt;&lt;BR /&gt;&amp;lt;/OS&amp;gt;&lt;/P&gt;&lt;P&gt;Here it works for the NETWORK_ID tag but not for the NETBIOS and OS tags.&lt;/P&gt;&lt;P&gt;When I remove the whitespace between the tags, it works.&lt;/P&gt;&lt;P&gt;Could you please suggest how to update the regex accordingly?&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Thu, 18 Feb 2021 13:31:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/540137#M152781</guid>
      <dc:creator>phepales</dc:creator>
      <dc:date>2021-02-18T13:31:59Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting XML value with &lt;&gt; using regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/540235#M152817</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/231569"&gt;@phepales&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;You can use the regex below:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[xml-extr11]
REGEX = &amp;lt;([^&amp;gt;]+)&amp;gt;(?:\n)?(?|(?=&amp;lt;!\[CDATA\[.*\]\]&amp;gt;)(?|&amp;lt;!\[CDATA\[(?|(.*))\]\]&amp;gt;)|(?|([^&amp;lt;]*)))(?:\n)?&amp;lt;\/\1&amp;gt;
FORMAT = $1::$2
MV_ADD = true
REPEAT_MATCH = true&lt;/LI-CODE&gt;</description>
      <pubDate>Wed, 17 Feb 2021 12:35:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/540235#M152817</guid>
      <dc:creator>scelikok</dc:creator>
      <dc:date>2021-02-17T12:35:40Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting XML value with &lt;&gt; using regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/540361#M152872</link>
      <description>&lt;P&gt;Hi &lt;SPAN class=""&gt;&lt;A href="https://community.splunk.com/t5/user/viewprofilepage/user-id/206061" target="_self"&gt;scelikok&lt;/A&gt;&lt;/SPAN&gt;,&lt;/P&gt;&lt;P&gt;Thanks for your help!&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;There is actually whitespace in the data, as you can see in the screenshot, and the regex you provided does not work in this case. Could you please help me with it?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="phepales_0-1613644098358.png" style="width: 400px;"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/12968i01CFD9F9E1464451/image-size/medium?v=v2&amp;amp;px=400" role="button" title="phepales_0-1613644098358.png" alt="phepales_0-1613644098358.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;lt;NETWORK_ID&amp;gt;2050&amp;lt;/NETWORK_ID&amp;gt;&lt;BR /&gt;&amp;lt;DNS&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;lt;![CDATA[wwwwwwwww93]]&amp;gt;&lt;BR /&gt;&amp;lt;/DNS&amp;gt;&lt;BR /&gt;&amp;lt;NETBIOS&amp;gt;&lt;BR /&gt;&amp;lt;![CDATA[WWWW93]]&amp;gt;&lt;BR /&gt;&amp;lt;/NETBIOS&amp;gt;&lt;BR /&gt;&amp;lt;OS&amp;gt;&lt;BR /&gt;&amp;lt;![CDATA[Windows 2008]]&amp;gt;&lt;BR /&gt;&amp;lt;/OS&amp;gt;&lt;/P&gt;&lt;P&gt;Many thanks in advance!&lt;/P&gt;</description>
      <pubDate>Thu, 18 Feb 2021 10:33:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/540361#M152872</guid>
      <dc:creator>phepales</dc:creator>
      <dc:date>2021-02-18T10:33:26Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting XML value with &lt;&gt; using regex</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/540382#M152882</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=""&gt;&lt;A href="https://community.splunk.com/t5/user/viewprofilepage/user-id/206061" target="_self"&gt;scelikok&lt;/A&gt;&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;This regex worked for me after I made some changes:&lt;/P&gt;&lt;P&gt;&amp;lt;([^&amp;gt;]+)&amp;gt;(?:\n\s*)?(?|(?=&amp;lt;!\[CDATA\[.*\]\]&amp;gt;)(?|&amp;lt;!\[CDATA\[(?|(.*))\]\]&amp;gt;)|(?|([^&amp;lt;]*)))(?:\n\s*)?&amp;lt;\/\1&amp;gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 18 Feb 2021 13:31:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Extracting-XML-value-with-lt-gt-using-regex/m-p/540382#M152882</guid>
      <dc:creator>phepales</dc:creator>
      <dc:date>2021-02-18T13:31:15Z</dc:date>
    </item>
  </channel>
</rss>

