<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Can you help me with a problem extracting XML? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Can-you-help-me-with-a-problem-extracting-XML/m-p/398668#M71070</link>
    <description>&lt;P&gt;I've scoured Google and Answers, but my XML looks a little different than most I've seen so far:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; &amp;lt;Doc_OutPut XML_Version="1.0"&amp;gt;
      &amp;lt;Doc_Field&amp;gt;
        &amp;lt;Field_Name&amp;gt;BatchName&amp;lt;/Field_Name&amp;gt;
&amp;lt;Field_Value&amp;gt;GOCLM36962920190214001_19045SCLM000018&amp;lt;/Field_Value&amp;gt;
      &amp;lt;/Doc_Field&amp;gt;
      &amp;lt;Doc_Field&amp;gt;
        &amp;lt;Field_Name&amp;gt;GUID&amp;lt;/Field_Name&amp;gt;
        &amp;lt;Field_Value&amp;gt;
        &amp;lt;/Field_Value&amp;gt;
      &amp;lt;/Doc_Field&amp;gt;
      &amp;lt;Doc_Field&amp;gt;
        &amp;lt;Field_Name&amp;gt;ph_Template&amp;lt;/Field_Name&amp;gt;
        &amp;lt;Field_Value&amp;gt;
        &amp;lt;/Field_Value&amp;gt;
      &amp;lt;/Doc_Field&amp;gt;
      &amp;lt;Doc_Field&amp;gt;
        &amp;lt;Field_Name&amp;gt;phEmp_Template&amp;lt;/Field_Name&amp;gt;
        &amp;lt;Field_Value&amp;gt;-Initial – Company&amp;lt;/Field_Value&amp;gt;
      &amp;lt;/Doc_Field&amp;gt;
      &amp;lt;Doc_Field&amp;gt;
        &amp;lt;Field_Name&amp;gt;phPhy_Template&amp;lt;/Field_Name&amp;gt;
        &amp;lt;Field_Value&amp;gt;
        &amp;lt;/Field_Value&amp;gt;
      &amp;lt;/Doc_Field&amp;gt;
  &amp;lt;/Doc_OutPut&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I'd like Splunk to use each Field_Name as the name of a field and the matching Field_Value as its value. This is what I've tried.&lt;BR /&gt;
props.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
BREAK_ONLY_BEFORE = /&amp;lt;Doc_Field/&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;What am I doing wrong here?&lt;/P&gt;</description>
    <pubDate>Tue, 29 Sep 2020 23:21:25 GMT</pubDate>
    <dc:creator>manderson7</dc:creator>
    <dc:date>2020-09-29T23:21:25Z</dc:date>
    <item>
      <title>Can you help me with a problem extracting XML?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Can-you-help-me-with-a-problem-extracting-XML/m-p/398668#M71070</link>
      <description>&lt;P&gt;I've scoured Google and Answers, but my XML looks a little different than most I've seen so far:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; &amp;lt;Doc_OutPut XML_Version="1.0"&amp;gt;
      &amp;lt;Doc_Field&amp;gt;
        &amp;lt;Field_Name&amp;gt;BatchName&amp;lt;/Field_Name&amp;gt;
&amp;lt;Field_Value&amp;gt;GOCLM36962920190214001_19045SCLM000018&amp;lt;/Field_Value&amp;gt;
      &amp;lt;/Doc_Field&amp;gt;
      &amp;lt;Doc_Field&amp;gt;
        &amp;lt;Field_Name&amp;gt;GUID&amp;lt;/Field_Name&amp;gt;
        &amp;lt;Field_Value&amp;gt;
        &amp;lt;/Field_Value&amp;gt;
      &amp;lt;/Doc_Field&amp;gt;
      &amp;lt;Doc_Field&amp;gt;
        &amp;lt;Field_Name&amp;gt;ph_Template&amp;lt;/Field_Name&amp;gt;
        &amp;lt;Field_Value&amp;gt;
        &amp;lt;/Field_Value&amp;gt;
      &amp;lt;/Doc_Field&amp;gt;
      &amp;lt;Doc_Field&amp;gt;
        &amp;lt;Field_Name&amp;gt;phEmp_Template&amp;lt;/Field_Name&amp;gt;
        &amp;lt;Field_Value&amp;gt;-Initial – Company&amp;lt;/Field_Value&amp;gt;
      &amp;lt;/Doc_Field&amp;gt;
      &amp;lt;Doc_Field&amp;gt;
        &amp;lt;Field_Name&amp;gt;phPhy_Template&amp;lt;/Field_Name&amp;gt;
        &amp;lt;Field_Value&amp;gt;
        &amp;lt;/Field_Value&amp;gt;
      &amp;lt;/Doc_Field&amp;gt;
  &amp;lt;/Doc_OutPut&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I'd like Splunk to use each Field_Name as the name of a field and the matching Field_Value as its value. This is what I've tried.&lt;BR /&gt;
props.conf:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
BREAK_ONLY_BEFORE = /&amp;lt;Doc_Field/&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;What am I doing wrong here?&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 23:21:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Can-you-help-me-with-a-problem-extracting-XML/m-p/398668#M71070</guid>
      <dc:creator>manderson7</dc:creator>
      <dc:date>2020-09-29T23:21:25Z</dc:date>
    </item>
    <item>
      <title>Re: Can you help me with a problem extracting XML?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Can-you-help-me-with-a-problem-extracting-XML/m-p/398669#M71071</link>
      <description>&lt;P&gt;BREAK_ONLY_BEFORE is for splitting the data into multiple events, and it only applies when SHOULD_LINEMERGE is true, so I don't think it's what you are trying to do.&lt;/P&gt;

&lt;P&gt;To get the fields extracted the way you want, you can use this (put it on your search head):&lt;/P&gt;

&lt;P&gt;props.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[my_sourcetype]
REPORT-my_xml_pairs = my_xml_pairs
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;transforms.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[my_xml_pairs]
REGEX = &amp;lt;Field_Name&amp;gt;\s*(?&amp;lt;_KEY_1&amp;gt;.*?)\s*&amp;lt;\/Field_Name&amp;gt;.*?&amp;lt;Field_Value&amp;gt;\s*(?&amp;lt;_VAL_1&amp;gt;.*?)\s*&amp;lt;\/Field_Value&amp;gt;.*?
&lt;/CODE&gt;&lt;/PRE&gt;
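
&lt;P&gt;If the name and value tags sit on separate lines, as in your sample, keep in mind that &lt;CODE&gt;.&lt;/CODE&gt; does not match a newline by default in PCRE. A rough variant, assuming that layout, replaces the &lt;CODE&gt;.*?&lt;/CODE&gt; between the tags with &lt;CODE&gt;\s*&lt;/CODE&gt; and maps the two capture groups with FORMAT. The stanza name my_xml_pairs_alt is just a placeholder, and this is an untested sketch rather than a verified fix:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[my_xml_pairs_alt]
# $1 is the Field_Name text, $2 is the Field_Value text
REGEX = &amp;lt;Field_Name&amp;gt;\s*([^&amp;lt;]+?)\s*&amp;lt;\/Field_Name&amp;gt;\s*&amp;lt;Field_Value&amp;gt;\s*([^&amp;lt;]*?)\s*&amp;lt;\/Field_Value&amp;gt;
FORMAT = $1::$2
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The REPORT line in props.conf would then need to point at this stanza name instead.&lt;/P&gt;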

&lt;P&gt;Good luck&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 23:26:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Can-you-help-me-with-a-problem-extracting-XML/m-p/398669#M71071</guid>
      <dc:creator>chrisyounger</dc:creator>
      <dc:date>2020-09-29T23:26:25Z</dc:date>
    </item>
    <item>
      <title>Re: Can you help me with a problem extracting XML?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Can-you-help-me-with-a-problem-extracting-XML/m-p/398670#M71072</link>
      <description>&lt;P&gt;Thanks very much, Chris. You're right, I believe I do want all the data in the text doc to show as 1 event.&lt;BR /&gt;
Unfortunately, this did not extract the field names from the XML, and not all of the fields ended up in the one event. I ingested one file and got an event that was 257 lines long, and each of the remaining lines became its own event.&lt;BR /&gt;
I then ingested another file of the same type after adding a \n in between  &amp;amp; , but this didn't help with the field name extraction. I again got one event with 257 lines, and the rest of the lines were in their own events.&lt;BR /&gt;
It worked on regex101, so I'm not sure what happened.&lt;BR /&gt;
Do you have any ideas what could be the problem?&lt;BR /&gt;
I also tried adding LINEBREAKER = &amp;lt;\/Doc_OutPut&amp;gt; to the props, no go there either. The events still broke after 257 lines.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Feb 2019 20:53:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Can-you-help-me-with-a-problem-extracting-XML/m-p/398670#M71072</guid>
      <dc:creator>manderson7</dc:creator>
      <dc:date>2019-02-26T20:53:20Z</dc:date>
    </item>
    <item>
      <title>Re: Can you help me with a problem extracting XML?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Can-you-help-me-with-a-problem-extracting-XML/m-p/398671#M71073</link>
      <description>&lt;P&gt;Using LINE_BREAKER is the best thing to do. If the split works on Regex101, it should work in Splunk. However, there are two things to be aware of:&lt;BR /&gt;
1. Put the LINE_BREAKER setting where the parsing happens, which usually means the indexer or the first heavy forwarder the data goes through.&lt;BR /&gt;
2. Make sure your regular expression has a capture group, otherwise it won't work, e.g. &lt;CODE&gt;LINE_BREAKER = \&amp;lt;\/Doc_OutPut\&amp;gt;([\r\n]*)&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 23:26:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Can-you-help-me-with-a-problem-extracting-XML/m-p/398671#M71073</guid>
      <dc:creator>chrisyounger</dc:creator>
      <dc:date>2020-09-29T23:26:31Z</dc:date>
    </item>
    <item>
      <title>Re: Can you help me with a problem extracting XML?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Can-you-help-me-with-a-problem-extracting-XML/m-p/398672#M71074</link>
      <description>&lt;P&gt;LINE_BREAKER did the trick, with the capture group. Didn't know that was required.&lt;BR /&gt;
Still not getting field names. &lt;BR /&gt;
props.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[ocr_xml]
REPORT-ocr_xml_pairs = ocr_xml_pairs
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;transforms.conf&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[ocr_xml_pairs]
REGEX = `|&amp;lt;Field_Name&amp;gt;\s*(?&amp;lt;Name&amp;gt;.*?)\s*&amp;lt;\/Field_Name&amp;gt;\n.*?&amp;lt;Field_Value&amp;gt;\s*(?&amp;lt;_Value&amp;gt;.*?)\s*&amp;lt;\/Field_Value&amp;gt;.*?
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 26 Feb 2019 22:01:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Can-you-help-me-with-a-problem-extracting-XML/m-p/398672#M71074</guid>
      <dc:creator>manderson7</dc:creator>
      <dc:date>2019-02-26T22:01:27Z</dc:date>
    </item>
  </channel>
</rss>

