Can you help me with a problem extracting XML?

manderson7
Contributor

I've scoured Google and Answers, but my XML looks a little different from most of the examples I've seen so far:

<Doc_OutPut XML_Version="1.0">
  <Doc_Field>
    <Field_Name>BatchName</Field_Name>
    <Field_Value>GOCLM36962920190214001_19045SCLM000018</Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>GUID</Field_Name>
    <Field_Value>
    </Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>ph_Template</Field_Name>
    <Field_Value>
    </Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>phEmp_Template</Field_Name>
    <Field_Value>-Initial – Company</Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>phPhy_Template</Field_Name>
    <Field_Value>
    </Field_Value>
  </Doc_Field>
</Doc_OutPut>

I'd like Splunk to use each Field_Name as the name of a field and the matching Field_Value as its value. I've tried this in
props.conf:

DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
BREAK_ONLY_BEFORE = /<Doc_Field/>

What am I doing wrong here?


chrisyounger
SplunkTrust

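BREAK_ONLY_BEFORE is for splitting the data into multiple events, so I don't think it's what you're trying to do. (It also only takes effect when SHOULD_LINEMERGE = true, so with your settings it is ignored anyway.)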

To get the fields extracted the way you want, you can use this (put it on your search head):

props.conf

[my_sourcetype]
REPORT-my_xml_pairs = my_xml_pairs

transforms.conf

[my_xml_pairs]
REGEX = <Field_Name>\s*(?<_KEY_1>.*?)\s*<\/Field_Name>\s*<Field_Value>\s*(?<_VAL_1>.*?)\s*<\/Field_Value>
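For anyone testing this outside Splunk, here is a rough Python emulation of that transform run against part of the sample XML. It is only an approximation: Python writes named groups as (?P<name>...) rather than (?<name>...), and the extract_pairs helper is hypothetical, standing in for the way Splunk turns each _KEY_1/_VAL_1 match into a field. Note the \s* between </Field_Name> and <Field_Value>: by default . does not match a newline, so .*? alone cannot bridge the line break in the sample.

```python
import re

# Part of the sample event from the question.
EVENT = """<Doc_OutPut XML_Version="1.0">
  <Doc_Field>
    <Field_Name>BatchName</Field_Name>
    <Field_Value>GOCLM36962920190214001_19045SCLM000018</Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>GUID</Field_Name>
    <Field_Value>
    </Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>phEmp_Template</Field_Name>
    <Field_Value>-Initial – Company</Field_Value>
  </Doc_Field>
</Doc_OutPut>"""

# The transform's REGEX in Python's (?P<name>...) syntax.
# \s* (not .*?) crosses the newline between </Field_Name> and <Field_Value>.
PAIR_RE = re.compile(
    r"<Field_Name>\s*(?P<_KEY_1>.*?)\s*</Field_Name>\s*"
    r"<Field_Value>\s*(?P<_VAL_1>.*?)\s*</Field_Value>"
)


def extract_pairs(event: str) -> dict[str, str]:
    """Hypothetical helper: map each captured Field_Name to its Field_Value."""
    fields = {}
    for m in PAIR_RE.finditer(event):
        fields[m.group("_KEY_1")] = m.group("_VAL_1")
    return fields


print(extract_pairs(EVENT))
```

Empty values such as GUID come back as empty strings here, because the leading \s* inside <Field_Value> swallows the whitespace-only content.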

Good luck

manderson7
Contributor

Thanks very much, Chris. You're right, I believe I do want all the data in the text file to show up as one event.
Unfortunately, this didn't extract the field names from the XML, and not all of the fields ended up in that one event. I ingested one file and got an event that was 257 lines long; each of the remaining lines became its own event, and the field names weren't extracted.
I ingested another file of the same type after adding a \n between </Field_Name> and <Field_Value>, but that didn't help with the field name extraction either. Again I got one event with 257 lines, and the rest of the lines were separate events.
It worked on regex101, so I'm not sure what happened.
Do you have any idea what the problem could be?
I also tried adding LINE_BREAKER = <\/Doc_OutPut> to props.conf, with no luck there either. The events still broke after 257 lines.


chrisyounger
SplunkTrust

Using LINE_BREAKER is the right approach. If the split works on regex101, it should work in Splunk. However, there are two tricks to be aware of:
1. Make sure you put the LINE_BREAKER setting where parsing happens; this usually means the indexer or the first heavy forwarder the data passes through.
2. Make sure your regular expression has a capture group, otherwise it won't work. Splunk discards whatever the first capture group matches and breaks the event there, e.g. LINE_BREAKER = \<\/Doc_OutPut\>([\r\n]*)
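To build intuition for point 2, here is a rough Python model of a LINE_BREAKER-style rule: each match marks a boundary, the text captured by the first group is thrown away, everything before it closes the current event, and everything after it starts the next one. The split_events function is hypothetical, and this sketch ignores everything else Splunk does at parse time (line merging, truncation, timestamp extraction).

```python
import re

# Mirrors LINE_BREAKER = </Doc_OutPut>([\r\n]*); group 1 is the discarded delimiter.
BREAKER = re.compile(r"</Doc_OutPut>([\r\n]*)")


def split_events(stream: str) -> list[str]:
    """Hypothetical helper: split a raw stream the way a LINE_BREAKER rule would.

    Text up to the start of capture group 1 ends the current event, the
    captured newlines are dropped, and the next event starts right after them.
    """
    events = []
    start = 0
    for m in BREAKER.finditer(stream):
        events.append(stream[start:m.start(1)])
        start = m.end(1)
    tail = stream[start:]
    if tail.strip():  # keep any trailing text that had no closing tag yet
        events.append(tail)
    return events


raw = (
    '<Doc_OutPut XML_Version="1.0">\n  <Doc_Field>\n'
    "    <Field_Name>BatchName</Field_Name>\n"
    "    <Field_Value>A</Field_Value>\n  </Doc_Field>\n</Doc_OutPut>\n"
    '<Doc_OutPut XML_Version="1.0">\n  <Doc_Field>\n'
    "    <Field_Name>BatchName</Field_Name>\n"
    "    <Field_Value>B</Field_Value>\n  </Doc_Field>\n</Doc_OutPut>\n"
)

for event in split_events(raw):
    print(len(event.splitlines()), "lines, ends with", event.splitlines()[-1])
```

Without the capture group there is nothing to mark where one event stops and the next begins, which is why the rule does not work without it.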


manderson7
Contributor

LINE_BREAKER with the capture group did the trick; I didn't know that was required.
I'm still not getting field names, though. Here's what I have now:
props.conf

[ocr_xml]
REPORT-ocr_xml_pairs = ocr_xml_pairs

transforms.conf

[ocr_xml_pairs]
REGEX = `|<Field_Name>\s*(?<Name>.*?)\s*<\/Field_Name>\n.*?<Field_Value>\s*(?<_Value>.*?)\s*<\/Field_Value>.*?
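Putting this REGEX next to the accepted answer's in Python hints at one likely cause, though the thread doesn't confirm it. Splunk builds field names dynamically only from groups named _KEY_<string> and _VAL_<string>, which is why the accepted answer used _KEY_1 and _VAL_1; with groups named Name and _Value you would instead expect one fixed field called Name (plus a _Value field that the leading underscore would typically keep hidden). This sketch uses Python's re, which spells named groups (?P<name>...), keeps the leading `| exactly as posted (it never matches here because the sample has no backticks), and only renames the two groups to show the difference.

```python
import re

# Two Doc_Field entries from the question's sample.
EVENT = (
    "<Doc_Field>\n"
    "    <Field_Name>BatchName</Field_Name>\n"
    "    <Field_Value>GOCLM36962920190214001_19045SCLM000018</Field_Value>\n"
    "</Doc_Field>\n"
    "<Doc_Field>\n"
    "    <Field_Name>phEmp_Template</Field_Name>\n"
    "    <Field_Value>-Initial – Company</Field_Value>\n"
    "</Doc_Field>\n"
)

# The posted REGEX in Python syntax, group names unchanged (Name, _Value).
USER_PATTERN = (
    r"`|<Field_Name>\s*(?P<Name>.*?)\s*</Field_Name>\n.*?"
    r"<Field_Value>\s*(?P<_Value>.*?)\s*</Field_Value>"
)
# Same pattern with only the group names swapped to the _KEY_/_VAL_ convention.
RENAMED_PATTERN = USER_PATTERN.replace("(?P<Name>", "(?P<_KEY_1>").replace(
    "(?P<_Value>", "(?P<_VAL_1>"
)

# Fixed names: every match reports the same two keys, Name and _Value.
user_matches = [m.groupdict() for m in re.finditer(USER_PATTERN, EVENT)]
print(user_matches)

# Emulating _KEY_/_VAL_: the captured name becomes the field name.
dynamic = {m["_KEY_1"]: m["_VAL_1"] for m in re.finditer(RENAMED_PATTERN, EVENT)}
print(dynamic)
```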