Can you help me with a problem extracting XML?

manderson7
Contributor

I've scoured Google and Splunk Answers, but my XML looks a little different from most of the examples I've seen so far:

<Doc_OutPut XML_Version="1.0">
  <Doc_Field>
    <Field_Name>BatchName</Field_Name>
    <Field_Value>GOCLM36962920190214001_19045SCLM000018</Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>GUID</Field_Name>
    <Field_Value>
    </Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>ph_Template</Field_Name>
    <Field_Value>
    </Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>phEmp_Template</Field_Name>
    <Field_Value>-Initial – Company</Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>phPhy_Template</Field_Name>
    <Field_Value>
    </Field_Value>
  </Doc_Field>
</Doc_OutPut>

I'd like Splunk to use each Field_Name as the name of a field and the matching Field_Value as that field's value. I've tried this in props.conf:

DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
BREAK_ONLY_BEFORE = /<Doc_Field/>

What am I doing wrong here?
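As a point of reference outside Splunk, the mapping being asked for (each Field_Name paired with the Field_Value in the same Doc_Field) can be sketched with Python's standard-library xml.etree.ElementTree. This is only an illustration of the target name/value pairs, not Splunk configuration; the sample is trimmed to three of the fields above, and field_pairs is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the event from the question (three of the five Doc_Field entries).
SAMPLE = """<Doc_OutPut XML_Version="1.0">
  <Doc_Field>
    <Field_Name>BatchName</Field_Name>
    <Field_Value>GOCLM36962920190214001_19045SCLM000018</Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>GUID</Field_Name>
    <Field_Value>
    </Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>phEmp_Template</Field_Name>
    <Field_Value>-Initial – Company</Field_Value>
  </Doc_Field>
</Doc_OutPut>"""


def field_pairs(xml_text):
    """Hypothetical helper: map each Field_Name to the Field_Value in the same Doc_Field."""
    root = ET.fromstring(xml_text)
    pairs = {}
    for field in root.iter("Doc_Field"):
        # findtext returns "" for an element with no text and None if the element is missing.
        name = (field.findtext("Field_Name") or "").strip()
        value = (field.findtext("Field_Value") or "").strip()
        if name:
            pairs[name] = value
    return pairs


print(field_pairs(SAMPLE))
```

Empty values such as GUID come back as empty strings once the whitespace between the tags is stripped.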

chrisyounger
SplunkTrust

BREAK_ONLY_BEFORE is for splitting the data into multiple events (and it only takes effect when SHOULD_LINEMERGE = true), so I don't think it's what you are trying to do.

To get the fields extracted the way you want, you can use this (put it on your search head):

props.conf

[my_sourcetype]
REPORT-my_xml_pairs = my_xml_pairs

transforms.conf

[my_xml_pairs]
REGEX = <Field_Name>\s*(?<_KEY_1>.*?)\s*<\/Field_Name>\s*<Field_Value>\s*(?<_VAL_1>.*?)\s*<\/Field_Value>
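To see why the pattern has to span the line break between </Field_Name> and <Field_Value> in this multi-line event, the extraction can be approximated in Python. This is a rough stand-in for Splunk's search-time transform, not its implementation: it assumes PCRE-like defaults where . does not match a newline, uses Python's (?P<name>...) spelling of named groups, and extract_pairs is a hypothetical helper. A .*? bridge between the tags finds nothing here, while \s* matches the newline and indentation.

```python
import re

# Trimmed copy of the multi-line event from the question.
EVENT = """<Doc_OutPut XML_Version="1.0">
  <Doc_Field>
    <Field_Name>BatchName</Field_Name>
    <Field_Value>GOCLM36962920190214001_19045SCLM000018</Field_Value>
  </Doc_Field>
  <Doc_Field>
    <Field_Name>GUID</Field_Name>
    <Field_Value>
    </Field_Value>
  </Doc_Field>
</Doc_OutPut>"""

# Python writes named groups as (?P<name>...); Splunk's PCRE also accepts (?<name>...).
# Bridging the two tags with .*? fails: '.' does not match '\n' by default.
DOT_BRIDGE = re.compile(
    r"<Field_Name>\s*(?P<_KEY_1>.*?)\s*</Field_Name>.*?"
    r"<Field_Value>\s*(?P<_VAL_1>.*?)\s*</Field_Value>"
)
# Bridging with \s* lets the match cross the line break between the two tags.
SPACE_BRIDGE = re.compile(
    r"<Field_Name>\s*(?P<_KEY_1>.*?)\s*</Field_Name>\s*"
    r"<Field_Value>\s*(?P<_VAL_1>.*?)\s*</Field_Value>"
)


def extract_pairs(pattern, event):
    """Hypothetical helper: each match names one field, mimicking _KEY_1/_VAL_1."""
    return {m.group("_KEY_1"): m.group("_VAL_1") for m in pattern.finditer(event)}


print(extract_pairs(DOT_BRIDGE, EVENT))  # {}
print(extract_pairs(SPACE_BRIDGE, EVENT))
```

An alternative with the same effect is a leading (?s) flag, which lets . match newlines; \s* is the tighter choice because it cannot run past a missing Field_Value into the next Doc_Field.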

Good luck

manderson7
Contributor

Thanks very much, Chris. You're right, I believe I do want all the data in the text doc to show as one event.
Unfortunately, this did not extract the field names from the XML, and not all of the fields ended up in that one event. I ingested one file and got a single event that was 257 lines long; the rest of the lines each became their own event, and the field names weren't extracted.
I ingested another file of the same type after adding a \n in between & , but this didn't help with the field name extraction. I again got one event with 257 lines, and the rest of the lines were in their own events.
It worked on regex101, so I'm not sure what happened.
Do you have any ideas what could be the problem?
I also tried adding LINEBREAKER = <\/Doc_OutPut> to props.conf; no go there either. The events still broke after 257 lines.

chrisyounger
SplunkTrust

Using LINE_BREAKER is the best approach. If the split works on Regex101, it should work in Splunk. However, there are two things to be aware of:
1. Make sure you put the LINE_BREAKER setting where the parsing happens; this usually means the indexer or the first heavy forwarder the data goes through.
2. Make sure your regular expression has a "capture group", otherwise it won't work. The text matched by the first capture group is treated as the boundary between events and discarded. e.g. LINE_BREAKER = \<\/Doc_OutPut\>([\r\n]*)
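The capture-group rule can be approximated in Python to show where the cut happens. This is a sketch, not Splunk's implementation: break_events and the two-document RAW stream are hypothetical, and it follows the documented behaviour that the start of the first capture group ends one event, the end of that group starts the next, and the captured text itself is dropped.

```python
import re

# Hypothetical raw stream: two small Doc_OutPut documents written back to back.
RAW = (
    '<Doc_OutPut XML_Version="1.0">\n'
    "  <Doc_Field><Field_Name>A</Field_Name><Field_Value>1</Field_Value></Doc_Field>\n"
    "</Doc_OutPut>\n"
    '<Doc_OutPut XML_Version="1.0">\n'
    "  <Doc_Field><Field_Name>B</Field_Name><Field_Value>2</Field_Value></Doc_Field>\n"
    "</Doc_OutPut>\n"
)

# Same pattern as the LINE_BREAKER example (escaping < and > is optional here).
BREAKER = re.compile(r"</Doc_OutPut>([\r\n]*)")


def break_events(raw, breaker):
    """Approximate LINE_BREAKER: cut at each match and drop only capture group 1."""
    events, start = [], 0
    for m in breaker.finditer(raw):
        events.append(raw[start:m.start(1)])  # previous event ends where group 1 starts
        start = m.end(1)  # next event starts where group 1 ends
    if start < len(raw):
        events.append(raw[start:])  # trailing text with no breaker after it
    return [e for e in events if e]


for event in break_events(RAW, BREAKER):
    print(repr(event))
```

Because only the newlines inside ([\r\n]*) are discarded, </Doc_OutPut> stays at the end of each event. With no capture group in the pattern, m.start(1) raises IndexError in this sketch, a loose parallel to the breaker not working in Splunk.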

0 Karma

manderson7
Contributor

LINE_BREAKER did the trick, with the capture group. Didn't know that was required.
Still not getting field names.
props.conf

[ocr_xml]
REPORT-ocr_xml_pairs = ocr_xml_pairs

transforms.conf

[ocr_xml_pairs]
REGEX = `|<Field_Name>\s*(?<Name>.*?)\s*<\/Field_Name>\n.*?<Field_Value>\s*(?<_Value>.*?)\s*<\/Field_Value>.*?
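One likely reason the names still don't appear: a transform names fields after its named capture groups, and only the _KEY_<n>/_VAL_<n> pairing tells Splunk to take the field name from the captured text instead. Below is a rough Python comparison, not Splunk itself, using the group names from the transform above (the pattern body only, without the characters before <Field_Name>) on a single trimmed pair from the sample.

```python
import re

# One Field_Name/Field_Value pair from the question's sample.
EVENT = """<Field_Name>BatchName</Field_Name>
    <Field_Value>GOCLM36962920190214001_19045SCLM000018</Field_Value>"""

# Group names as posted: Name and _Value become the field names themselves.
FIXED = re.compile(
    r"<Field_Name>\s*(?P<Name>.*?)\s*</Field_Name>\n.*?"
    r"<Field_Value>\s*(?P<_Value>.*?)\s*</Field_Value>"
)
# _KEY_1/_VAL_1 convention: the text captured by _KEY_1 supplies the field name.
DYNAMIC = re.compile(
    r"<Field_Name>\s*(?P<_KEY_1>.*?)\s*</Field_Name>\n.*?"
    r"<Field_Value>\s*(?P<_VAL_1>.*?)\s*</Field_Value>"
)

fixed = FIXED.search(EVENT).groupdict()
m = DYNAMIC.search(EVENT)
dynamic = {m.group("_KEY_1"): m.group("_VAL_1")}

print(fixed)    # fields named Name and _Value
print(dynamic)  # field named BatchName
```

With Name and _Value, the captured text lands in fields literally called Name and _Value; with _KEY_1 and _VAL_1, BatchName itself becomes the field name.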