Splunk Search

The xpath command does not work with XML prolog header lines (e.g. <?xml version="1.0"?>)

yeahnah
Motivator

The xpath command does not work if the XML event contains valid prolog header lines (https://www.w3schools.com/xml/xml_syntax.asp).

For example, this works:

 

| makeresults
| eval _raw="<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>"
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| table _raw raw_provider_name_attr

 

But add a prolog header and it no longer works:

 

| makeresults
| eval _raw="<?xml version=\"1.0\"?>
<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>"
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| table _raw raw_provider_name_attr

 

I've raised a support case with Splunk about this.


yeahnah
Motivator

To work around this issue, remove the valid XML prolog headers from the event before calling the xpath command, or use the spath command instead. Here is a run-anywhere example.

| makeresults
| eval _raw="<?xml version=\"1.0\"?>
<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<Event>
  <System>
    <Provider Name='EFG'/>
  </System>
</Event>
<?xml version=\"1.0\"?>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<Event>
  <System>
    <Provider Name='HIJ'/>
  </System>
</Event>"
| eval xml=replace(_raw, "<(\?xml|!DOCTYPE).+?>[\r\n]*", "")
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| xpath field=xml outfield=xml_provider_name_attr "//Provider/@Name"
| spath output=spath_provider_name_attr Event.System{2}.Provider{@Name}
| table _raw raw_provider_name_attr xml* spath*

 

yeahnah
Motivator

On the bug fix for this issue, Splunk Support have come back with the following ...

Observation & Findings:

  1. Thanks for flagging this issue with us; we have taken this to the development team.
  2. We informed you that our development team is having high-level discussions on whether to deprecate or enhance the xpath command.
  3. Once the xpath enhancement or deprecation is done, it will be updated in the official documentation.
  4. This task will go through some pre-checks, post-checks and approvals, which might take some time.

So workarounds are the only option, for now.

Here's a more generic regex to strip different sorts of XML declarations (note: it removes CDATA entries too):

| ...
  ``` example: https://regex101.com/r/BqHeX4/3 ```
| eval xml=replace(_raw, "(?s)(\<[\?\!]([^\\>]+\>).+?)*(?=\<[^(?=\/)])(?=[a-zA-Z])*", "")   
| rex mode=sed field=_raw "s/(?s)(\<[\?\!]([^\\>]+\>).+?)*(?=\<[^(?=\/)])(?=[a-zA-Z])*//g"  ``` sed example for a props.conf SEDCMD to remove XML declarations before indexing ```
| xpath ...

 
Finally, there is another bug (which Splunk said they are aware of) in the xpath command when it is used more than once: any existing multivalue fields become single-value fields (as if a nomv command had been applied), so do any mv manipulations before subsequent xpath commands.
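
The ordering advice above can be sketched roughly as follows. This is only an illustration, not a confirmed reproduction: the sample event and the field names (provider_names, provider_count, first_provider, provider_names_again) are made up for the example, and the flattening behaviour is as described in this thread rather than verified against a particular Splunk version.

```spl
| makeresults
| eval _raw="<Event>
  <System>
    <Provider Name='ABC'/>
    <Provider Name='EFG'/>
  </System>
</Event>"
| xpath field=_raw outfield=provider_names "//Provider/@Name"  ``` first xpath: two matches, so provider_names should be multivalue ```
| eval provider_count=mvcount(provider_names), first_provider=mvindex(provider_names, 0)  ``` do the mv work now, before any further xpath call ```
| xpath field=_raw outfield=provider_names_again "//Provider/@Name"  ``` second xpath: per the bug above, provider_names may be flattened from here on ```
| table provider_names provider_count first_provider provider_names_again
```

If provider_names does come back flattened after the second xpath, the values computed by the eval in between should be unaffected, since they were derived while the field was still multivalue.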
