Splunk Search

The xpath command does not work with XML prolog header lines (e.g. <?xml version="1.0"?>)

yeahnah
Motivator

The xpath command does not work if the XML event contains valid prolog header lines (https://www.w3schools.com/xml/xml_syntax.asp).

For example, this works

 

| makeresults
| eval _raw="<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>"
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| table _raw raw_provider_name_attr

 

 but, add a prolog header and it will no longer work ...

 

| makeresults
| eval _raw="<?xml version=\"1.0\?>
<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>"
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| table _raw raw_provider_name_attr

 

I've raised a support case with Splunk about this.

Tags (1)
0 Karma
1 Solution

yeahnah
Motivator

To workaround this issue, remove the valid XML prolog headers from the event before calling the xpath command, or use the spath command instead.  Here is a run anywhere example.

| makeresults
| eval _raw="<?xml version\"1.0\"?>
<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<Event>
  <System>
    <Provider Name='EFG'/>
  </System>
</Event>
<?xml version\"1.0\"?>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<Event>
  <System>
    <Provider Name='HIJ'/>
  </System>
</Event>"
| eval xml=replace(_raw, "<(\?xml|!DOCTYPE).+?>[\r\n]*", "")
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| xpath field=xml outfield=xml_provider_name_attr "//Provider/@Name"
| spath output=spath_provider_name_attr Event.System{2}.Provider{@Name}
| table _raw raw_provider_name_attr xml* spath*

 

View solution in original post

yeahnah
Motivator

To workaround this issue, remove the valid XML prolog headers from the event before calling the xpath command, or use the spath command instead.  Here is a run anywhere example.

| makeresults
| eval _raw="<?xml version\"1.0\"?>
<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<Event>
  <System>
    <Provider Name='EFG'/>
  </System>
</Event>
<?xml version\"1.0\"?>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<Event>
  <System>
    <Provider Name='HIJ'/>
  </System>
</Event>"
| eval xml=replace(_raw, "<(\?xml|!DOCTYPE).+?>[\r\n]*", "")
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| xpath field=xml outfield=xml_provider_name_attr "//Provider/@Name"
| spath output=spath_provider_name_attr Event.System{2}.Provider{@Name}
| table _raw raw_provider_name_attr xml* spath*

 

yeahnah
Motivator

On the bug fix for this issue, Splunk Support have come back with the following ...

Observation & Findings:

  1. Thanks for flagging this issue with us and we taken this to the development team.
  2. We informed you that our development team is having high level discussions on the xpath command whether to deprecate it or enhance it.
  3. Once the xpath enhancement or deprecation is done, it will be updated in the official documentation.
  4. As this task will undergo through some pre-checks, post-checks and some approvals which might take some time.

So workarounds are the only option, for now.

Here's a more generic regex to extract different sorts of XML declarations (note, removes CDATA entries too)

| ...
  ``` example: https://regex101.com/r/BqHeX4/3 ```
| eval xml=replace(_raw, "(?s)(\<[\?\!]([^\\>]+\>).+?)*(?=\<[^(?=\/)])(?=[a-zA-Z])*", "")   
| rex mode=sed field=_raw "s/(?s)(\<[\?\!]([^\\>]+\>).+?)*(?=\<[^(?=\/)])(?=[a-zA-Z])*//g"  ``` sed example for a props.conf SEDCMD to remove XML declarations before indexing ```
| xpath ...

 
Finally, there is another bug (Splunk said they are aware) with the xpath command when it is used more than once.  Any existing multi-value fields become non multi-value fields (like a nomv command has been applied) so any mv manipulations should be done before subsequent xpath commands. 

Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Agent Mode Engaged! Enchaining Agentic Operations with Splunk AI Assistant 2.0

    Are you ready to transform how your team handles complex data requests? We invite you to our upcoming ...

Announcing Modern Navigation: A New Era of Splunk User Experience

We are excited to introduce the Modern Navigation feature in the Splunk Platform, available to both cloud and ...

Modernize your Splunk Apps – Introducing Python 3.13 in Splunk

We are excited to announce that the upcoming releases of Splunk Enterprise 10.2.x and Splunk Cloud Platform ...