The xpath command does not work with XML prolog header lines (e.g. <?xml version="1.0"?>)

yeahnah
Motivator

The xpath command does not work if the XML event contains valid prolog header lines (https://www.w3schools.com/xml/xml_syntax.asp).

For example, this works:

| makeresults
| eval _raw="<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>"
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| table _raw raw_provider_name_attr

But add a prolog header and it no longer works:

| makeresults
| eval _raw="<?xml version=\"1.0\"?>
<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>"
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| table _raw raw_provider_name_attr

 

I've raised a support case with Splunk about this.


yeahnah
Motivator

To work around this issue, remove the XML prolog headers (even though they are valid XML) from the event before calling the xpath command, or use the spath command instead. Here is a run-anywhere example:

| makeresults
| eval _raw="<?xml version=\"1.0\"?>
<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<Event>
  <System>
    <Provider Name='EFG'/>
  </System>
</Event>
<?xml version=\"1.0\"?>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<Event>
  <System>
    <Provider Name='HIJ'/>
  </System>
</Event>"
| eval xml=replace(_raw, "<(\?xml|!DOCTYPE).+?>[\r\n]*", "")
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| xpath field=xml outfield=xml_provider_name_attr "//Provider/@Name"
| spath output=spath_provider_name_attr Event.System{2}.Provider{@Name}
| table _raw raw_provider_name_attr xml* spath*
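
For the single-event case from the original question, the same workaround can be sketched more compactly. This is only an illustration that reuses the field names above. The spath path assumes the event holds exactly one Event element with one Provider, and the declaration is stripped into a separate xml field so that _raw is left untouched.

```spl
| makeresults
| eval _raw="<?xml version=\"1.0\"?>
<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>"
| eval xml=replace(_raw, "<(\?xml|!DOCTYPE).+?>[\r\n]*", "")
| xpath field=xml outfield=xml_provider_name_attr "//Provider/@Name"
| spath output=spath_provider_name_attr path=Event.System.Provider{@Name}
| table xml_provider_name_attr spath_provider_name_attr
```

The xpath call reads the cleaned xml field, while spath reads _raw directly, so the two columns can be compared side by side.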

 

yeahnah
Motivator

On the bug fix for this issue, Splunk Support have come back with the following ...

Observation & Findings:

  1. Thanks for flagging this issue with us and we taken this to the development team.
  2. We informed you that our development team is having high level discussions on the xpath command whether to deprecate it or enhance it.
  3. Once the xpath enhancement or deprecation is done, it will be updated in the official documentation.
  4. As this task will undergo through some pre-checks, post-checks and some approvals which might take some time.

So workarounds are the only option, for now.

Here's a more generic regex to strip different sorts of XML declarations (note that it removes CDATA entries too):

| ...
  ``` example: https://regex101.com/r/BqHeX4/3 ```
| eval xml=replace(_raw, "(?s)(\<[\?\!]([^\\>]+\>).+?)*(?=\<[^(?=\/)])(?=[a-zA-Z])*", "")   
| rex mode=sed field=_raw "s/(?s)(\<[\?\!]([^\\>]+\>).+?)*(?=\<[^(?=\/)])(?=[a-zA-Z])*//g"  ``` sed example for a props.conf SEDCMD to remove XML declarations before indexing ```
| xpath ...
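
If CDATA sections need to survive, a narrower pattern that only targets <?xml ...?> and <!DOCTYPE ...> declarations may be enough. The following is a minimal sketch, not a tested fix: it assumes the DOCTYPE has no internal subset (no [ ... ] block containing a > character), and the sample event, including the Data element, is made up for illustration.

```spl
| makeresults
| eval _raw="<?xml version=\"1.0\"?>
<!DOCTYPE Event>
<Event>
  <Data><![CDATA[kept as-is]]></Data>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>"
| eval xml=replace(_raw, "<(\?xml|!DOCTYPE)[^>]*>\s*", "")
| xpath field=xml outfield=xml_provider_name_attr "//Provider/@Name"
| table _raw xml xml_provider_name_attr
```

Because the pattern only matches tags that begin with <?xml or <!DOCTYPE, the <![CDATA[ ... ]]> section is not touched.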

 
Finally, there is another bug with the xpath command (Splunk said they are aware of it) when it is used more than once in a search. Any existing multi-value fields are flattened into single-value fields, as if a nomv command had been applied, so do any multi-value manipulation before subsequent xpath commands.
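
A minimal sketch of that ordering, assuming the first xpath returns a multi-value field when several nodes match. The sample event, the Id attribute and the field names provider_names, provider_count, first_provider and provider_ids are all made up for illustration.

```spl
| makeresults
| eval _raw="<Event>
  <System>
    <Provider Name='ABC' Id='1'/>
    <Provider Name='EFG' Id='2'/>
  </System>
</Event>"
| xpath field=_raw outfield=provider_names "//Provider/@Name"
| eval provider_count=mvcount(provider_names), first_provider=mvindex(provider_names, 0)
| xpath field=_raw outfield=provider_ids "//Provider/@Id"
| table provider_names provider_count first_provider provider_ids
```

The mvcount and mvindex calls run while provider_names is still multi-valued. On an affected version, provider_names may show up flattened after the second xpath, but provider_count and first_provider have already been computed by then.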
