Splunk Search

The xpath command does not work with XML prolog header lines (e.g. <?xml version="1.0"?>)

yeahnah
Motivator

The xpath command does not work if the XML event contains valid prolog header lines (https://www.w3schools.com/xml/xml_syntax.asp).

For example, this works:

 

| makeresults
| eval _raw="<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>"
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| table _raw raw_provider_name_attr

 

But add a prolog header and it no longer works:

 

| makeresults
| eval _raw="<?xml version=\"1.0\"?>
<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>"
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| table _raw raw_provider_name_attr

 

I've raised a support case with Splunk about this.

1 Solution

yeahnah
Motivator

To work around this issue, remove the valid XML prolog headers from the event before calling the xpath command, or use the spath command instead. Here is a run-anywhere example.

| makeresults
| eval _raw="<?xml version=\"1.0\"?>
<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<Event>
  <System>
    <Provider Name='EFG'/>
  </System>
</Event>
<?xml version=\"1.0\"?>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<Event>
  <System>
    <Provider Name='HIJ'/>
  </System>
</Event>"
| eval xml=replace(_raw, "<(\?xml|!DOCTYPE).+?>[\r\n]*", "")
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| xpath field=xml outfield=xml_provider_name_attr "//Provider/@Name"
| spath output=spath_provider_name_attr Event.System{2}.Provider{@Name}
| table _raw raw_provider_name_attr xml* spath*
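
For the single-event case from the original question, the spath alternative can be sketched on its own. This is only a sketch: it assumes, as reported above, that spath is not affected by the prolog line, and the path Event.System.Provider{@Name} is written for an event with exactly one Event root element. The output field name is carried over from the example above.

```spl
| makeresults
| eval _raw="<?xml version=\"1.0\"?>
<Event>
  <System>
    <Provider Name='ABC'/>
  </System>
</Event>"
| spath output=spath_provider_name_attr path=Event.System.Provider{@Name}
| table _raw spath_provider_name_attr
```

Because spath reads _raw by default, no replace step is needed here. Whether this suits real data depends on how many root elements each event contains.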

 


yeahnah
Motivator

On the bug fix for this issue, Splunk Support have come back with the following ...

Observation & Findings:

  1. Thanks for flagging this issue with us and we taken this to the development team.
  2. We informed you that our development team is having high level discussions on the xpath command whether to deprecate it or enhance it.
  3. Once the xpath enhancement or deprecation is done, it will be updated in the official documentation.
  4. As this task will undergo through some pre-checks, post-checks and some approvals which might take some time.

So workarounds are the only option, for now.

Here's a more generic regex to strip different sorts of XML declarations (note: it removes CDATA entries too).

| ...
  ``` example: https://regex101.com/r/BqHeX4/3 ```
| eval xml=replace(_raw, "(?s)(\<[\?\!]([^\\>]+\>).+?)*(?=\<[^(?=\/)])(?=[a-zA-Z])*", "")   
| rex mode=sed field=_raw "s/(?s)(\<[\?\!]([^\\>]+\>).+?)*(?=\<[^(?=\/)])(?=[a-zA-Z])*//g"  ``` sed example for a props.conf SEDCMD to remove XML declarations before indexing ```
| xpath ...

 
Finally, there is another bug with the xpath command (Splunk said they are aware of it) when it is used more than once in a search. Any existing multi-value fields are flattened to single-value fields, as if the nomv command had been applied, so any multi-value manipulations should be done before subsequent xpath commands.
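
A minimal sketch of that ordering, assuming the first xpath returns a multi-value field because two Provider nodes match. The Guid attribute and the field names provider_names, provider_count, first_provider and provider_guids are made up for illustration.

```spl
| makeresults
| eval _raw="<Event>
  <System>
    <Provider Name='ABC' Guid='1'/>
    <Provider Name='EFG' Guid='2'/>
  </System>
</Event>"
| xpath field=_raw outfield=provider_names "//Provider/@Name"
| eval provider_count=mvcount(provider_names), first_provider=mvindex(provider_names, 0)
| xpath field=_raw outfield=provider_guids "//Provider/@Guid"
| table provider_names provider_count first_provider provider_guids
```

The eval with mvcount and mvindex runs while provider_names is still multi-value, so the derived fields should not depend on what the second xpath does to it.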
