The xpath command does not work if the XML event contains valid prolog header lines (https://www.w3schools.com/xml/xml_syntax.asp).
For example, this works:
| makeresults
| eval _raw="<Event>
<System>
<Provider Name='ABC'/>
</System>
</Event>"
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| table _raw raw_provider_name_attr
But add a prolog header and it no longer works:
| makeresults
| eval _raw="<?xml version=\"1.0\"?>
<Event>
<System>
<Provider Name='ABC'/>
</System>
</Event>"
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| table _raw raw_provider_name_attr
I've raised a support case with Splunk about this.
To work around this issue, remove the valid XML prolog headers from the event before calling the xpath command, or use the spath command instead. Here is a run-anywhere example:
| makeresults
| eval _raw="<?xml version=\"1.0\"?>
<Event>
<System>
<Provider Name='ABC'/>
</System>
</Event>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<Event>
<System>
<Provider Name='EFG'/>
</System>
</Event>
<?xml version=\"1.0\"?>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<Event>
<System>
<Provider Name='HIJ'/>
</System>
</Event>"
| eval xml=replace(_raw, "<(\?xml|!DOCTYPE).+?>[\r\n]*", "")
| xpath field=_raw outfield=raw_provider_name_attr "//Provider/@Name"
| xpath field=xml outfield=xml_provider_name_attr "//Provider/@Name"
| spath output=spath_provider_name_attr Event.System{2}.Provider{@Name}
| table _raw raw_provider_name_attr xml* spath*
On the bug fix for this issue, Splunk Support have come back with the following ...
Observation & Findings:
- Thanks for flagging this issue with us and we taken this to the development team.
- We informed you that our development team is having high level discussions on the xpath command whether to deprecate it or enhance it.
- Once the xpath enhancement or deprecation is done, it will be updated in the official documentation.
- As this task will undergo through some pre-checks, post-checks and some approvals which might take some time.
So workarounds are the only option, for now.
Here's a more generic regex to remove different sorts of XML declarations (note that it removes CDATA entries too):
| ...
``` example: https://regex101.com/r/BqHeX4/3 ```
| eval xml=replace(_raw, "(?s)(\<[\?\!]([^\\>]+\>).+?)*(?=\<[^(?=\/)])(?=[a-zA-Z])*", "")
| rex mode=sed field=_raw "s/(?s)(\<[\?\!]([^\\>]+\>).+?)*(?=\<[^(?=\/)])(?=[a-zA-Z])*//g" ``` sed example for a props.conf SEDCMD to remove XML declarations before indexing ```
| xpath ...
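For the index-time route mentioned in the sed comment above, a props.conf stanza might look roughly like the sketch below. The sourcetype name my_xml_sourcetype and the class name strip_xml_prolog are hypothetical placeholders, and the pattern is the simpler one from the earlier workaround rather than the generic regex, so treat it as a starting point to test against your own data and not as a verified fix.

```ini
# props.conf on the indexer or heavy forwarder that parses the data
# [my_xml_sourcetype] is a hypothetical sourcetype name; replace it with your own
[my_xml_sourcetype]
# Strip <?xml ...?> and <!DOCTYPE ...> declarations from _raw before indexing
SEDCMD-strip_xml_prolog = s/<(\?xml|!DOCTYPE).+?>[\r\n]*//g
```

Because SEDCMD rewrites _raw at index time, it only affects newly indexed events; events that are already indexed still need the search-time replace shown earlier.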
Finally, there is another bug with the xpath command (Splunk said they are aware of it) that appears when the command is used more than once in a search. Any existing multi-value fields become single-value fields, as if a nomv command had been applied, so do any multi-value manipulations before the subsequent xpath commands.