- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
How to write the regex to extract fields from my sample XML data?
Sample data:
<id>WGBSTH8180T</id>
<sytems>
<sys_Id>14502</sys_Id>
<name>GYS</name>
<version>9901</version>
<ip_address>172.11.11.212</ip_address>
<connector>
<connector_Id>TH818AST001A</connector_Id></connector></sytems>
I can able to get the value WGBSTH8180T
with the regex like | rex field=Info "(?ms)(?.*?)"
Can anyone help me out with how to extract the values sys_Id (systems), name (systems), version (systems), ip_address (systems), and connector_Id (connector) from the data above?
I'm using regex as mentioned below, but its not working. Please help me out to write regex:
| rex field=Info "(?ms)<sytems><sys_Id>(?<sytems>.*?)</sys_Id></sytems>"
| rex field=Info "(?ms)<sytems><name>(?<Name>.*?)</name></sytems>"
| rex field=Info "(?ms)<sytems><version>(?<Version>.*?)</version></sytems>"
| rex field=Info "(?ms)<sytems><ip_address>(?<Ip_Address>.*?)</ip_address></sytems>"
| rex field=Info "(?ms)<sytems><connector><connector_Id>(?<Connector_Ids>.*?)</sys_Id></connector_Id></sytems>"
Thanks in advance
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Lots of threads are out there about the topic such as - https://answers.splunk.com/answers/49521/splunk-why-must-xml-sources-be-so-complicated.html
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The following regex matches in 66 or less steps.
rex field=info "\<id\>(?<id>[^\<]+)\<[^\<]+\<[^\s]+\>\s+\<sys_Id\>(?<sys_Id>[^\<]+)\<[^\<]+\<(?<name>[^\<]+)\<[^\<]+\<(?<version>[^\<]+)\<[^\<]+\<ip_address\>(?<ip_address>[^\<]+)\<[^\<]+\<[^\s]+\>\s+\<(?<connector_id>[^\<]+)"
All the answers given so far will work. Have poor or non-optimized regex statements for large dataset cause poor search performance. What work well for 10k events doesn't work well 1Billion events.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content


There are three reasons the rex commands are not working:
1) The characters between "<sytems>" and the following tag are not accounted for.
2) Slashes must be escaped.
3) The last rex has closing tags in the wrong order.
These rex commands should work, although @jplumsdaine22's answer is better.
| rex field=info "<sys_Id>(?<sytems>.*?)<\/sys_Id>"
| rex field=Info "<name>(?<Name>.*?)<\/name>"
| rex field=Info "<version>(?<Version>.*?)<\/version>"
| rex field=Info "<ip_address>(?<Ip_Address>.*?)<\/ip_address>"
| rex field=Info "<connector_Id>(?<Connector_Ids>.*?)<\/connector_Id>"
If this reply helps you, Karma would be appreciated.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

If the log files you are indexing are valid xml, just use the spath
command. See the search reference
http://docs.splunk.com/Documentation/Splunk/6.3.3/SearchReference/Spath
