How to write the regex to extract fields from my s...

Shan · ‎03-03-2016

Sample data:

             <id>WGBSTH8180T</id>
         <sytems>
         <sys_Id>14502</sys_Id>
         <name>GYS</name>
         <version>9901</version>
         <ip_address>172.11.11.212</ip_address>
         <connector>
         <connector_Id>TH818AST001A</connector_Id></connector></sytems>

I can able to get the value WGBSTH8180T with the regex like | rex field=Info "(?ms)(?.*?)"

Can anyone help me out with how to extract the values sys_Id (systems), name (systems), version (systems), ip_address (systems), and connector_Id (connector) from the data above?

I'm using regex as mentioned below, but its not working. Please help me out to write regex:

     | rex field=Info "(?ms)<sytems><sys_Id>(?<sytems>.*?)</sys_Id></sytems>"
     | rex field=Info "(?ms)<sytems><name>(?<Name>.*?)</name></sytems>"
     | rex field=Info "(?ms)<sytems><version>(?<Version>.*?)</version></sytems>"
     | rex field=Info "(?ms)<sytems><ip_address>(?<Ip_Address>.*?)</ip_address></sytems>"
     | rex field=Info "(?ms)<sytems><connector><connector_Id>(?<Connector_Ids>.*?)</sys_Id></connector_Id></sytems>"

Thanks in advance

ddrillic · ‎03-04-2016

Lots of threads are out there about the topic such as - https://answers.splunk.com/answers/49521/splunk-why-must-xml-sources-be-so-complicated.html

jplumsdaine22 · ‎03-04-2016

Also see this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

bmacias84 · ‎03-03-2016

The following regex matches in 66 or less steps.

rex field=info "\<id\>(?<id>[^\<]+)\<[^\<]+\<[^\s]+\>\s+\<sys_Id\>(?<sys_Id>[^\<]+)\<[^\<]+\<(?<name>[^\<]+)\<[^\<]+\<(?<version>[^\<]+)\<[^\<]+\<ip_address\>(?<ip_address>[^\<]+)\<[^\<]+\<[^\s]+\>\s+\<(?<connector_id>[^\<]+)"

All the answers given so far will work. Have poor or non-optimized regex statements for large dataset cause poor search performance. What work well for 10k events doesn't work well 1Billion events.

richgalloway · ‎03-03-2016

There are three reasons the rex commands are not working:

1) The characters between "<sytems>" and the following tag are not accounted for.
2) Slashes must be escaped.
3) The last rex has closing tags in the wrong order.

These rex commands should work, although @jplumsdaine22's answer is better.

| rex field=info "<sys_Id>(?<sytems>.*?)<\/sys_Id>"
| rex field=Info "<name>(?<Name>.*?)<\/name>"
| rex field=Info "<version>(?<Version>.*?)<\/version>"
| rex field=Info "<ip_address>(?<Ip_Address>.*?)<\/ip_address>"
| rex field=Info "<connector_Id>(?<Connector_Ids>.*?)<\/connector_Id>"

---
If this reply helps you, Karma would be appreciated.

jplumsdaine22 · ‎03-03-2016

If the log files you are indexing are valid xml, just use the spath command. See the search reference

http://docs.splunk.com/Documentation/Splunk/6.3.3/SearchReference/Spath

How to write the regex to extract fields from my sample XML data?

Dashboards: Hiding charts while search is being executed and other uses for tokens

Splunk Observability Cloud's AI Assistant in Action Series: Explaining Metrics and ...

Brains, Bytes, and Boston: Learn from the Best at .conf25

Are you a member of the Splunk Community?

How to write the regex to extract fields from my sample XML data?

Dashboards: Hiding charts while search is being executed and other uses for tokens

Splunk Observability Cloud's AI Assistant in Action Series: Explaining Metrics and ...

Brains, Bytes, and Boston: Learn from the Best at .conf25