Splunk Search

regex field extraction

ialahdal
Path Finder

I have an event that is in an HTML tag format, I'd like to extract data within it in a specific manner, as follows:
<TAG1>Splunking</TAG1>

I was trying to extract the data by matching group1 "TAG1" to group2 "/TAG1" and extracting what's in between into a filed named the same as group1, is this possible??

The best I was able to achieve was this <([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)<\/\1>
But that doesn't work in nested tags, I also don't know how to assign a filed to a group based on a previous one in splunk.

0 Karma
1 Solution

poete
Builder

Hello @ialahdal,

I think you should use spath in this case (https://docs.splunk.com/Documentation/SplunkCloud/8.0.0/SearchReference/Spath).

Please find below an example of use, with 2 levels of fields in the xml.

| makeresults 
| eval somefield="<level1><someFieldLevel1>someValueLevel1</someFieldLevel1><level2><someFieldLevel2>someValueLevel2</someFieldLevel2></level2></level1>"
| spath input=somefield

View solution in original post

poete
Builder

Hello @ialahdal,

I think you should use spath in this case (https://docs.splunk.com/Documentation/SplunkCloud/8.0.0/SearchReference/Spath).

Please find below an example of use, with 2 levels of fields in the xml.

| makeresults 
| eval somefield="<level1><someFieldLevel1>someValueLevel1</someFieldLevel1><level2><someFieldLevel2>someValueLevel2</someFieldLevel2></level2></level1>"
| spath input=somefield

ialahdal
Path Finder

This helped, thank you.

0 Karma
Get Updates on the Splunk Community!

Celebrating Fast Lane: 2025 Authorized Learning Partner of the Year

At .conf25, Splunk proudly recognized Fast Lane as the 2025 Authorized Learning Partner of the Year. This ...

Tech Talk Recap | Mastering Threat Hunting

Mastering Threat HuntingDive into the world of threat hunting, exploring the key differences between ...

Observability for AI Applications: Troubleshooting Latency

If you’re working with proprietary company data, you’re probably going to have a locally hosted LLM or many ...