Splunk Search

Help with RegEx - Select only XML nodes that contain values

adcon82
Explorer

Hello Everyone,

I'm trying to put together a regex statement that will allow me to select only the XML nodes that contain values. In the actual data, there are tons of XML nodes, some may have data, some may not. Instead of defining all of them individually, I'd like to make it more dynamic. My thought was to use a regex in order to select only the nodes that have values, and then use a table * type command to send what's pulled back to a table. If there is a better way to do this using spath or xpath, I'm all ears!

So far, I can achieve what I want to if the XML is on individual lines, using the expression below. The problem is, the XML is streamed, and this expression will not work for streamed XML. I spent a few hours trying to get a working regex to no avail. Any help is greatly appreciated!

Regex that works for XML on individual lines. This will omit empty tags and select all other values.
(.+)>(?![<\/]).+

<root>
<Node1>Value1</Node1>
<Node2>Value2</Node2>
<Node3></Node3>
<Node4></Node4>
<Node5>Value3</Node5>
<Node6>Value4</Node6>
</root>

Actual Data is contained all on one line - Unable to get a regex that can do what is being done above.

<root><Node1>Value1</Node1><Node2>Value2</Node2><Node3></Node3><Node4></Node4><Node5>Value3</Node5><Node6>Value4</Node6></root>
Tags (2)
0 Karma
1 Solution

manjunathmeti
Champion

hi @adcon82,

Use rex command with sed mode to remove nodes without any values. Try this:

| makeresults 
| eval _raw="<root><Node1>Value1</Node1><Node2>Value2</Node2><Node3></Node3><Node4></Node4><Node5>Value3</Node5><Node6>Value4</Node6></root>" | append [| makeresults 
| eval _raw="<root><Node1>Value1</Node1><Node2>Value2</Node2><Node3></Node3><Node4>Value4</Node4><Node5></Node5><Node6></Node6></root>" ] 
| rex mode=sed "s/<Node\d+><\/Node\d+>//g"

View solution in original post

0 Karma

adcon82
Explorer

I found an article that helped me narrow down my table results. Below is the code for reference if anybody comes across this article. I've only ran a few base tests, but so far it seems to narrow my table results to what was returned from the search.

|streamstats count as temp_id
|stats values() by temp_id
|rename values(
) as *
|table *

0 Karma

adcon82
Explorer

Well, that seemed to perform the substitute as expected. I only get back about a dozen fields that only contain data in my search. The log file itself for this entry contains about 50. However, when I run a "table *" command after the search, it outputs all 50 columns to my table, instead of just the fields showing up on the search. Not sure why it would show the full column list, when my search is omitting these columns.

0 Karma

manjunathmeti
Champion

hi @adcon82,

Use rex command with sed mode to remove nodes without any values. Try this:

| makeresults 
| eval _raw="<root><Node1>Value1</Node1><Node2>Value2</Node2><Node3></Node3><Node4></Node4><Node5>Value3</Node5><Node6>Value4</Node6></root>" | append [| makeresults 
| eval _raw="<root><Node1>Value1</Node1><Node2>Value2</Node2><Node3></Node3><Node4>Value4</Node4><Node5></Node5><Node6></Node6></root>" ] 
| rex mode=sed "s/<Node\d+><\/Node\d+>//g"
0 Karma

adcon82
Explorer

Thank you for the response! But what if we don't know the name of the node. Live data in this case contains hundreds of nodes, whose names will vary.

0 Karma

adcon82
Explorer

I think I figured it out! If you see any issues with this, please let me know.

rex mode=sed "s/<[a-zA-Z]\w+><\/[a-zA-Z]\w+> //g"

0 Karma

manjunathmeti
Champion

You can just do this. \w+ matches one or more alpha-numeric characters.

| rex mode=sed "s/<\w+><\/\w+>//g"

Please accept answer if it is working for you.

0 Karma

adcon82
Explorer

Works great, thanks for the tip. I'll remove the extraneous code.

Now it seems to be a problem with the table command ran after that. I assumed if the search I run after the regex returned only the columns that contained data, I could just run "table *". However, when running it, it's pulling all columns, even the empty ones, total of about 50. This seems strange considering my search is only pulling back the dozen tables that contain data.

0 Karma

manjunathmeti
Champion

Yes, since fields are already extracted in index, they'll show up in table even if they are empty in all rows. If you don't want see fields with no values in all rows then use stats:

| stats values(*) as * by _time
0 Karma

adcon82
Explorer

Thanks so much for all of the help manjunathmeti!!

0 Karma
Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...