Splunk Search

Help with RegEx - Select only XML nodes that contain values

Explorer

Hello Everyone,

I'm trying to put together a regex statement that will allow me to select only the XML nodes that contain values. In the actual data, there are tons of XML nodes, some may have data, some may not. Instead of defining all of them individually, I'd like to make it more dynamic. My thought was to use a regex in order to select only the nodes that have values, and then use a table * type command to send what's pulled back to a table. If there is a better way to do this using spath or xpath, I'm all ears!

So far, I can achieve what I want to if the XML is on individual lines, using the expression below. The problem is, the XML is streamed, and this expression will not work for streamed XML. I spent a few hours trying to get a working regex to no avail. Any help is greatly appreciated!

Regex that works for XML on individual lines. This will omit empty tags and select all other values.
(.+)>(?![<\/]).+

<root>
<Node1>Value1</Node1>
<Node2>Value2</Node2>
<Node3></Node3>
<Node4></Node4>
<Node5>Value3</Node5>
<Node6>Value4</Node6>
</root>

Actual Data is contained all on one line - Unable to get a regex that can do what is being done above.

<root><Node1>Value1</Node1><Node2>Value2</Node2><Node3></Node3><Node4></Node4><Node5>Value3</Node5><Node6>Value4</Node6></root>
Tags (2)
0 Karma
1 Solution

Influencer

hi @adcon82,

Use rex command with sed mode to remove nodes without any values. Try this:

| makeresults 
| eval _raw="<root><Node1>Value1</Node1><Node2>Value2</Node2><Node3></Node3><Node4></Node4><Node5>Value3</Node5><Node6>Value4</Node6></root>" | append [| makeresults 
| eval _raw="<root><Node1>Value1</Node1><Node2>Value2</Node2><Node3></Node3><Node4>Value4</Node4><Node5></Node5><Node6></Node6></root>" ] 
| rex mode=sed "s/<Node\d+><\/Node\d+>//g"

View solution in original post

0 Karma

Explorer

I found an article that helped me narrow down my table results. Below is the code for reference if anybody comes across this article. I've only ran a few base tests, but so far it seems to narrow my table results to what was returned from the search.

|streamstats count as temp_id
|stats values() by temp_id
|rename values(
) as *
|table *

0 Karma

Explorer

Well, that seemed to perform the substitute as expected. I only get back about a dozen fields that only contain data in my search. The log file itself for this entry contains about 50. However, when I run a "table *" command after the search, it outputs all 50 columns to my table, instead of just the fields showing up on the search. Not sure why it would show the full column list, when my search is omitting these columns.

0 Karma

Influencer

hi @adcon82,

Use rex command with sed mode to remove nodes without any values. Try this:

| makeresults 
| eval _raw="<root><Node1>Value1</Node1><Node2>Value2</Node2><Node3></Node3><Node4></Node4><Node5>Value3</Node5><Node6>Value4</Node6></root>" | append [| makeresults 
| eval _raw="<root><Node1>Value1</Node1><Node2>Value2</Node2><Node3></Node3><Node4>Value4</Node4><Node5></Node5><Node6></Node6></root>" ] 
| rex mode=sed "s/<Node\d+><\/Node\d+>//g"

View solution in original post

0 Karma

Explorer

Thank you for the response! But what if we don't know the name of the node. Live data in this case contains hundreds of nodes, whose names will vary.

0 Karma

Explorer

I think I figured it out! If you see any issues with this, please let me know.

rex mode=sed "s/<[a-zA-Z]\w+><\/[a-zA-Z]\w+> //g"

0 Karma

Influencer

You can just do this. \w+ matches one or more alpha-numeric characters.

| rex mode=sed "s/<\w+><\/\w+>//g"

Please accept answer if it is working for you.

0 Karma

Explorer

Works great, thanks for the tip. I'll remove the extraneous code.

Now it seems to be a problem with the table command ran after that. I assumed if the search I run after the regex returned only the columns that contained data, I could just run "table *". However, when running it, it's pulling all columns, even the empty ones, total of about 50. This seems strange considering my search is only pulling back the dozen tables that contain data.

0 Karma

Influencer

Yes, since fields are already extracted in index, they'll show up in table even if they are empty in all rows. If you don't want see fields with no values in all rows then use stats:

| stats values(*) as * by _time
0 Karma

Explorer

Thanks so much for all of the help manjunathmeti!!

0 Karma