Getting Data In

How can I exclude XML file headers from being indexed?

jdaves
Path Finder

Hey Splunkers!

I have some XML log files that I'm monitoring on a Windows 2003 server. I got my line/event breaking set up and that's all good. However, I'm getting separate events with the XML header and the master tag that defines the beginning of the file.

<?xml version="1.0" encoding="utf-8" ?>
<log>

I also have events with just </log> in them. I want to send these events to nullQueue, but I'm not sure of the exact RegEx syntax I should use. Here is my config:

**PROPS.CONF**
[suckylogs:xml]
TRANSFORMS-nullXmlHeader = chuckXmlHeader

**TRANSFORMS.CONF**
[chuckXmlHeader]
REGEX = (?m)^(<\?xml)|(<log>)|(</log>)
DEST_KEY = queue
FORMAT = nullQueue

I'm not sure if I should have 3 separate capture groups like I have here or one big one.

Any assistance is appreciated! Thanks!

1 Solution

jdaves
Path Finder

I have the answer already after testing it myself, but I wanted to post this question because I did not see any questions for this specific issue. Here is the proper TRANSFORMS entry with one big capture group with multiple conditions (separated by a pipe). Putting all the alternatives inside one group means the ^ anchor applies to every one of them, not just the first:

[chuckXmlHeader]
REGEX = (?m)^(<\?xml|<log>|</log>)
DEST_KEY = queue
FORMAT = nullQueue

If you just want to chuck the XML header because you don't have any other unwanted events, then this should work for you:

[chuckXmlHeader]
REGEX = (?m)^(<\?xml)
DEST_KEY = queue
FORMAT = nullQueue
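
For anyone who wants to sanity-check the grouping before touching a production indexer, here is a rough sketch that runs the multi-pattern regex from the question and the one from the answer against a few sample events. It uses Python's re module as a stand-in for the regex engine Splunk uses, on the assumption that these particular patterns behave the same way in both. The sample events, including the last one with <log> in the middle of the line, are made up for illustration.

```python
import re

# Hypothetical events as they might look after line breaking.
events = [
    '<?xml version="1.0" encoding="utf-8" ?>',
    '<log>',
    '</log>',
    '<entry><msg>saw <log> in a message</msg></entry>',
]

# Regex from the question: ^ binds only to the first alternative,
# so <log> and </log> can match anywhere in the event.
question_re = re.compile(r'(?m)^(<\?xml)|(<log>)|(</log>)')

# Regex from the answer: one group, so ^ applies to every alternative.
answer_re = re.compile(r'(?m)^(<\?xml|<log>|</log>)')


def dropped(pattern, sample_events):
    """Return the events the pattern matches (unanchored search)."""
    return [e for e in sample_events if pattern.search(e)]


dropped_by_question = dropped(question_re, events)
dropped_by_answer = dropped(answer_re, events)

print("question regex drops:", dropped_by_question)
print("answer regex drops:  ", dropped_by_answer)
```

With these samples, the question's regex also catches the real event that merely contains <log> mid-line, while the answer's regex only catches the three header/footer events.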
