How can I exclude XML file headers from being indexed?

jdaves
Path Finder

Hey Splunkers!

I have some XML log files that I'm monitoring on a Windows 2003 server. I got my line/event breaking set up and that's all good. However, I'm getting separate events with the XML header and the master tag that defines the beginning of the file.

<?xml version="1.0" encoding="utf-8" ?>
<log>

I also have events with just </log> in them. I want to send all of these events to nullQueue, but I'm not sure of the exact regex syntax to use. Here is my config:

**PROPS.CONF**
[suckylogs:xml]
TRANSFORMS-nullXmlHeader = chuckXmlHeader

**TRANSFORMS.CONF**
[chuckXmlHeader]
REGEX = (?m)^(<\?xml)|(<log>)|(</log>)
DEST_KEY = queue
FORMAT = nullQueue

I'm not sure whether I should use three separate capture groups, as I have here, or one big group.

Any assistance is appreciated! Thanks!

1 Solution

jdaves
Path Finder

I found the answer after testing it myself, but I wanted to post this question because I did not see any existing questions on this specific issue. Here is the working transforms.conf entry, using one capture group with the alternatives separated by pipes so that the ^ anchor applies to every alternative:

[chuckXmlHeader]
REGEX = (?m)^(<\?xml|<log>|</log>)
DEST_KEY = queue
FORMAT = nullQueue

If you only need to discard the XML header because you don't get the separate <log> and </log> events, then this should work for you:

[chuckXmlHeader]
REGEX = (?m)^(<\?xml)
DEST_KEY = queue
FORMAT = nullQueue
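
For anyone wondering why the grouping matters, here is a rough sketch in Python that compares the pattern from the question with the grouped one above. Splunk uses PCRE rather than Python's re module, so this assumes the two engines behave the same for a pattern this simple, and that Splunk applies REGEX as an unanchored search against the raw event. The sample events and the routed_to_null helper are made up for illustration.

```python
import re

# Pattern from the question: "^" binds only to the first alternative,
# so <log> and </log> can match anywhere in the event.
ungrouped = re.compile(r"(?m)^(<\?xml)|(<log>)|(</log>)")

# Pattern from the answer: "^" applies to every alternative in the group.
grouped = re.compile(r"(?m)^(<\?xml|<log>|</log>)")

# Hypothetical sample events; only the first three should be discarded.
events = [
    '<?xml version="1.0" encoding="utf-8" ?>',
    "<log>",
    "</log>",
    '<entry msg="parser saw <log> inside a message" />',
]


def routed_to_null(pattern, event):
    """Return True if the pattern matches anywhere in the event text."""
    return pattern.search(event) is not None


for event in events:
    print(routed_to_null(ungrouped, event), routed_to_null(grouped, event), event)
```

With these samples, both patterns match the header, <log>, and </log> events, but only the ungrouped pattern also matches the fourth event, because its <log> alternative is not tied to the start of a line. If your real events never contain those tags mid-line, either pattern drops the same events.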
