Getting Data In

How can I exclude XML file headers from being indexed?

jdaves
Path Finder

Hey Splunkers!

I have some XML log files that I'm monitoring on a Windows 2003 server. I got my line/event breaking set up and that's all good. However, I'm getting separate events with the XML header and the master tag that defines the beginning of the file.

<?xml version="1.0" encoding="utf-8" ?>
<log>

I also have events with just `</log>` in them. I want to send all of these events to nullQueue, but I'm not sure of the exact regex syntax I should use. Here is my config:

**PROPS.CONF**
[suckylogs:xml]
TRANSFORMS-nullXmlHeader = chuckXmlHeader

**TRANSFORMS.CONF**
[chuckXmlHeader]
REGEX = (?m)^(<\?xml)|(<log>)|(</log>)
DEST_KEY = queue
FORMAT = nullQueue

I'm not sure if I should have 3 separate capture groups like I have here or one big one.

Any assistance is appreciated! Thanks!

1 Solution

jdaves
Path Finder

I have the answer already after testing it myself, but I wanted to post this question because I did not see any questions for this specific issue. Here is the proper TRANSFORMS entry with one big capture group with multiple conditions (separated by a pipe):

[chuckXmlHeader]
REGEX = (?m)^(<\?xml|<log>|</log>)
DEST_KEY = queue
FORMAT = nullQueue
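
For anyone wondering why the grouping matters, here is a rough sketch of the difference between the two patterns. It uses Python's `re` module as a stand-in for Splunk's PCRE engine (an assumption, though these particular patterns should behave the same in both), and the `<entry ...>` sample line is a made-up event, not something from my actual logs. With three separate groups, the `^` anchor only applies to the first alternative, so `<log>` or `</log>` can match anywhere in an event:

```python
import re

# Pattern from the original question: "^" binds only to the first alternative.
UNGROUPED = re.compile(r"(?m)^(<\?xml)|(<log>)|(</log>)")

# Pattern from the solution: "^" applies to every alternative in the group.
GROUPED = re.compile(r"(?m)^(<\?xml|<log>|</log>)")

samples = [
    '<?xml version="1.0" encoding="utf-8" ?>',
    "<log>",
    "</log>",
    '<entry msg="saw <log> mid-line" />',  # hypothetical real event to keep
]

for event in samples:
    ungrouped_hit = bool(UNGROUPED.search(event))
    grouped_hit = bool(GROUPED.search(event))
    print(f"{event!r}: ungrouped={ungrouped_hit} grouped={grouped_hit}")
```

In this sketch both patterns match the three header/footer lines, but only the ungrouped one also matches the hypothetical `<entry>` event, which would then be sent to nullQueue along with the headers.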

If you just want to chuck the XML header because you don't have any other events, then this should work for you:

[chuckXmlHeader]
REGEX = (?m)^(<\?xml)
DEST_KEY = queue
FORMAT = nullQueue
