Send multiple lines to nullQueue from XML file

jgedeon120
Contributor

I'm trying to index an XML file that begins with many lines I do not want or need indexed. I've worked out a regex in RegExr (an external online regex testing site) that selects all the unwanted lines, but when I bring the file into Splunk the lines are still indexed. Below are my props.conf and transforms.conf.

props.conf

[sourcetype]
TRANSFORMS-sourcetype_junk = sourcetype_junk
BREAK_ONLY_BEFORE = \<ReportHost
DATETIME_CONFIG = CURRENT
MAX_TIMESTAMP_LOOKAHEAD = 0
SHOULD_LINEMERGE = true
TRUNCATE = 0

transforms.conf

[sourcetype_junk]
LOOKAHEAD = 100000
DEST_KEY = queue
REGEX = ^((.|\n|\r)*)\<\/Policy\>
FORMAT = nullQueue

Any ideas how to accomplish this?

For example, everything from the beginning of the file through the closing Policy tag is not needed. There are quite a few more lines than what is shown below:

<?xml version="1.0" ?>
<NessusClientData_v2>
<Policy>
<FamilyItem>
<FamilyName>CentOS Local Security Checks</FamilyName>
<Status>enabled</Status>
</FamilyItem>
<FamilyItem>
<FamilyName>AIX Local Security Checks</FamilyName>
<Status>enabled</Status>
</FamilyItem>
<FamilyItem>
<FamilyName>CISCO</FamilyName>
<Status>enabled</Status>
</FamilyItem>
<FamilyItem><FamilyName>Junos Local Security Checks</FamilyName>
<Status>enabled</Status>
</FamilyItem>
</FamilySelection>
<IndividualPluginSelection>
<PluginItem><PluginId>34220</PluginId>
<PluginName>Netstat Portscanner (WMI)</PluginName>
<Family>Port scanners</Family>
<Status>enabled</Status>
</PluginItem><PluginItem><PluginId>14274</PluginId>
<PluginName>Nessus SNMP Scanner</PluginName>
<Family>Port scanners</Family>
<Status>enabled</Status>
</PluginItem><PluginItem><PluginId>14272</PluginId>
<PluginName>netstat portscanner (SSH)</PluginName>
<Family>Port scanners</Family>
<Status>enabled</Status>
</PluginItem><PluginItem><PluginId>10180</PluginId>
<PluginName>Ping the remote host</PluginName>
<Family>Port scanners</Family>
<Status>enabled</Status>
</PluginItem><PluginItem><PluginId>11219</PluginId>
<PluginName>Nessus SYN scanner</PluginName>
<Family>Port scanners</Family>
<Status>enabled</Status>
</PluginItem></IndividualPluginSelection>
</Policy>
<Report name="ScanNumber2" xmlns:cm="http://www.nessus.org/cm">
<ReportHost name="192.168.1.100"><HostProperties>
<tag name="HOST_END">Sat Feb 25 09:31:53 2012</tag>
<tag name="system-type">general-purpose</tag>
<tag name="operating-system">Microsoft Windows Server 2003 Service Pack 2</tag>
<tag name="mac-address">00:0c:29:2e:7c:68</tag>
<tag name="host-ip">192.168.1.100</tag>
<tag name="host-fqdn">system32.localdomain.com</tag>
<tag name="netbios-name">SYSTEM32</tag>
<tag name="HOST_START">Sat Feb 25 09:20:12 2012</tag>
</HostProperties>

Thanks in advance,
Joe

WORKING Configurations
props.conf

[sourcetype]
MAX_EVENTS = 210000
TRANSFORMS-sourcetype_junk = sourcetype_junk
BREAK_ONLY_BEFORE = (?m)\<ReportHost\sname
DATETIME_CONFIG = CURRENT
MAX_TIMESTAMP_LOOKAHEAD = 0
SHOULD_LINEMERGE = true
TRUNCATE = 0
BREAK_ONLY_BEFORE_DATE = false

transforms.conf

[sourcetype_junk]
LOOKAHEAD = 10000
DEST_KEY = queue
REGEX = (?m)(^\<\?\bxml.*)
FORMAT = nullQueue
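
For anyone wondering why MAX_EVENTS may matter with this REGEX, here is a rough Python sketch. It is only a simplified model of line merging, not Splunk's actual implementation: merge_lines and indexed are hypothetical helpers, the 1700-line preamble is synthetic (only its size comes from this thread), 256 is what I understand the default MAX_EVENTS to be, and Python's re module stands in for Splunk's regex engine.

```python
import re

# Patterns copied from the working props.conf / transforms.conf above.
BREAK = re.compile(r"(?m)\<ReportHost\sname")
JUNK = re.compile(r"(?m)(^\<\?\bxml.*)")

def merge_lines(lines, max_events):
    """Simplified model: start a new event before a BREAK line,
    or once the current event already holds max_events lines."""
    events, current = [], []
    for line in lines:
        if current and (BREAK.search(line) or len(current) >= max_events):
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events

def indexed(lines, max_events):
    """Events that do NOT match the nullQueue regex, i.e. would be indexed."""
    return [e for e in merge_lines(lines, max_events) if not JUNK.search(e)]

# Synthetic 1700-line preamble followed by one made-up ReportHost event.
preamble = (['<?xml version="1.0" ?>', "<Policy>"]
            + ["<Status>enabled</Status>"] * 1697
            + ["</Policy>"])
host = ['<ReportHost name="192.168.1.100"><HostProperties>', "</HostProperties>"]
lines = preamble + host

print(len(indexed(lines, 256)))     # preamble split into chunks; most survive
print(len(indexed(lines, 210000)))  # whole preamble is one event and is dropped
```

In this model, a low line cap splits the preamble into several events and only the first one contains the <?xml line, so the remaining chunks would still be indexed. With the cap raised, the whole preamble stays in one event that the transform can discard.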

Because of the number of lines in each event, flashtimeline.xml needed an override so that the EventsViewer module displays a larger number of lines.

Another thank you to MarioM for his assistance with the nullQueue problem.

1 Solution

MarioM
Motivator

Did you try with (?m) in front of your regex?

(?m)^((.|\n|\r)*)\<\/Policy\>
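
A quick way to see what (?m) changes is the Python sketch below. Python's re module stands in for Splunk's regex engine (an assumption, though these constructs should behave the same way), and the two miniature events are made up. As far as I can tell, (?m) only changes where ^ and $ may match; the (.|\n|\r)* group already crosses line breaks on its own.

```python
import re

# The regex from the question, without and with the (?m) flag.
original = re.compile(r"^((.|\n|\r)*)\<\/Policy\>")
with_m = re.compile(r"(?m)^((.|\n|\r)*)\<\/Policy\>")

# Hypothetical miniature events; the real preamble is far longer.
has_close = "<Policy>\n<Status>enabled</Status>\n</Policy>\n<Report>"
no_close = "<Policy>\n<Status>enabled</Status>"

results = {
    "original, event with </Policy>": bool(original.search(has_close)),
    "(?m), event with </Policy>": bool(with_m.search(has_close)),
    "original, event without </Policy>": bool(original.search(no_close)),
    "(?m), event without </Policy>": bool(with_m.search(no_close)),
}

# Where (?m) does matter: anchoring on a line in the middle of an event.
anchored_plain = bool(re.search(r"^\<\/Policy\>", has_close))
anchored_multi = bool(re.search(r"(?m)^\<\/Policy\>", has_close))

for name, hit in results.items():
    print(f"{name}: {hit}")
print("^</Policy> without (?m):", anchored_plain)
print("^</Policy> with (?m):", anchored_multi)
```

Either way, the pattern can only match an event that actually contains the closing Policy tag, so how the file is broken into events matters at least as much as the flag.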

Also, any nullQueue transform requires a Splunk restart before it is applied.

If it is still not working, it would be useful to paste here the part of your XML you want to filter.

UPDATE

and with this regex:

(?m)((.*(\r*))+?\<\/Policy\>$)  - **NOT WORKING**

UPDATE 2:

With the configurations below, I got it filtered out.

props.conf:

[test_xml]
TRANSFORMS-sourcetype_junk=sourcetype_junk
BREAK_ONLY_BEFORE_DATE=false
BREAK_ONLY_BEFORE=(?m)\<ReportHost\sname
SHOULD_LINEMERGE=true
TRUNCATE=0

transforms.conf:

[sourcetype_junk]
LOOKAHEAD = 10000
DEST_KEY = queue
REGEX = (?m)(^\<\?\bxml.*)
FORMAT = nullQueue
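
To illustrate how I understand this transform to behave, here is a small Python sketch. It assumes the REGEX is tested against each event after line merging, and that a match sends the whole event to nullQueue while everything else continues to the index queue. The destination helper and the sample events are hypothetical, and Python's re module stands in for Splunk's regex engine.

```python
import re

# REGEX from the transforms.conf above.
REGEX = re.compile(r"(?m)(^\<\?\bxml.*)")

def destination(event: str) -> str:
    """Hypothetical helper: which queue a single event would end up in."""
    return "nullQueue" if REGEX.search(event) else "indexQueue"

# Two events as BREAK_ONLY_BEFORE would split them (shortened, made-up data).
events = [
    '<?xml version="1.0" ?>\n<NessusClientData_v2>\n<Policy>\n</Policy>\n'
    '<Report name="ScanNumber2">',
    '<ReportHost name="192.168.1.100"><HostProperties>\n'
    '<tag name="host-ip">192.168.1.100</tag>\n</HostProperties>',
]
queues = [destination(e) for e in events]
print(queues)

# A chunk of the preamble that lacks the <?xml line is not caught by this regex.
leftover = destination("<Status>enabled</Status>\n</Policy>")
print(leftover)
```

The regex only recognizes the event that contains the XML declaration, so this approach relies on the whole unwanted preamble ending up in that one event.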

jgedeon120
Contributor

MarioM,

Thank you for your assistance with this. It is now indexing the way I wanted: the policy information is not there and the events are split into one event per ReportHost name. I can now continue working to make this productive. Thanks again. I will update my question with my final props and transforms configurations.

MarioM
Motivator

I don't think it will help... for some strange reason it works with my configuration, as per Update 2 in my answer.

jgedeon120
Contributor

MarioM,

If it would help, use the Contact Me button in my profile and we can set up a screen share to figure this out. There seem to be a few older posts from people looking for the same thing with no solution.

MarioM
Motivator

I think it has something to do with your line breaking... I am testing it out.

jgedeon120
Contributor

MarioM,

It looks like I spoke too soon: the file had not been indexed yet when I looked. The section I need excluded is still being indexed.

jgedeon120
Contributor

MarioM,

Thank you very much. You figured it out.

Thanks again!

jgedeon120
Contributor

Yes, I have tried the multiline flag (?m). I will try to sanitize a small sample. Currently, with just one entry, what needs to be filtered out is over 1700 lines long.
