XML Parsing in transforms.conf

ta1
Explorer

Hi,

 

I am having some big issues trying to parse certain XML logs into Splunk.

A sample log I found online, in the same format as what I see in the Splunk _raw events, is below:

 

<Event><System><Provider Name="Linux-Sysmon" Guid="{ff032593-a8d3-4f13-****-*******}"/><EventID>3</EventID><Version>5</Version><Level>4</Level><Task>3</Task><Opcode>0</Opcode><Keywords>0x8000000000000000</Keywords><TimeCreated SystemTime="2023-11-13T13:34:45.693615000Z"/><EventRecordID>140108</EventRecordID><Correlation/><Execution ProcessID="24493" ThreadID="24493"/><Channel>Linux-Sysmon/Operational</Channel><Computer>computername</Computer><Security UserId="0"/></System><EventData><Data Name="RuleName">-</Data><Data Name="UtcTime">2023-11-13 13:34:45.697</Data><Data Name="ProcessGuid">{ba131d2e-2a52-6550-285f-207366550000}</Data><Data Name="ProcessId">64284</Data><Data Name="Image">/opt/splunkforwarder/bin/splunkd</Data><Data Name="User">root</Data><Data Name="Protocol">tcp</Data><Data Name="Initiated">true</Data><Data Name="SourceIsIpv6">false</Data><Data Name="SourceIp">x.x.x.x</Data><Data Name="SourceHostname">-</Data><Data Name="SourcePort">60164</Data><Data Name="SourcePortName">-</Data><Data Name="DestinationIsIpv6">false</Data><Data Name="DestinationIp">x.x.x.x</Data><Data Name="DestinationHostname">-</Data><Data Name="DestinationPort">8089</Data><Data Name="DestinationPortName">-</Data></EventData></Event>

 

I have the following in transforms.conf:

[sysmon-eventid]
REGEX = <EventID>(\d+)</EventID>
FORMAT = EventID::$1

[sysmon-computer]
REGEX = <Computer>(.*?)</Computer>
FORMAT = Computer::$1

[sysmon-data]
REGEX = <Data Name="(.*?)">(.*?)</Data>
FORMAT = $1::$2

 

These are then referenced in props.conf, alongside some other logic, with:

REPORT-sysmon = sysmon-eventID,sysmon-computer,sysmon-data

 

For some reason, the computer field is extracted successfully but not eventID or data name fields. 

I have also tested the regex on regex101, but it is still not working.
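As a quick sanity check outside Splunk, the three patterns from the transforms.conf above can be run against a trimmed copy of the sample event. This is only a sketch: it uses Python's `re` module rather than Splunk's own regex engine (the two should agree for patterns this simple), and the variable names are made up for illustration.

```python
import re

# Trimmed copy of the sample event from the post (same tags, fewer Data fields).
sample = (
    '<Event><System><Provider Name="Linux-Sysmon"/>'
    '<EventID>3</EventID><Computer>computername</Computer></System>'
    '<EventData><Data Name="RuleName">-</Data>'
    '<Data Name="Image">/opt/splunkforwarder/bin/splunkd</Data>'
    '<Data Name="DestinationPort">8089</Data></EventData></Event>'
)

# The same three patterns as in the transforms.conf stanzas.
event_id = re.search(r"<EventID>(\d+)</EventID>", sample)
computer = re.search(r"<Computer>(.*?)</Computer>", sample)
data = re.findall(r'<Data Name="(.*?)">(.*?)</Data>', sample)

print(event_id.group(1))
print(computer.group(1))
print(data)
```

If all three match here, the regexes themselves are probably not the problem, and it may be worth looking at how the transforms are referenced and applied instead.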

I am not sure if it's the raw logs having issues or something else?

 

Things I have tried:

  • Confirmed that the correct sourcetype is being applied
  • Set KV_MODE=xml in props.conf, which doesn't parse it properly
  • Set DATATYPE=xml in props.conf, which doesn't work
  • Tried changing the regex to something else, which doesn't work
  • Tried changing the end of </EventID> to <\/EventID>, which did nothing
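On the last point, escaping the forward slash is not expected to change anything, because `/` has no special meaning inside the pattern itself. A minimal sketch in Python's `re` module (an assumption for illustration, not Splunk's engine) shows both spellings capturing the same value:

```python
import re

snippet = "<EventID>3</EventID>"

# "/" is an ordinary character in a regex, so "\/" and "/" match the same text.
plain = re.search(r"<EventID>(\d+)</EventID>", snippet)
escaped = re.search(r"<EventID>(\d+)<\/EventID>", snippet)

same_result = plain.group(1) == escaped.group(1)
print(same_result)
```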

Not sure what else to try?

 

Thanks

 

0 Karma

livehybrid
SplunkTrust

Hi @ta1 

Please see below some working transforms.conf to try:

 

[sysmon-eventid]
REGEX   = <EventID>(\d+)</EventID>
FORMAT  = EventID::$1

[sysmon-computer]
REGEX   = <Computer>(.*?)</Computer>
FORMAT  = Computer::$1

[sysmon-data]
# This will extract each key value pair
REGEX   = <Data Name="(?<_KEY_1>[^\"]+)">(?<_VAL_1>[^\<]+)</Data>
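To see what the `_KEY_1` / `_VAL_1` pattern captures, here is a rough emulation in Python. It is a sketch under a few assumptions: Python spells named groups `(?P<name>...)` instead of `(?<name>...)`, the `fields` dictionary only stands in for the field names and values Splunk would create, and the input is a shortened piece of the sample event.

```python
import re

# Same pattern as the [sysmon-data] stanza, with Python's (?P<name>...) syntax.
pattern = re.compile(r'<Data Name="(?P<_KEY_1>[^"]+)">(?P<_VAL_1>[^<]+)</Data>')

event_data = (
    '<EventData><Data Name="ProcessId">64284</Data>'
    '<Data Name="User">root</Data>'
    '<Data Name="Protocol">tcp</Data></EventData>'
)

# Each match contributes one key/value pair, named after the Data Name attribute.
fields = {m.group("_KEY_1"): m.group("_VAL_1") for m in pattern.finditer(event_data)}
print(fields)
```

Note that `[^<]+` needs at least one character, so a `Data` element with an empty value would be skipped by this pattern.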

Please let me know how you get on:

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing

 

ta1
Explorer

Hi @livehybrid 

Thanks for the help.

I tried it, but it still doesn't seem to work.

I am using the Sysmon for Linux add-on, and no fields are being extracted apart from Computer and Keywords.

https://docs.splunk.com/Documentation/AddOns/released/NixSysmon/About

 

I just can't figure out what I am doing wrong, because I can see the logs themselves; only those fields are missing.

Thanks

0 Karma

PickleRick
SplunkTrust

First things first - how do you know they are "not working"? How did you confirm it? These might be very basic questions, but you don't wanna know how many times I've seen people just searching in fast mode or filtering out fields...

0 Karma

ta1
Explorer

Hi @livehybrid 

Thanks, sorry that was a typo, fixed it. In my regex I had </EventID>.

It just doesn't seem to extract.

Thanks.

0 Karma

livehybrid
SplunkTrust

Hi @ta1 

The issue with the EventID is probably caused by the incorrect closing tag - you've got <EventID> instead of </EventID>.
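A small sketch of why that closing tag matters, again using Python's `re` module as a stand-in for Splunk's regex engine (an assumption for illustration): with the typo, the pattern asks for a second opening tag that never appears in the event, so nothing is captured.

```python
import re

event = "<EventID>3</EventID>"

# Typo version: closing tag written as <EventID>, so the pattern cannot match.
typo = re.search(r"<EventID>(\d+)<EventID>", event)

# Corrected version: proper </EventID> closing tag.
fixed = re.search(r"<EventID>(\d+)</EventID>", event)

print(typo is None)
print(fixed.group(1))
```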

Let me run some checks on a fixed version of the transforms.
