XML Parsing in transforms.conf

ta1
Explorer

Hi,

 

I am having trouble parsing certain XML logs in Splunk.

A sample log, in the same format as the _raw events I see in Splunk, is below:

 

<Event><System><Provider Name="Linux-Sysmon" Guid="{ff032593-a8d3-4f13-****-*******}"/><EventID>3</EventID><Version>5</Version><Level>4</Level><Task>3</Task><Opcode>0</Opcode><Keywords>0x8000000000000000</Keywords><TimeCreated SystemTime="2023-11-13T13:34:45.693615000Z"/><EventRecordID>140108</EventRecordID><Correlation/><Execution ProcessID="24493" ThreadID="24493"/><Channel>Linux-Sysmon/Operational</Channel><Computer>computername</Computer><Security UserId="0"/></System><EventData><Data Name="RuleName">-</Data><Data Name="UtcTime">2023-11-13 13:34:45.697</Data><Data Name="ProcessGuid">{ba131d2e-2a52-6550-285f-207366550000}</Data><Data Name="ProcessId">64284</Data><Data Name="Image">/opt/splunkforwarder/bin/splunkd</Data><Data Name="User">root</Data><Data Name="Protocol">tcp</Data><Data Name="Initiated">true</Data><Data Name="SourceIsIpv6">false</Data><Data Name="SourceIp">x.x.x.x</Data><Data Name="SourceHostname">-</Data><Data Name="SourcePort">60164</Data><Data Name="SourcePortName">-</Data><Data Name="DestinationIsIpv6">false</Data><Data Name="DestinationIp">x.x.x.x</Data><Data Name="DestinationHostname">-</Data><Data Name="DestinationPort">8089</Data><Data Name="DestinationPortName">-</Data></EventData></Event>

 

I have the following in transforms.conf:

[sysmon-eventid]
REGEX = <EventID>(\d+)</EventID>
FORMAT = EventID::$1

[sysmon-computer]
REGEX = <Computer>(.*?)</Computer>
FORMAT = Computer::$1

[sysmon-data]
REGEX = <Data Name="(.*?)">(.*?)</Data>
FORMAT = $1::$2

 

These are then called from props.conf, alongside some other logic, with:

REPORT-sysmon = sysmon-eventID,sysmon-computer,sysmon-data

 

For some reason, the Computer field is extracted successfully, but the EventID and Data Name fields are not.

I have also tested the regexes on regex101, but the extractions are still not working.

I am not sure whether the problem is with the raw logs or with something else.
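
One way to rule out the raw logs is to parse an event outside Splunk. The following is a minimal sketch, assuming a shortened copy of the sample event above (same structure, fewer Data elements); the `summarize` helper is hypothetical and only illustrates the check. If the XML were malformed, `ET.fromstring` would raise a `ParseError`.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample event from the post (an assumption for brevity).
raw = (
    '<Event><System><Provider Name="Linux-Sysmon"/>'
    '<EventID>3</EventID><Computer>computername</Computer></System>'
    '<EventData><Data Name="RuleName">-</Data>'
    '<Data Name="Image">/opt/splunkforwarder/bin/splunkd</Data>'
    '<Data Name="DestinationPort">8089</Data></EventData></Event>'
)


def summarize(event_xml):
    """Parse one event and return (EventID, Computer, {Data Name: value})."""
    root = ET.fromstring(event_xml)  # raises ParseError on malformed XML
    event_id = root.findtext("System/EventID")
    computer = root.findtext("System/Computer")
    # Each <Data Name="..."> element becomes one key/value pair.
    data = {d.get("Name"): d.text for d in root.iter("Data")}
    return event_id, computer, data


event_id, computer, data = summarize(raw)
print(event_id, computer, data)
```

If this parses cleanly, the event structure itself is probably not the cause, and the configuration is the next place to look.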

 

Things I have tried:

  • Confirmed that the events are using the correct sourcetype
  • Set KV_MODE=xml in props.conf, which doesn't parse the events properly
  • Set DATATYPE =xml in props.conf, which doesn't work
  • Changed the regex to other patterns, none of which worked
  • Changed the closing </EventID> to <\/EventID>, which made no difference

I am not sure what else to try.

 

Thanks

 


livehybrid
SplunkTrust

Hi @ta1 

Please try the working transforms.conf below:

 

[sysmon-eventid]
REGEX   = <EventID>(\d+)</EventID>
FORMAT  = EventID::$1

[sysmon-computer]
REGEX   = <Computer>(.*?)</Computer>
FORMAT  = Computer::$1

[sysmon-data]
# This will extract each key value pair
REGEX   = <Data Name="(?<_KEY_1>[^\"]+)">(?<_VAL_1>[^\<]+)</Data>
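
The three patterns can also be checked outside Splunk. This is a rough sketch in Python, not how Splunk itself applies them: Python's `re` needs `(?P<name>...)` rather than `(?<name>...)` for named groups, the sample is a shortened copy of the event from the original post, and the `extract` helper is hypothetical. `_KEY_1` and `_VAL_1` are treated here as ordinary group names.

```python
import re

# Shortened copy of the sample event (an assumption for brevity).
sample = (
    '<Event><System><EventID>3</EventID>'
    '<Computer>computername</Computer></System>'
    '<EventData><Data Name="RuleName">-</Data>'
    '<Data Name="User">root</Data>'
    '<Data Name="DestinationPort">8089</Data></EventData></Event>'
)

EVENT_ID = re.compile(r"<EventID>(\d+)</EventID>")
COMPUTER = re.compile(r"<Computer>(.*?)</Computer>")
# Same pattern as [sysmon-data], rewritten with Python's (?P<name>...) syntax.
DATA = re.compile(r'<Data Name="(?P<_KEY_1>[^"]+)">(?P<_VAL_1>[^<]+)</Data>')


def extract(raw):
    """Apply the three patterns and collect the results in one dict."""
    fields = {}
    match = EVENT_ID.search(raw)
    if match:
        fields["EventID"] = match.group(1)
    match = COMPUTER.search(raw)
    if match:
        fields["Computer"] = match.group(1)
    # The Data pattern repeats, so collect every key/value pair.
    for match in DATA.finditer(raw):
        fields[match.group("_KEY_1")] = match.group("_VAL_1")
    return fields


fields = extract(sample)
print(fields)
```

If all three patterns match here, the regexes are unlikely to be the problem, which points towards how the transforms are referenced or how the results are being searched.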

Please let me know how you get on.

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing

 

ta1
Explorer

Hi @livehybrid 

Thanks for the help.

I tried it, but it still doesn't seem to work.

I am using the Sysmon for Linux add-on, and no fields are being extracted besides Computer and Keywords.

https://docs.splunk.com/Documentation/AddOns/released/NixSysmon/About

 

I just can't figure out what I am doing wrong, because apart from those fields I can see the logs fine.

Thanks


PickleRick
SplunkTrust

First things first: how do you know they are "not working"? How did you confirm it? These might be very basic questions, but you don't want to know how many times I've seen people searching in fast mode or filtering out fields...


ta1
Explorer

Hi @livehybrid 

Thanks. Sorry, that was a typo in my post, which I have now fixed. My actual regex had </EventID>.

It just doesn't seem to extract.

Thanks.


livehybrid
SplunkTrust

Hi @ta1 

The issue with EventID is probably the incorrect closing tag: you've got <EventID> instead of </EventID>.

Let me run some checks on a fixed version of the transforms.
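
One further check worth sketching: in the original post, the props.conf line references `sysmon-eventID`, while the transforms.conf stanza is named `[sysmon-eventid]`. Assuming Splunk resolves transform names case-sensitively, that reference would not find its stanza. The snippet below is a hypothetical helper, not a Splunk tool, that compares the names in a REPORT line against the stanza names in a transforms.conf text. It would not explain the missing Data fields, only EventID.

```python
import re

# Stanzas as posted in the original question.
transforms_conf = r"""
[sysmon-eventid]
REGEX = <EventID>(\d+)</EventID>
FORMAT = EventID::$1

[sysmon-computer]
REGEX = <Computer>(.*?)</Computer>
FORMAT = Computer::$1

[sysmon-data]
REGEX = <Data Name="(.*?)">(.*?)</Data>
FORMAT = $1::$2
"""

props_line = "REPORT-sysmon = sysmon-eventID,sysmon-computer,sysmon-data"

# A stanza header is a line consisting only of [name].
STANZA = re.compile(r"^\[([^\]]+)\]\s*$", re.MULTILINE)


def stanza_names(conf_text):
    """Return the set of stanza names defined in a .conf text."""
    return set(STANZA.findall(conf_text))


def report_targets(line):
    """Split the right-hand side of a REPORT-<class> line into names."""
    _, _, value = line.partition("=")
    return [name.strip() for name in value.split(",") if name.strip()]


def unmatched(line, conf_text):
    """List referenced transforms with no exact (case-sensitive) stanza."""
    names = stanza_names(conf_text)
    return [target for target in report_targets(line) if target not in names]


missing = unmatched(props_line, transforms_conf)
print(missing)  # ['sysmon-eventID']
```

If the real configuration has the same mismatch, making the name in props.conf and the stanza name identical would be the first thing to try.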

