Hi,
I am having some big issues trying to parse certain XML logs into Splunk.
Below is a sample log found online; it is in the same format as the _raw events I see in Splunk:
<Event><System><Provider Name="Linux-Sysmon" Guid="{ff032593-a8d3-4f13-****-*******}"/><EventID>3</EventID><Version>5</Version><Level>4</Level><Task>3</Task><Opcode>0</Opcode><Keywords>0x8000000000000000</Keywords><TimeCreated SystemTime="2023-11-13T13:34:45.693615000Z"/><EventRecordID>140108</EventRecordID><Correlation/><Execution ProcessID="24493" ThreadID="24493"/><Channel>Linux-Sysmon/Operational</Channel><Computer>computername</Computer><Security UserId="0"/></System><EventData><Data Name="RuleName">-</Data><Data Name="UtcTime">2023-11-13 13:34:45.697</Data><Data Name="ProcessGuid">{ba131d2e-2a52-6550-285f-207366550000}</Data><Data Name="ProcessId">64284</Data><Data Name="Image">/opt/splunkforwarder/bin/splunkd</Data><Data Name="User">root</Data><Data Name="Protocol">tcp</Data><Data Name="Initiated">true</Data><Data Name="SourceIsIpv6">false</Data><Data Name="SourceIp">x.x.x.x</Data><Data Name="SourceHostname">-</Data><Data Name="SourcePort">60164</Data><Data Name="SourcePortName">-</Data><Data Name="DestinationIsIpv6">false</Data><Data Name="DestinationIp">x.x.x.x</Data><Data Name="DestinationHostname">-</Data><Data Name="DestinationPort">8089</Data><Data Name="DestinationPortName">-</Data></EventData></Event>
I have the following in transforms.conf:
[sysmon-eventid]
REGEX = <EventID>(\d+)</EventID>
FORMAT = EventID::$1
[sysmon-computer]
REGEX = <Computer>(.*?)</Computer>
FORMAT = Computer::$1
[sysmon-data]
REGEX = <Data Name="(.*?)">(.*?)</Data>
FORMAT = $1::$2
These are then referenced in props.conf, along with some other logic:
REPORT-sysmon = sysmon-eventID,sysmon-computer,sysmon-data
For some reason, the Computer field is extracted successfully, but the EventID and Data Name fields are not.
I have also tested the regex on regex101, but it is still not working.
I am not sure whether the problem is with the raw logs or something else.
Things I have tried:
I am not sure what else to try.
Thanks
Hi @ta1
Below is a working transforms.conf to try:
[sysmon-eventid]
REGEX = <EventID>(\d+)</EventID>
FORMAT = EventID::$1
[sysmon-computer]
REGEX = <Computer>(.*?)</Computer>
FORMAT = Computer::$1
[sysmon-data]
# This will extract each key value pair
REGEX = <Data Name="(?<_KEY_1>[^\"]+)">(?<_VAL_1>[^\<]+)</Data>
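To rule out the patterns themselves, they can be checked outside Splunk against the sample event. The sketch below is plain Python, not Splunk configuration. It makes a few assumptions: the event is a shortened copy of the sample from the question, extract_fields is a hypothetical helper, and the Data pattern uses ordinary capture groups because _KEY_1 and _VAL_1 are Splunk-specific group names. If every field comes out here, the regexes are probably fine and the props.conf side deserves a closer look.

```python
import re

# Shortened copy of the sample event from the question
# (attributes and most <Data> elements omitted for brevity).
SAMPLE = (
    '<Event><System><Provider Name="Linux-Sysmon"/>'
    '<EventID>3</EventID><Computer>computername</Computer></System>'
    '<EventData><Data Name="RuleName">-</Data>'
    '<Data Name="ProcessId">64284</Data>'
    '<Data Name="Image">/opt/splunkforwarder/bin/splunkd</Data>'
    '<Data Name="DestinationPort">8089</Data></EventData></Event>'
)

# Same patterns as the transforms.conf stanzas, minus the Splunk-only group names.
EVENT_ID = re.compile(r'<EventID>(\d+)</EventID>')
COMPUTER = re.compile(r'<Computer>(.*?)</Computer>')
DATA = re.compile(r'<Data Name="([^"]+)">([^<]+)</Data>')


def extract_fields(raw):
    """Return a dict of the fields the three patterns find in one raw event."""
    fields = {}
    event_id = EVENT_ID.search(raw)
    if event_id:
        fields["EventID"] = event_id.group(1)
    computer = COMPUTER.search(raw)
    if computer:
        fields["Computer"] = computer.group(1)
    # Every <Data Name="...">value</Data> pair becomes its own field.
    for key, value in DATA.findall(raw):
        fields[key] = value
    return fields


if __name__ == "__main__":
    for name, value in extract_fields(SAMPLE).items():
        print(f"{name} = {value}")
```

This only exercises the regexes. It says nothing about how Splunk applies props.conf and transforms.conf at search time.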
Please let me know how you get on.
Hi @livehybrid
Thanks for the help.
I tried it, but it still doesn't seem to work.
I am using the Sysmon for Linux add-on, and no fields are being extracted besides Computer and Keywords.
https://docs.splunk.com/Documentation/AddOns/released/NixSysmon/About
I just can't figure out what I am doing wrong, because the logs themselves are visible; only those fields are missing.
Thanks
First things first - how do you know they are "not working"? How did you confirm it? These might be very basic questions, but you don't wanna know how many times I've seen people just searching in fast mode or filtering out fields...
Hi @livehybrid
Thanks, sorry that was a typo, fixed it. In my regex I had </EventID>.
It just doesn't seem to extract.
Thanks.
Hi @ta1
The issue with the EventID is probably because of the incorrect closing tag - you've got <EventID> instead of </EventID>
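The effect of that closing tag is easy to reproduce outside Splunk. This is a minimal Python sketch, assuming the original pattern ended in <EventID> as described above. The BROKEN pattern is a reconstruction of that typo, not a copy of the poster's config, and event_id is a hypothetical helper.

```python
import re

SNIPPET = "<EventID>3</EventID>"

# Assumed reconstruction of the typo: the closing tag is missing its slash.
BROKEN = re.compile(r"<EventID>(\d+)<EventID>")
# Corrected pattern, as used in the transforms.conf stanza.
FIXED = re.compile(r"<EventID>(\d+)</EventID>")


def event_id(pattern, raw):
    """Return the captured EventID, or None when the pattern does not match."""
    match = pattern.search(raw)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(event_id(BROKEN, SNIPPET))  # None: the pattern never matches
    print(event_id(FIXED, SNIPPET))   # 3
```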
Let me run some checks on a fixed version of the transforms.
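One more check that may be worth running, offered as an assumption and not something confirmed in this thread: the REPORT line in the question references sysmon-eventID, while the stanza is named [sysmon-eventid]. As far as I know Splunk treats stanza names as case-sensitive, so a mismatch like that could leave a transform unused. It would not explain the missing Data fields, though. The sketch below is plain Python with a hypothetical missing_transforms helper, and it compares the names exactly.

```python
import re

# Stanzas and REPORT line copied from the question.
TRANSFORMS_CONF = r"""
[sysmon-eventid]
REGEX = <EventID>(\d+)</EventID>
FORMAT = EventID::$1
[sysmon-computer]
REGEX = <Computer>(.*?)</Computer>
FORMAT = Computer::$1
[sysmon-data]
REGEX = <Data Name="(.*?)">(.*?)</Data>
FORMAT = $1::$2
"""
REPORT_LINE = "REPORT-sysmon = sysmon-eventID,sysmon-computer,sysmon-data"


def stanza_names(conf_text):
    """Collect the names inside [ ... ] stanza headers."""
    return set(re.findall(r"^\[([^\]]+)\][ \t]*$", conf_text, flags=re.MULTILINE))


def missing_transforms(report_line, conf_text):
    """List referenced transforms that have no exactly matching stanza."""
    _, _, value = report_line.partition("=")
    wanted = [name.strip() for name in value.split(",") if name.strip()]
    known = stanza_names(conf_text)
    return [name for name in wanted if name not in known]


if __name__ == "__main__":
    print(missing_transforms(REPORT_LINE, TRANSFORMS_CONF))
```

If the list comes back non-empty, aligning the name in props.conf with the stanza name is a cheap thing to try before digging further.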