Splunk Search

help with regex for windows event log

jbandautrgv
Engager

I'm trying to parse out data from an event log in XML format. I'm posting an example of two events that come from the same event log (same sourcetype):

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <Provider Name='Microsoft-Windows-Base-Filtering-Engine-Connections' Guid='{121D3DA8-BAF1-4DCB-929F-2D4C9A47F7AB}'/>
    <EventID>2000</EventID>
    <Version>0</Version>
    <Level>4</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime='2020-03-23T14:23:44.982049900Z'/>
    <EventRecordID>1238530</EventRecordID>
    <Correlation/>
    <Execution ProcessID='1252' ThreadID='11720'/>
    <Channel>microsoft-windows-base-filtering-engine-connections/operational</Channel>
    <Computer>servername.fqdn</Computer>
    <Security UserID='S-1-5-19'/>
  </System>
  <EventData>
    <Data Name='ConnectionId'>13228601961099160992</Data>
    <Data Name='MachineAuthenticationMethod'>4</Data>
    <Data Name='RemoteMachineAccount'>machine.fqdn</Data>
    <Data Name='UserAuthenticationMethod'>2</Data>
    <Data Name='RemoteUserAcount'>domain\user</Data>
    <Data Name='RemoteIPAddress'>ipv6address</Data>
    <Data Name='LocalIPAddress'>ipv6address</Data>
    <Data Name='TechnologyProviderKey'>{1BEBC969-61A5-4732-A177-847A0817862A}</Data>
    <Data Name='IPsecTrafficMode'>1</Data>
    <Data Name='DHGroup'>0</Data>
    <Data Name='StartTime'>2020-03-23T14:23:44.969Z</Data>
  </EventData>
</Event>

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <Provider Name='Microsoft-Windows-Base-Filtering-Engine-Connections' Guid='{121D3DA8-BAF1-4DCB-929F-2D4C9A47F7AB}'/>
    <EventID>2001</EventID>
    <Version>0</Version>
    <Level>4</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime='2020-03-24T02:53:43.017501900Z'/>
    <EventRecordID>1284675</EventRecordID>
    <Correlation/>
    <Execution ProcessID='1252' ThreadID='7796'/>
    <Channel>microsoft-windows-base-filtering-engine-connections/operational</Channel>
    <Computer>servername.fqdn</Computer>
    <Security UserID='S-1-5-19'/>
  </System>
  <EventData>
    <Data Name='ConnectionId'>13228601961099183464</Data>
    <Data Name='MachineAuthenticationMethod'>4</Data>
    <Data Name='RemoteMachineAccount'>clientname.fqdn</Data>
    <Data Name='UserAuthenticationMethod'>2</Data>
    <Data Name='RemoteUserAcount'>domain\user</Data>
    <Data Name='RemoteIPAddress'>ipv6addr</Data>
    <Data Name='LocalIPAddress'>ipv6addr</Data>
    <Data Name='TechnologyProviderKey'>{1BEBC969-61A5-4732-A177-847A0817862A}</Data>
    <Data Name='IPsecTrafficMode'>1</Data>
    <Data Name='BytesTransferredInbound'>34256</Data>
    <Data Name='BytesTransferredOutbound'>30672</Data>
    <Data Name='BytesTransferredTotal'>64928</Data>
    <Data Name='StartTime'>2020-03-24T02:33:00.492Z</Data>
    <Data Name='CloseTime'>2020-03-24T02:53:43.017Z</Data>
  </EventData>
</Event>

I have this in my props.conf

[directaccess:connections]
NO_BINARY_CHECK = 1
TIME_FORMAT = %a %b %d %H:%M:%S %T %Y
pulldown_type = 1
REPORT-xmlkv = xmlkv-alternative

and this in my transforms.conf

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2

(found on another Splunk Answers post)

I'm really not sure how it works, but that is enough to extract the first section, so that I end up with Computer, Channel, Data, EventID, EventRecordID, Level, Opcode and Task fields. Data just seems to contain the value of the first of the "Data Name" elements.
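As a rough illustration of what that transform does, here is a minimal Python sketch that runs the same pattern over a few lines of the first event. The pattern captures the tag name as the field name and the element text as the value, so every line in EventData produces a field literally called Data. The "first value wins" step is an assumption about how a repeated field name is handled when multivalue fields are not enabled; it is not taken from the thread.

```python
import re

# Python rendering of the xmlkv-alternative REGEX; the redundant escapes
# on > and / are dropped, the matching logic is unchanged.
XMLKV = re.compile(r"<([^\s>]*)[^>]*>([^<]*)</\1>")

# Abbreviated sample copied from the first event in the question.
sample = """
<EventID>2000</EventID>
<Correlation/>
<Execution ProcessID='1252' ThreadID='11720'/>
<Computer>servername.fqdn</Computer>
<Data Name='ConnectionId'>13228601961099160992</Data>
<Data Name='MachineAuthenticationMethod'>4</Data>
"""

# Group 1 is the tag name, group 2 is the text between the open and close tag.
pairs = XMLKV.findall(sample)

# Assumption: emulate "first value wins" for a repeated field name.
fields = {}
for name, value in pairs:
    fields.setdefault(name, value)

print(pairs)
print(fields)
```

Self-closing elements such as Correlation and Execution never match because they have no closing tag, and both Data lines collapse into a single Data field, which matches the behavior described above.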

The props.conf and transforms.conf seemed good enough to extract the top part contained inside "System", but not "EventData". For the bottom "EventData" part, I tried manual field extractions, first letting Splunk pick one for me and then trying to create the rest. I ended up with something like this:

^(?:[^=\n]*=){12}'\w+'>(?P[^<]+)

^(?:[^=\n]*=){15}'\w+'>(?P[^<]+)

for the fields, but using the count of characters (I think that's what it's doing?) didn't always work, because some fields were the same length and were giving me weird results.
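For what it's worth, the {12} and {15} quantifiers in those generated patterns count "=" signs rather than characters, so each extraction is tied to the position of a Data element in the event. The Python sketch below shows the effect on two trimmed, single-line stand-ins for the 2000 and 2001 events. The group name "value", the helper name and the shortened xmlns are placeholders for illustration, since the original group names did not survive in the post.

```python
import re

def nth_equals_value(event, n):
    """Return the element text found after skipping n '=' signs, or None."""
    # (?:[^=\n]*=){n} consumes text up to and including the n-th '=';
    # 'value' is a hypothetical group name, the original name was lost.
    pattern = r"^(?:[^=\n]*=){%d}'\w+'>(?P<value>[^<]+)" % n
    match = re.search(pattern, event)
    return match.group("value") if match else None

# Trimmed single-line events; only the order of the Data elements matters.
opened = ("<Event xmlns='x'><EventData>"
          "<Data Name='IPsecTrafficMode'>1</Data>"
          "<Data Name='DHGroup'>0</Data>"
          "<Data Name='StartTime'>2020-03-23T14:23:44.969Z</Data>"
          "</EventData></Event>")
closed = ("<Event xmlns='x'><EventData>"
          "<Data Name='IPsecTrafficMode'>1</Data>"
          "<Data Name='BytesTransferredInbound'>34256</Data>"
          "<Data Name='BytesTransferredOutbound'>30672</Data>"
          "</EventData></Event>")

print(nth_equals_value(opened, 3))  # text of DHGroup
print(nth_equals_value(closed, 3))  # text of BytesTransferredInbound
```

The same count lands on DHGroup in one event and BytesTransferredInbound in the other, which is one plausible source of the weird results when the two event types carry different Data elements.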

At this point I'm OK with manually typing the field names, but I don't know how to build a proper regex to extract the bottom part inside the "EventData" section. I was trying to do something like this (but this obviously didn't work):

^(?:[^=\n]*=)ConnectionID'\w+'>(?P[^<]+)

Unfortunately regex is my Achilles heel, so I appreciate any help I can get with this.


darrenfuller
Contributor

Hi jbandautrgv,

The easiest way to get the fields extracted from XML is to set KV_MODE = xml in your props.conf.

If you are determined to use props / transforms, I believe this works:

props:

[directaccess:connections]
NO_BINARY_CHECK = 1
TIME_FORMAT = %a %b %d %H:%M:%S %T %Y
pulldown_type = 1
REPORT-xmlkv = xmlkv-alternative
REPORT-xmlkv2 = xmlkv-alternative2

transforms:

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2

[xmlkv-alternative2]
REGEX = <Data\sName='([^']+)'>([^<]+)<\/Data>
FORMAT = $1::$2

This adds a second extraction that matches each Data element, using its Name attribute as the field name and the element text as the value.
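A quick way to sanity-check the second pattern outside Splunk is to run it over a few EventData lines in Python. This is only a sketch of the regex behavior, not of how Splunk applies transforms; the sample lines are copied from the second event in the question.

```python
import re

# Same pattern as xmlkv-alternative2, minus the redundant escape on "/".
DATA_KV = re.compile(r"<Data\sName='([^']+)'>([^<]+)</Data>")

# A few EventData lines from the 2001 event.
sample = """
<Data Name='ConnectionId'>13228601961099183464</Data>
<Data Name='RemoteUserAcount'>domain\\user</Data>
<Data Name='BytesTransferredTotal'>64928</Data>
<Data Name='CloseTime'>2020-03-24T02:53:43.017Z</Data>
"""

# Group 1 (the Name attribute) becomes the key, group 2 the value.
fields = dict(DATA_KV.findall(sample))

for name, value in fields.items():
    print(name, "=", value)
```

Because the value group is [^<]+, a Data element with empty text would not match and would produce no field.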

Hope this helps
