Splunk Search

help with regex for windows event log

jbandautrgv
Engager

I'm trying to parse out data from an event log in XML format. I'm posting an example of two events that come from the same event log (same sourcetype):

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <Provider Name='Microsoft-Windows-Base-Filtering-Engine-Connections' Guid='{121D3DA8-BAF1-4DCB-929F-2D4C9A47F7AB}'/>
    <EventID>2000</EventID>
    <Version>0</Version>
    <Level>4</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime='2020-03-23T14:23:44.982049900Z'/>
    <EventRecordID>1238530</EventRecordID>
    <Correlation/>
    <Execution ProcessID='1252' ThreadID='11720'/>
    <Channel>microsoft-windows-base-filtering-engine-connections/operational</Channel>
    <Computer>servername.fqdn</Computer>
    <Security UserID='S-1-5-19'/>
  </System>
  <EventData>
    <Data Name='ConnectionId'>13228601961099160992</Data>
    <Data Name='MachineAuthenticationMethod'>4</Data>
    <Data Name='RemoteMachineAccount'>machine.fqdn</Data>
    <Data Name='UserAuthenticationMethod'>2</Data>
    <Data Name='RemoteUserAcount'>domain\user</Data>
    <Data Name='RemoteIPAddress'>ipv6address</Data>
    <Data Name='LocalIPAddress'>ipv6address</Data>
    <Data Name='TechnologyProviderKey'>{1BEBC969-61A5-4732-A177-847A0817862A}</Data>
    <Data Name='IPsecTrafficMode'>1</Data>
    <Data Name='DHGroup'>0</Data>
    <Data Name='StartTime'>2020-03-23T14:23:44.969Z</Data>
  </EventData>
</Event>

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <Provider Name='Microsoft-Windows-Base-Filtering-Engine-Connections' Guid='{121D3DA8-BAF1-4DCB-929F-2D4C9A47F7AB}'/>
    <EventID>2001</EventID>
    <Version>0</Version>
    <Level>4</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime='2020-03-24T02:53:43.017501900Z'/>
    <EventRecordID>1284675</EventRecordID>
    <Correlation/>
    <Execution ProcessID='1252' ThreadID='7796'/>
    <Channel>microsoft-windows-base-filtering-engine-connections/operational</Channel>
    <Computer>servername.fqdn</Computer>
    <Security UserID='S-1-5-19'/>
  </System>
  <EventData>
    <Data Name='ConnectionId'>13228601961099183464</Data>
    <Data Name='MachineAuthenticationMethod'>4</Data>
    <Data Name='RemoteMachineAccount'>clientname.fqdn</Data>
    <Data Name='UserAuthenticationMethod'>2</Data>
    <Data Name='RemoteUserAcount'>domain\user</Data>
    <Data Name='RemoteIPAddress'>ipv6addr</Data>
    <Data Name='LocalIPAddress'>ipv6addr</Data>
    <Data Name='TechnologyProviderKey'>{1BEBC969-61A5-4732-A177-847A0817862A}</Data>
    <Data Name='IPsecTrafficMode'>1</Data>
    <Data Name='BytesTransferredInbound'>34256</Data>
    <Data Name='BytesTransferredOutbound'>30672</Data>
    <Data Name='BytesTransferredTotal'>64928</Data>
    <Data Name='StartTime'>2020-03-24T02:33:00.492Z</Data>
    <Data Name='CloseTime'>2020-03-24T02:53:43.017Z</Data>
  </EventData>
</Event>

I have this in my props.conf

[directaccess:connections]
NO_BINARY_CHECK = 1
TIME_FORMAT = %a %b %d %H:%M:%S %T %Y
pulldown_type = 1
REPORT-xmlkv = xmlkv-alternative

and this in my transforms.conf

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2

(found on another splunk answers post)

I'm really not sure how it works, but that is enough to extract the first section, so I end up with Computer, Channel, Data, EventID, EventRecordID, Level, Opcode and Task fields. The Data field just seems to contain the value of the first of the "Data Name" elements.
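For anyone else puzzled by that transform, here is a rough sketch in Python of what the regex does. It is not Splunk itself: the sample text is a shortened stand-in for the event, and the "keep the first value" step is an assumption that mimics a transform without MV_ADD = true. The regex captures the tag name up to the first space, so every `<Data Name='...'>` element is reported under the same field name, Data.

```python
import re

# The REGEX from [xmlkv-alternative], unchanged. Group 1 is the tag name up to
# the first space or '>', group 2 is the text between the open and close tags.
XMLKV = re.compile(r"<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>")

# Shortened stand-in for the event (assumption: a few lines are enough to show the effect).
sample = """
<EventID>2000</EventID>
<Correlation/>
<Data Name='ConnectionId'>13228601961099160992</Data>
<Data Name='DHGroup'>0</Data>
"""

pairs = XMLKV.findall(sample)

# Assumption: keep only the first value seen per field name, which mimics a
# transform that does not set MV_ADD = true.
fields = {}
for name, value in pairs:
    fields.setdefault(name, value)

print(pairs)
print(fields)
```

Both Data elements match, but they share one key, so only the ConnectionId value survives in `fields`. The self-closing `<Correlation/>` never matches because it has no closing tag.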

The props.conf and transforms.conf seemed good enough to extract the top part contained inside "System", but not "EventData". For the bottom "EventData" part, I tried manual field extractions, first letting Splunk pick one for me and then trying to create the rest. I ended up with something like this:

^(?:[^=\n]*=){12}'\w+'>(?P[^<]+)

^(?:[^=\n]*=){15}'\w+'>(?P[^<]+)

For the fields, but using the count of characters (? I think that's what it's doing) didn't always work because some fields were the same length and were giving me weird results.
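A note on what those generated patterns do: `(?:[^=\n]*=){12}` skips exactly twelve `=` signs from the start of the event rather than counting characters, so the extraction is tied to the position of the attribute, not its name. The sketch below is only an illustration in Python. The helper name `nth_equals_value` is made up, the events are shortened single-line stand-ins, and the capture group is left unnamed. It shows the same position returning different fields for the two event types.

```python
import re

def nth_equals_value(event, n):
    """Skip exactly n '=' signs from the start, expect 'Name'>, capture up to '<'."""
    pattern = r"^(?:[^=\n]*=){%d}'\w+'>([^<]+)" % n
    m = re.match(pattern, event)
    return m.group(1) if m else None

# Shortened, single-line stand-ins for the 2000 and 2001 events (assumption).
event_2000 = ("<EventData><Data Name='IPsecTrafficMode'>1</Data>"
              "<Data Name='DHGroup'>0</Data></EventData>")
event_2001 = ("<EventData><Data Name='IPsecTrafficMode'>1</Data>"
              "<Data Name='BytesTransferredInbound'>34256</Data></EventData>")

print(nth_equals_value(event_2000, 2))  # value of DHGroup
print(nth_equals_value(event_2001, 2))  # value of BytesTransferredInbound
```

Because the 2000 and 2001 events carry different Data elements after IPsecTrafficMode, a fixed count lands on DHGroup in one and BytesTransferredInbound in the other, which would explain the odd results.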

At this point I'm OK with manually typing the field names, but I don't know how to build a proper regex to extract the bottom part inside the "EventData" section. I was trying to do something like this (but this obviously didn't work):

^(?:[^=\n]*=)ConnectionID'\w+'>(?P[^<]+)

Unfortunately regex is my Achilles heel, so I appreciate any help I can get with this.


darrenfuller
Contributor

Hi jbandautrgv,

The easiest way to get fields extracted from XML events is to set KV_MODE = xml in your props.conf.

If you are determined to use props / transforms, I believe this works:

props:

[directaccess:connections]
NO_BINARY_CHECK = 1
TIME_FORMAT = %a %b %d %H:%M:%S %T %Y
pulldown_type = 1
REPORT-xmlkv = xmlkv-alternative
REPORT-xmlkv2 = xmlkv-alternative2

transforms:

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2

[xmlkv-alternative2]
REGEX = <Data\sName='([^']+)'>([^<]+)<\/Data>
FORMAT = $1::$2

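This adds a second extraction that matches the Data Name bits: for each <Data Name='X'>value</Data> element, the Name attribute becomes the field name and the element text becomes its value.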
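If you want to sanity-check the second regex outside Splunk, a quick sketch like this in Python should do. It only exercises the regex against a shortened copy of the EventData block from the question. It does not reproduce how Splunk applies the transform.

```python
import re

# The REGEX from [xmlkv-alternative2], unchanged.
DATA_KV = re.compile(r"<Data\sName='([^']+)'>([^<]+)<\/Data>")

# Shortened copy of the EventData section from the 2001 event.
event_data = """
<EventData>
  <Data Name='ConnectionId'>13228601961099183464</Data>
  <Data Name='RemoteMachineAccount'>clientname.fqdn</Data>
  <Data Name='BytesTransferredTotal'>64928</Data>
</EventData>
"""

# Each match yields (name, value), the same pairing as FORMAT = $1::$2.
fields = dict(DATA_KV.findall(event_data))
print(fields)
```

Two things to watch: the attribute name is matched literally, so it is ConnectionId in the event (not ConnectionID), and `[^<]+` needs at least one character, so an empty Data element would be skipped.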

Hope this helps
