Parsing XML into fields is not working properly

aaronzabell
Path Finder

Splunk isn't fully parsing the XML into fields in search results; it only extracts the top-level sections. For example, in the sample event below, the System and UserData sections become fields, but the XML elements inside them (e.g. Username and IpAddress) are not parsed into fields of their own.
Based on what I've read here in the forums, I've already edited my props.conf for sourcetype=XmlWinEventLog but haven't seen any change.

[source::XmlWinEventLog]
KV_MODE=xml
TRUNCATE = 0

I don't know what I'm missing and could use some help. (Hell, what I put in there, Splunk was probably already doing)

Here's a sample event (I added line breaks to make it easier to read; the raw data in search results is a single line):

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event" xml:lang="en-US">
<System>
  <Provider Name="Microsoft-Windows-TerminalServices-Gateway" Guid="{4D5AE6A1-C7C8-4E6D-B840-4D8080B42E1B}" /> 
  <EventID>200</EventID> 
  <Version>0</Version> 
  <Level>4</Level> 
  <Task>2</Task> 
  <Opcode>30</Opcode> 
  <Keywords>0x4020000001000000</Keywords> 
  <TimeCreated SystemTime="2020-02-21T18:54:19.913701800Z" /> 
  <EventRecordID>1219</EventRecordID> 
  <Correlation ActivityID="{BEA11342-474B-47DE-907D-F2FBEBD40000}" /> 
  <Execution ProcessID="5480" ThreadID="8416" /> 
  <Channel>Microsoft-Windows-TerminalServices-Gateway/Operational</Channel> 
  <Computer>gatewayserver.domain.com</Computer> 
  <Security UserID="S-1-5-20" /> 
  </System>
<UserData>
<EventInfo xmlns="aag">
  <Username>domain\username</Username> 
  <IpAddress>173.x.x.x</IpAddress> 
  <AuthType>NTLM</AuthType> 
  <Resource /> 
  <ConnectionProtocol>HTTP</ConnectionProtocol> 
  <ErrorCode>0</ErrorCode> 
  </EventInfo>
  </UserData>
<RenderingInfo Culture="en-US">
  <Message>The user "domain\username", on client computer "173.x.x.x", met connection authorization policy requirements and was therefore authorized to access the RD Gateway server. The authentication method used was: "NTLM" and connection protocol used: "HTTP".</Message> 
  <Level>Information</Level> 
  <Task /> 
  <Opcode /> 
  <Channel /> 
  <Provider /> 
<Keywords>
  <Keyword>Audit Success</Keyword> 
  </Keywords>
  </RenderingInfo>
  </Event>

aaronzabell
Path Finder

I ended up doing a custom field extraction for the fields I wanted. I had to write my own regex since the auto regex wasn't cooperating.

For username:

^(?:.*)<Username>(?P<username>[^<]+)

For source IP:

^(?:.*)<IpAddress>(?P<src_ip>[^<]+)

For the workstation that the user connects to:

^(?:.*)<Resource>(?P<workstation>[^<]+)
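
A quick way to sanity-check the three patterns above outside Splunk is to run them against a single-line copy of the event. The sketch below is only an illustration: EVENT is a hypothetical trimmed copy of the sample, and extract is a made-up helper name. Python's re module is used because it accepts the same (?P<name>...) named-group syntax; Splunk's own regex engine may differ in other details.

```python
import re

# Hypothetical, trimmed single-line copy of the sample event from the question.
EVENT = (
    '<Event><UserData><EventInfo xmlns="aag">'
    r'<Username>domain\username</Username>'
    '<IpAddress>173.x.x.x</IpAddress><AuthType>NTLM</AuthType><Resource />'
    '</EventInfo></UserData>'
    r'<RenderingInfo><Message>The user "domain\username", '
    'on client computer "173.x.x.x"</Message></RenderingInfo>'
    '</Event>'
)

# The three field extractions exactly as posted above.
PATTERNS = {
    "username": r'^(?:.*)<Username>(?P<username>[^<]+)',
    "src_ip": r'^(?:.*)<IpAddress>(?P<src_ip>[^<]+)',
    "workstation": r'^(?:.*)<Resource>(?P<workstation>[^<]+)',
}

def extract(event):
    """Apply each pattern once; a field is None when its pattern does not match."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, event)
        fields[name] = match.group(name) if match else None
    return fields

fields = extract(EVENT)
print(fields)
```

In this check the workstation pattern finds nothing, because the sample event carries an empty <Resource /> element rather than <Resource>...</Resource>; events that include a resource name would presumably match.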


to4kawa
Ultra Champion

spath works well for this.

I also tried extracting the fields with props.conf and transforms.conf.
props.conf:

[XML_sample]
NO_BINARY_CHECK = 1
REPORT-xml_first = xml_first
REPORT-xml_second = xml_second
SHOULD_LINEMERGE = 0
TIME_FORMAT = %FT%T.%9QZ
TIME_PREFIX = SystemTime=\"
TZ = UTC
pulldown_type = 1

transforms.conf:

[xml_first]
CLEAN_KEYS = 0
REGEX = (?:\<)?([\w \:]+?)=\"(\S+)\"
FORMAT = $1::$2

[xml_second]
CLEAN_KEYS = 0
REGEX = \<(\w+)\>([^\<]+)<\/\1\>
FORMAT = $1::$2

With these settings I was able to extract the fields.
It isn't easy, though.
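
To see roughly what the two REGEX settings capture, they can be tried outside Splunk. This is a sketch under assumptions: FRAGMENT is a hypothetical three-element excerpt of the sample event, the patterns are copied verbatim from the transforms.conf above, and Python's findall only approximates how Splunk applies a REPORT transform repeatedly to an event.

```python
import re

# Patterns copied verbatim from the [xml_first] and [xml_second] stanzas.
XML_FIRST = re.compile(r'(?:\<)?([\w \:]+?)=\"(\S+)\"')    # attribute="value" pairs
XML_SECOND = re.compile(r'\<(\w+)\>([^\<]+)<\/\1\>')       # <Tag>text</Tag> pairs

# Hypothetical excerpt of the sample event, on one line.
FRAGMENT = (
    '<Execution ProcessID="5480" ThreadID="8416" /> '
    '<EventID>200</EventID> '
    r'<Username>domain\username</Username>'
)

attribute_pairs = XML_FIRST.findall(FRAGMENT)
element_pairs = XML_SECOND.findall(FRAGMENT)

for key, value in attribute_pairs + element_pairs:
    # repr() makes any leading spaces or extra words in the key visible.
    print(repr(key), "->", repr(value))
```

The second pattern is the one that targets element text such as Username; printing the keys with repr makes it easy to inspect what the first pattern uses as a field name.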


aaronzabell
Path Finder

Should my props.conf and transforms.conf look exactly as you have them? And is REPORT-xms_second a typo? I tried it as written and as REPORT-xml_second but it didn't make a difference either way.


to4kawa
Ultra Champion

I see, the second one is a typo.
I've updated my answer.
In my Splunk, this works.


aaronzabell
Path Finder

Tried it but no change. Still not parsing everything in the XML data.


to4kawa
Ultra Champion

It works in my Splunk. I don't know which fields are missing for you. Did you restart Splunk after the change?
Good luck.


niketn
Legend

@aaronzabell I fed the sample data you provided into a run-anywhere search, and spath extracted the fields, including Username and IpAddress, correctly. So I'm not sure what is going wrong with your config. Have you checked for the username under the field name Event.UserData.EventInfo.Username and for the IP address under Event.UserData.EventInfo.IpAddress?

You can run the following run anywhere example to check the fields yourself.

| makeresults
| eval _raw=" <Event xmlns=\"http://schemas.microsoft.com/win/2004/08/events/event\" xml:lang=\"en-US\">
 <System>
   <Provider Name=\"Microsoft-Windows-TerminalServices-Gateway\" Guid=\"{4D5AE6A1-C7C8-4E6D-B840-4D8080B42E1B}\" /> 
   <EventID>200</EventID> 
   <Version>0</Version> 
   <Level>4</Level> 
   <Task>2</Task> 
   <Opcode>30</Opcode> 
   <Keywords>0x4020000001000000</Keywords> 
   <TimeCreated SystemTime=\"2020-02-21T18:54:19.913701800Z\" /> 
   <EventRecordID>1219</EventRecordID> 
   <Correlation ActivityID=\"{BEA11342-474B-47DE-907D-F2FBEBD40000}\" /> 
   <Execution ProcessID=\"5480\" ThreadID=\"8416\" /> 
   <Channel>Microsoft-Windows-TerminalServices-Gateway/Operational</Channel> 
   <Computer>gatewayserver.domain.com</Computer> 
   <Security UserID=\"S-1-5-20\" /> 
   </System>
 <UserData>
 <EventInfo xmlns=\"aag\">
   <Username>domain\username</Username> 
   <IpAddress>173.x.x.x</IpAddress> 
   <AuthType>NTLM</AuthType> 
   <Resource /> 
   <ConnectionProtocol>HTTP</ConnectionProtocol> 
   <ErrorCode>0</ErrorCode> 
   </EventInfo>
   </UserData>
 <RenderingInfo Culture=\"en-US\">
   <Message>The user \"domain\username\", on client computer \"173.x.x.x\", met connection authorization policy requirements and was therefore authorized to access the RD Gateway server. The authentication method used was: \"NTLM\" and connection protocol used: \"HTTP\".</Message> 
   <Level>Information</Level> 
   <Task /> 
   <Opcode /> 
   <Channel /> 
   <Provider /> 
 <Keywords>
   <Keyword>Audit Success</Keyword> 
   </Keywords>
   </RenderingInfo>
   </Event>"
| spath
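
For readers without a Splunk instance at hand, the dotted field names mentioned above can be approximated in Python. This is a rough sketch, not a reimplementation of spath: it ignores attributes, strips XML namespaces, and lets repeated tags overwrite each other. SAMPLE is a hypothetical trimmed copy of the event, and local_name and flatten are made-up helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed single-line copy of the sample event.
SAMPLE = (
    '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event" xml:lang="en-US">'
    '<System><EventID>200</EventID><Computer>gatewayserver.domain.com</Computer></System>'
    '<UserData><EventInfo xmlns="aag">'
    r'<Username>domain\username</Username>'
    '<IpAddress>173.x.x.x</IpAddress><AuthType>NTLM</AuthType><Resource />'
    '</EventInfo></UserData></Event>'
)

def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]

def flatten(elem, prefix=""):
    """Map dotted element paths to their text, skipping elements with no text."""
    path = prefix + local_name(elem.tag)
    fields = {}
    text = (elem.text or "").strip()
    if text:
        fields[path] = text
    for child in elem:
        fields.update(flatten(child, path + "."))
    return fields

fields = flatten(ET.fromstring(SAMPLE))
for name, value in fields.items():
    print(name, "=", value)
```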

aaronzabell
Path Finder

Looking more at the live data: the following section of the XML gets parsed into a single field called UserData_Xml. I still have no idea how to make it parse deeper.

<EventInfo xmlns="aag">
   <Username>domain\username</Username> 
   <IpAddress>173.x.x.x</IpAddress> 
   <AuthType>NTLM</AuthType> 
   <Resource /> 
   <ConnectionProtocol>HTTP</ConnectionProtocol> 
   <ErrorCode>0</ErrorCode> 
   </EventInfo>
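
Since the fragment held in UserData_Xml is itself well-formed XML, one possible approach is a second parsing pass over just that field. The Python sketch below illustrates the idea only; user_data_xml is a hypothetical copy of the field value and second_pass is a made-up helper name. Inside Splunk, the analogous step would presumably be spath with its input option pointed at UserData_Xml, which has not been verified against this data.

```python
import xml.etree.ElementTree as ET

# Hypothetical value of the UserData_Xml field, copied from the fragment above.
user_data_xml = (
    '<EventInfo xmlns="aag">'
    r'<Username>domain\username</Username>'
    '<IpAddress>173.x.x.x</IpAddress>'
    '<AuthType>NTLM</AuthType>'
    '<Resource />'
    '<ConnectionProtocol>HTTP</ConnectionProtocol>'
    '<ErrorCode>0</ErrorCode>'
    '</EventInfo>'
)

def second_pass(fragment):
    """Parse the fragment on its own and return {child tag: child text}."""
    root = ET.fromstring(fragment)
    return {
        child.tag.rsplit("}", 1)[-1]: (child.text or "").strip()  # strip '{aag}' prefix
        for child in root
    }

inner_fields = second_pass(user_data_xml)
print(inner_fields)
```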

aaronzabell
Path Finder

Looking at the live data again. <message> gets parsed.


aaronzabell
Path Finder

Tried it as shown and it worked. However, if I collapse the XML into a single line of text (like it is as it gets ingested), it breaks. Played with it a bit and it looks like the <Message> section is what breaks it because the makeresults parses fine when I remove it.
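
One thing that can be ruled out independently of Splunk is whether collapsing the event to one line, Message included, makes the XML itself malformed. The sketch below assumes a hypothetical shortened copy of the event (EVENT) and collapses it the same way; it says nothing about how Splunk's own parser treats the quotes and backslashes in Message, which may still be where the problem lies.

```python
import xml.etree.ElementTree as ET

# Hypothetical shortened copy of the event, with line breaks as posted.
EVENT = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<UserData>
<EventInfo xmlns="aag">
  <Username>domain\username</Username>
  <IpAddress>173.x.x.x</IpAddress>
  </EventInfo>
  </UserData>
<RenderingInfo Culture="en-US">
  <Message>The user "domain\username", on client computer "173.x.x.x", met connection authorization policy requirements.</Message>
  </RenderingInfo>
  </Event>"""

# Collapse to a single line, as the data looks when it is ingested.
one_line = " ".join(line.strip() for line in EVENT.splitlines())

# A generic XML parser still accepts the collapsed event, Message included.
root = ET.fromstring(one_line)
username = next(e.text for e in root.iter() if e.tag.endswith("}Username"))
message = next(e.text for e in root.iter() if e.tag.endswith("}Message"))

print(username)
print(message)
```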
