Parsing XML into fields is not working properly

aaronzabell
Path Finder

Splunk isn't fully parsing the XML into fields in search results; it only extracts the top-level sections. For example, in the sample event below, the System and UserData sections become fields, but the XML elements inside them (e.g. Username and IpAddress) are not parsed into fields of their own.
Based on what I've read here in the forums, I've already edited my props.conf for sourcetype=XmlWinEventLog but haven't seen any change.

[source::XmlWinEventLog]
KV_MODE=xml
TRUNCATE = 0

I don't know what I'm missing and could use some help. (Hell, what I put in there, Splunk was probably already doing)

Here's a sample event (I added line breaks to make it easier to read; the raw data in search results is a single line):

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event" xml:lang="en-US">
<System>
  <Provider Name="Microsoft-Windows-TerminalServices-Gateway" Guid="{4D5AE6A1-C7C8-4E6D-B840-4D8080B42E1B}" /> 
  <EventID>200</EventID> 
  <Version>0</Version> 
  <Level>4</Level> 
  <Task>2</Task> 
  <Opcode>30</Opcode> 
  <Keywords>0x4020000001000000</Keywords> 
  <TimeCreated SystemTime="2020-02-21T18:54:19.913701800Z" /> 
  <EventRecordID>1219</EventRecordID> 
  <Correlation ActivityID="{BEA11342-474B-47DE-907D-F2FBEBD40000}" /> 
  <Execution ProcessID="5480" ThreadID="8416" /> 
  <Channel>Microsoft-Windows-TerminalServices-Gateway/Operational</Channel> 
  <Computer>gatewayserver.domain.com</Computer> 
  <Security UserID="S-1-5-20" /> 
  </System>
<UserData>
<EventInfo xmlns="aag">
  <Username>domain\username</Username> 
  <IpAddress>173.x.x.x</IpAddress> 
  <AuthType>NTLM</AuthType> 
  <Resource /> 
  <ConnectionProtocol>HTTP</ConnectionProtocol> 
  <ErrorCode>0</ErrorCode> 
  </EventInfo>
  </UserData>
<RenderingInfo Culture="en-US">
  <Message>The user "domain\username", on client computer "173.x.x.x", met connection authorization policy requirements and was therefore authorized to access the RD Gateway server. The authentication method used was: "NTLM" and connection protocol used: "HTTP".</Message> 
  <Level>Information</Level> 
  <Task /> 
  <Opcode /> 
  <Channel /> 
  <Provider /> 
<Keywords>
  <Keyword>Audit Success</Keyword> 
  </Keywords>
  </RenderingInfo>
  </Event>

aaronzabell
Path Finder

I ended up doing a custom field extraction for the fields I wanted. I had to write my own regex since the auto regex wasn't cooperating.

For username:

^(?:.*)<Username>(?P<username>[^<]+)

For source IP:

^(?:.*)<IpAddress>(?P<src_ip>[^<]+)

For the workstation that the user connects to:

^(?:.*)<Resource>(?P<workstation>[^<]+)
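
For anyone who wants to sanity-check these patterns outside Splunk, here is a minimal Python sketch. It assumes the event arrives as a single line, as described in the question, and runs against a shortened stand-in for the sample event. PATTERNS, SAMPLE and extract are illustrative names, not anything Splunk provides. Python's re module accepts the same (?P<name>...) named-group syntax, though Splunk's regex engine may differ in other details.

```python
import re

# The three extraction patterns from the post above, keyed by field name.
PATTERNS = {
    "username": r"^(?:.*)<Username>(?P<username>[^<]+)",
    "src_ip": r"^(?:.*)<IpAddress>(?P<src_ip>[^<]+)",
    "workstation": r"^(?:.*)<Resource>(?P<workstation>[^<]+)",
}

# Shortened single-line stand-in for the sample event (an assumption, not real data).
SAMPLE = (
    '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">'
    '<UserData><EventInfo xmlns="aag">'
    '<Username>domain\\username</Username>'
    '<IpAddress>173.x.x.x</IpAddress>'
    '<AuthType>NTLM</AuthType><Resource />'
    '</EventInfo></UserData>'
    '</Event>'
)


def extract(raw):
    """Apply each pattern to the raw event and keep the fields that match."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, raw)
        if match:
            fields[name] = match.group(name)
    return fields


fields = extract(SAMPLE)
print(fields)
```

On this sample the workstation pattern finds nothing, because the element is self-closed (<Resource />). It can only capture a value in events where the element actually contains text.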

to4kawa
Ultra Champion

spath works well for this.

I also tried extracting the fields with props.conf and transforms.conf.
props.conf:

[XML_sample]
NO_BINARY_CHECK = 1
REPORT-xml_first = xml_first
REPORT-xml_second = xml_second
SHOULD_LINEMERGE = 0
TIME_FORMAT = %FT%T.%9QZ
TIME_PREFIX = SystemTime=\"
TZ = UTC
pulldown_type = 1

transforms.conf:

[xml_first]
CLEAN_KEYS = 0
REGEX = (?:\<)?([\w \:]+?)=\"(\S+)\"
FORMAT = $1::$2

[xml_second]
CLEAN_KEYS = 0
REGEX = \<(\w+)\>([^\<]+)<\/\1\>
FORMAT = $1::$2

With this I was able to extract the fields.
It's not easy, though.
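
To see what the two REGEX values capture, here is a rough Python sketch that runs them over a shortened, single-line stand-in for the sample event. The sample string and the names ATTRIBUTE_RE, ELEMENT_RE and key_value_pairs are assumptions made for illustration. Python's re module is not Splunk's regex engine, so treat the output as an approximation of what the transforms would produce.

```python
import re

# REGEX values copied from the [xml_first] and [xml_second] stanzas above.
ATTRIBUTE_RE = re.compile(r'(?:\<)?([\w \:]+?)=\"(\S+)\"')
ELEMENT_RE = re.compile(r'\<(\w+)\>([^\<]+)<\/\1\>')

# Shortened single-line stand-in for the sample event (an assumption).
SAMPLE = (
    '<System><EventID>200</EventID>'
    '<Execution ProcessID="5480" ThreadID="8416" /></System>'
    '<UserData><EventInfo xmlns="aag">'
    '<Username>domain\\username</Username>'
    '<IpAddress>173.x.x.x</IpAddress>'
    '</EventInfo></UserData>'
)


def key_value_pairs(pattern, raw):
    """Return (key, value) tuples, mirroring FORMAT = $1::$2."""
    return pattern.findall(raw)


attribute_pairs = key_value_pairs(ATTRIBUTE_RE, SAMPLE)
element_pairs = key_value_pairs(ELEMENT_RE, SAMPLE)
print(attribute_pairs)
print(element_pairs)
```

In this run the element pattern returns EventID, Username and IpAddress. The attribute pattern's keys carry the element name or a leading space (for example "Execution ProcessID" and " ThreadID"), which is worth knowing when looking for the resulting field names.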


aaronzabell
Path Finder

Should my props.conf and transforms.conf look exactly as you have them? And is REPORT-xms_second a typo? I tried it as written and as REPORT-xml_second but it didn't make a difference either way.


to4kawa
Ultra Champion

I see, the second one was a typo.
I've updated my answer.
This works in my Splunk instance.


aaronzabell
Path Finder

Tried it but no change. Still not parsing everything in the XML data.


to4kawa
Ultra Champion

It works in my Splunk instance. I don't know which fields are missing for you, or whether you restarted Splunk after the change.
Good luck.


niketn
Legend

@aaronzabell I fed the sample data you provided into a run-anywhere search, and spath extracted the fields correctly, including Username and IpAddress. So I'm not sure what is going wrong with your config. Have you checked for the username under the field name Event.UserData.EventInfo.Username and the IP address under Event.UserData.EventInfo.IpAddress?

You can run the following run-anywhere example to check the fields yourself.

| makeresults
| eval _raw=" <Event xmlns=\"http://schemas.microsoft.com/win/2004/08/events/event\" xml:lang=\"en-US\">
 <System>
   <Provider Name=\"Microsoft-Windows-TerminalServices-Gateway\" Guid=\"{4D5AE6A1-C7C8-4E6D-B840-4D8080B42E1B}\" /> 
   <EventID>200</EventID> 
   <Version>0</Version> 
   <Level>4</Level> 
   <Task>2</Task> 
   <Opcode>30</Opcode> 
   <Keywords>0x4020000001000000</Keywords> 
   <TimeCreated SystemTime=\"2020-02-21T18:54:19.913701800Z\" /> 
   <EventRecordID>1219</EventRecordID> 
   <Correlation ActivityID=\"{BEA11342-474B-47DE-907D-F2FBEBD40000}\" /> 
   <Execution ProcessID=\"5480\" ThreadID=\"8416\" /> 
   <Channel>Microsoft-Windows-TerminalServices-Gateway/Operational</Channel> 
   <Computer>gatewayserver.domain.com</Computer> 
   <Security UserID=\"S-1-5-20\" /> 
   </System>
 <UserData>
 <EventInfo xmlns=\"aag\">
   <Username>domain\username</Username> 
   <IpAddress>173.x.x.x</IpAddress> 
   <AuthType>NTLM</AuthType> 
   <Resource /> 
   <ConnectionProtocol>HTTP</ConnectionProtocol> 
   <ErrorCode>0</ErrorCode> 
   </EventInfo>
   </UserData>
 <RenderingInfo Culture=\"en-US\">
   <Message>The user \"domain\username\", on client computer \"173.x.x.x\", met connection authorization policy requirements and was therefore authorized to access the RD Gateway server. The authentication method used was: \"NTLM\" and connection protocol used: \"HTTP\".</Message> 
   <Level>Information</Level> 
   <Task /> 
   <Opcode /> 
   <Channel /> 
   <Provider /> 
 <Keywords>
   <Keyword>Audit Success</Keyword> 
   </Keywords>
   </RenderingInfo>
   </Event>"
| spath
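
To illustrate where dotted field names such as Event.UserData.EventInfo.Username come from, here is a small Python sketch that walks a shortened version of the sample event and builds one dotted path per element that contains text. It only approximates the naming described above and is not how spath is implemented. The helper names local_name and flatten and the shortened sample are assumptions.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the sample event (an assumption).
SAMPLE = (
    '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">'
    '<System><EventID>200</EventID></System>'
    '<UserData><EventInfo xmlns="aag">'
    '<Username>domain\\username</Username>'
    '<IpAddress>173.x.x.x</IpAddress>'
    '<Resource />'
    '</EventInfo></UserData>'
    '</Event>'
)


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree adds to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def flatten(elem, prefix=""):
    """Map dotted element paths to text for every element that has text."""
    name = local_name(elem.tag)
    path = f"{prefix}.{name}" if prefix else name
    fields = {}
    text = (elem.text or "").strip()
    if text:
        fields[path] = text
    for child in elem:
        fields.update(flatten(child, path))
    return fields


fields = flatten(ET.fromstring(SAMPLE))
for path, value in fields.items():
    print(path, "=", value)
```

Empty elements such as <Resource /> produce no entry here, so a missing field does not necessarily mean the parsing failed.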

aaronzabell
Path Finder

Looking more at the live data: the following section of the XML gets parsed into a single field called UserData_Xml. I still have no idea how to make it parse deeper.

<EventInfo xmlns="aag">
   <Username>domain\username</Username> 
   <IpAddress>173.x.x.x</IpAddress> 
   <AuthType>NTLM</AuthType> 
   <Resource /> 
   <ConnectionProtocol>HTTP</ConnectionProtocol> 
   <ErrorCode>0</ErrorCode> 
   </EventInfo>
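
Since the whole fragment ends up as the value of one field, one option is a second parsing pass over just that value. In Splunk the closest analogue is probably pointing spath at the field with its input argument (spath input=UserData_Xml), though nobody in this thread confirmed that on the live data. The Python sketch below only illustrates the idea; parse_fragment and the USERDATA_XML variable are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Stand-in for the value of the UserData_Xml field, collapsed to one line.
USERDATA_XML = (
    '<EventInfo xmlns="aag">'
    '<Username>domain\\username</Username>'
    '<IpAddress>173.x.x.x</IpAddress>'
    '<AuthType>NTLM</AuthType>'
    '<Resource />'
    '<ConnectionProtocol>HTTP</ConnectionProtocol>'
    '<ErrorCode>0</ErrorCode>'
    '</EventInfo>'
)


def parse_fragment(fragment):
    """Parse one XML fragment and return its direct children as a dict."""
    root = ET.fromstring(fragment)
    result = {}
    for child in root:
        # Strip the '{aag}' namespace prefix from each child tag.
        name = child.tag.rsplit("}", 1)[-1]
        result[name] = (child.text or "").strip()
    return result


inner_fields = parse_fragment(USERDATA_XML)
print(inner_fields)
```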

aaronzabell
Path Finder

Looking at the live data again: <Message> gets parsed.


aaronzabell
Path Finder

I tried it as shown and it worked. However, if I collapse the XML into a single line of text (as it is when it gets ingested), it breaks. After experimenting a bit, it looks like the <Message> section is what breaks it, because the makeresults search parses fine when I remove it.
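
One way to narrow this down is to check whether the collapsed event, Message included, is still well-formed XML. The Python sketch below does that for a shortened version of the sample. The collapse and is_well_formed helpers and the trimmed Message text are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened multi-line stand-in for the sample event (raw string keeps the backslash).
PRETTY = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<UserData>
<EventInfo xmlns="aag">
  <Username>domain\username</Username>
</EventInfo>
</UserData>
<RenderingInfo Culture="en-US">
  <Message>The user "domain\username", on client computer "173.x.x.x", met connection authorization policy requirements.</Message>
</RenderingInfo>
</Event>"""


def collapse(pretty):
    """Join the lines into one, dropping indentation and line breaks."""
    return "".join(line.strip() for line in pretty.splitlines())


def is_well_formed(raw):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(raw)
    except ET.ParseError:
        return False
    return True


single_line = collapse(PRETTY)
print(is_well_formed(single_line))
```

If the real collapsed event also passes a check like this, the XML itself is probably not the problem, and the quoting of the Message text inside the eval string would be the next thing to look at. That is a guess, not something confirmed in this thread.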
