Splunk isn't completely parsing the XML into fields in search results, only sections. For example, in the sample event below, the System and UserData sections are fields, but the XML elements inside them are not parsed into fields (e.g. Username and IpAddress).
Based on some of what I've read here in the forums, I've already edited my props.conf for sourcetype=XmlWinEventLog, but I haven't seen any change.
[source::XmlWinEventLog]
KV_MODE=xml
TRUNCATE = 0
I don't know what I'm missing and could use some help. (Hell, Splunk was probably already doing what I put in there.)
Here's a sample event (I added line breaks to make it easier to read; in the raw search results it's a single line):
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event" xml:lang="en-US">
<System>
<Provider Name="Microsoft-Windows-TerminalServices-Gateway" Guid="{4D5AE6A1-C7C8-4E6D-B840-4D8080B42E1B}" />
<EventID>200</EventID>
<Version>0</Version>
<Level>4</Level>
<Task>2</Task>
<Opcode>30</Opcode>
<Keywords>0x4020000001000000</Keywords>
<TimeCreated SystemTime="2020-02-21T18:54:19.913701800Z" />
<EventRecordID>1219</EventRecordID>
<Correlation ActivityID="{BEA11342-474B-47DE-907D-F2FBEBD40000}" />
<Execution ProcessID="5480" ThreadID="8416" />
<Channel>Microsoft-Windows-TerminalServices-Gateway/Operational</Channel>
<Computer>gatewayserver.domain.com</Computer>
<Security UserID="S-1-5-20" />
</System>
<UserData>
<EventInfo xmlns="aag">
<Username>domain\username</Username>
<IpAddress>173.x.x.x</IpAddress>
<AuthType>NTLM</AuthType>
<Resource />
<ConnectionProtocol>HTTP</ConnectionProtocol>
<ErrorCode>0</ErrorCode>
</EventInfo>
</UserData>
<RenderingInfo Culture="en-US">
<Message>The user "domain\username", on client computer "173.x.x.x", met connection authorization policy requirements and was therefore authorized to access the RD Gateway server. The authentication method used was: "NTLM" and connection protocol used: "HTTP".</Message>
<Level>Information</Level>
<Task />
<Opcode />
<Channel />
<Provider />
<Keywords>
<Keyword>Audit Success</Keyword>
</Keywords>
</RenderingInfo>
</Event>
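As a sanity check outside Splunk, here's a minimal Python sketch (standard library xml.etree.ElementTree, not Splunk's own parser) that parses a trimmed, single-line copy of this event. It shows that the nested elements are well-formed and reachable, and it highlights one wrinkle: <EventInfo xmlns="aag"> redeclares the default namespace, so Username, IpAddress and their siblings sit in a different namespace from the rest of the event. The prefixes "evt" and "aag" in the NS mapping are arbitrary names chosen for this sketch.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample event, collapsed onto one line as it is ingested.
RAW = (
    '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event" xml:lang="en-US">'
    '<System><EventID>200</EventID>'
    '<Computer>gatewayserver.domain.com</Computer></System>'
    '<UserData><EventInfo xmlns="aag">'
    r'<Username>domain\username</Username>'  # raw string keeps the backslash
    '<IpAddress>173.x.x.x</IpAddress><AuthType>NTLM</AuthType><Resource />'
    '<ConnectionProtocol>HTTP</ConnectionProtocol><ErrorCode>0</ErrorCode>'
    '</EventInfo></UserData></Event>'
)

# Two default namespaces are in play: the Windows event schema on <Event>,
# and "aag", which <EventInfo> declares for itself and its children.
NS = {
    "evt": "http://schemas.microsoft.com/win/2004/08/events/event",
    "aag": "aag",
}

root = ET.fromstring(RAW)
info = root.find("evt:UserData/aag:EventInfo", NS)

# ElementTree reports child tags as "{aag}Username" etc.; strip the prefix.
fields = {child.tag.split("}", 1)[-1]: child.text or "" for child in info}
print(fields["Username"], fields["IpAddress"])  # domain\username 173.x.x.x
```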
I ended up doing a custom field extraction for the fields I wanted. I had to write my own regex, since the automatically generated one wasn't cooperating.
For username:
^(?:.*)<Username>(?P<username>[^<]+)
For source IP:
^(?:.*)<IpAddress>(?P<src_ip>[^<]+)
For the workstation that the user connects to:
^(?:.*)<Resource>(?P<workstation>[^<]+)
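To check these patterns outside Splunk, the Python sketch below runs them against a single-line excerpt of the sample event. Python's re accepts the same (?P<name>...) named-group syntax, and for simple patterns like these it should behave like Splunk's PCRE engine (an assumption, not verified against Splunk itself).

```python
import re

# Single-line excerpt of the sample event (raw string keeps the backslash).
RAW = (
    r'<UserData><EventInfo xmlns="aag"><Username>domain\username</Username>'
    '<IpAddress>173.x.x.x</IpAddress><AuthType>NTLM</AuthType><Resource />'
    '<ConnectionProtocol>HTTP</ConnectionProtocol></EventInfo></UserData>'
)

# The three extraction regexes from the post, unchanged.
EXTRACTIONS = [
    r"^(?:.*)<Username>(?P<username>[^<]+)",
    r"^(?:.*)<IpAddress>(?P<src_ip>[^<]+)",
    r"^(?:.*)<Resource>(?P<workstation>[^<]+)",
]

fields = {}
for pattern in EXTRACTIONS:
    match = re.search(pattern, RAW)
    if match:  # only patterns that match contribute a field
        fields.update(match.groupdict())

print(fields)
```

With this sample, the workstation pattern finds nothing: the event only contains an empty, self-closing <Resource />, so that extraction will only populate on events where <Resource> actually has a value.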
spath is good.
I tried extracting the fields with props.conf and transforms.conf.
props.conf:
[XML_sample]
NO_BINARY_CHECK = 1
REPORT-xml_first = xml_first
REPORT-xml_second = xml_second
SHOULD_LINEMERGE = 0
TIME_FORMAT = %FT%T.%9QZ
TIME_PREFIX = SystemTime=\"
TZ = UTC
pulldown_type = 1
transforms.conf:
[xml_first]
CLEAN_KEYS = 0
REGEX = (?:\<)?([\w \:]+?)=\"(\S+)\"
FORMAT = $1::$2
[xml_second]
CLEAN_KEYS = 0
REGEX = \<(\w+)\>([^\<]+)<\/\1\>
FORMAT = $1::$2
I was able to extract the fields this way, but it's not easy.
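The two REGEX values in transforms.conf can be tried the same way. In this minimal Python sketch, Python's re stands in for Splunk's PCRE, and FORMAT = $1::$2 is modeled as "group 1 is the field name, group 2 is the value". How Splunk then cleans or stores those names (CLEAN_KEYS and so on) is not modeled here.

```python
import re

# Excerpt of the sample event, on one line as ingested.
RAW = (
    '<TimeCreated SystemTime="2020-02-21T18:54:19.913701800Z" />'
    '<EventRecordID>1219</EventRecordID>'
    '<Execution ProcessID="5480" ThreadID="8416" />'
    r'<UserData><EventInfo xmlns="aag"><Username>domain\username</Username>'
    '<IpAddress>173.x.x.x</IpAddress><Resource /></EventInfo></UserData>'
)

# REGEX values copied from the [xml_first] and [xml_second] stanzas.
XML_FIRST = r'(?:\<)?([\w \:]+?)=\"(\S+)\"'   # name="value" attribute pairs
XML_SECOND = r'\<(\w+)\>([^\<]+)<\/\1\>'       # <Tag>text</Tag> element pairs

# Each match yields (field name, value), as FORMAT = $1::$2 would.
attrs = [m.groups() for m in re.finditer(XML_FIRST, RAW)]
elems = [m.groups() for m in re.finditer(XML_SECOND, RAW)]

print(attrs)
print(elems)
```

On this excerpt, xml_second yields EventRecordID, Username and IpAddress. xml_first's lazy name group also swallows the element name when the attribute is the first one on its tag (e.g. "TimeCreated SystemTime") and keeps a leading space otherwise (" ThreadID"), which is worth knowing when looking for those field names.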
Should my props.conf and transforms.conf look exactly as you have them? And is REPORT-xms_second a typo? I tried it as written and as REPORT-xml_second, but it didn't make a difference either way.
I see, the second one was a typo. I've updated my answer.
In my Splunk, this works.
Tried it but no change. Still not parsing everything in the XML data.
It works in my Splunk. I don't know why your fields are missing; try restarting.
Good luck.
@aaronzabell I fed the sample data you provided into a run-anywhere search, and spath extracted the fields correctly, including the username and IP address, so I'm not sure what is going wrong with your config. Have you looked for the fields under their full names, Event.UserData.EventInfo.Username for the username and Event.UserData.EventInfo.IpAddress for the IP address?
You can run the following run-anywhere example to check the fields yourself.
| makeresults
| eval _raw=" <Event xmlns=\"http://schemas.microsoft.com/win/2004/08/events/event\" xml:lang=\"en-US\">
<System>
<Provider Name=\"Microsoft-Windows-TerminalServices-Gateway\" Guid=\"{4D5AE6A1-C7C8-4E6D-B840-4D8080B42E1B}\" />
<EventID>200</EventID>
<Version>0</Version>
<Level>4</Level>
<Task>2</Task>
<Opcode>30</Opcode>
<Keywords>0x4020000001000000</Keywords>
<TimeCreated SystemTime=\"2020-02-21T18:54:19.913701800Z\" />
<EventRecordID>1219</EventRecordID>
<Correlation ActivityID=\"{BEA11342-474B-47DE-907D-F2FBEBD40000}\" />
<Execution ProcessID=\"5480\" ThreadID=\"8416\" />
<Channel>Microsoft-Windows-TerminalServices-Gateway/Operational</Channel>
<Computer>gatewayserver.domain.com</Computer>
<Security UserID=\"S-1-5-20\" />
</System>
<UserData>
<EventInfo xmlns=\"aag\">
<Username>domain\username</Username>
<IpAddress>173.x.x.x</IpAddress>
<AuthType>NTLM</AuthType>
<Resource />
<ConnectionProtocol>HTTP</ConnectionProtocol>
<ErrorCode>0</ErrorCode>
</EventInfo>
</UserData>
<RenderingInfo Culture=\"en-US\">
<Message>The user \"domain\username\", on client computer \"173.x.x.x\", met connection authorization policy requirements and was therefore authorized to access the RD Gateway server. The authentication method used was: \"NTLM\" and connection protocol used: \"HTTP\".</Message>
<Level>Information</Level>
<Task />
<Opcode />
<Channel />
<Provider />
<Keywords>
<Keyword>Audit Success</Keyword>
</Keywords>
</RenderingInfo>
</Event>"
| spath
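spath names each nested value by its full element path, which is why the fields show up as Event.UserData.EventInfo.Username and Event.UserData.EventInfo.IpAddress rather than plain Username and IpAddress. The Python sketch below is only a rough approximation of that naming scheme, run on a trimmed single-line copy of the event: it handles element text only, ignores attributes, and keeps just the last value when a path repeats (spath would produce a multivalue field instead). The helper names local and flatten are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample event, on one line as ingested.
RAW = (
    '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event" xml:lang="en-US">'
    '<System><EventID>200</EventID>'
    '<Computer>gatewayserver.domain.com</Computer></System>'
    '<UserData><EventInfo xmlns="aag">'
    r'<Username>domain\username</Username>'
    '<IpAddress>173.x.x.x</IpAddress><Resource /></EventInfo></UserData></Event>'
)

def local(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.split("}", 1)[-1]

def flatten(elem, prefix=""):
    """Yield (dotted.path, text) for every element that has text."""
    path = f"{prefix}.{local(elem.tag)}" if prefix else local(elem.tag)
    text = (elem.text or "").strip()
    if text:
        yield path, text
    for child in elem:
        yield from flatten(child, path)

fields = dict(flatten(ET.fromstring(RAW)))
print(fields["Event.UserData.EventInfo.Username"])   # domain\username
print(fields["Event.UserData.EventInfo.IpAddress"])  # 173.x.x.x
```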
Looking more at the live data, the following section of the XML gets parsed into a single field called UserData_Xml.
I still have no idea how to get it to parse deeper.
<EventInfo xmlns="aag">
<Username>domain\username</Username>
<IpAddress>173.x.x.x</IpAddress>
<AuthType>NTLM</AuthType>
<Resource />
<ConnectionProtocol>HTTP</ConnectionProtocol>
<ErrorCode>0</ErrorCode>
</EventInfo>
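Since the fragment that ends up in UserData_Xml is a complete XML element on its own, a second parsing pass over that field's value should expose the nested fields. In SPL, spath takes an input argument naming the field to parse, so something like | spath input=UserData_Xml may be worth trying (untested against this data). The Python sketch below only illustrates the two-pass idea and assumes the field holds exactly the fragment shown above.

```python
import xml.etree.ElementTree as ET

# Assumed value of the UserData_Xml field: the fragment shown above.
USERDATA_XML = (
    '<EventInfo xmlns="aag">'
    r'<Username>domain\username</Username>'
    '<IpAddress>173.x.x.x</IpAddress>'
    '<AuthType>NTLM</AuthType>'
    '<Resource />'
    '<ConnectionProtocol>HTTP</ConnectionProtocol>'
    '<ErrorCode>0</ErrorCode>'
    '</EventInfo>'
)

# Second pass over the field's contents: each child of EventInfo becomes
# its own field, with the "{aag}" namespace prefix stripped from the name.
info = ET.fromstring(USERDATA_XML)
fields = {child.tag.split("}", 1)[-1]: child.text or "" for child in info}
print(fields)
```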
Looking at the live data again, <Message> does get parsed.
I tried it as shown, and it worked. However, if I collapse the XML into a single line of text (the way it's ingested), it breaks. After playing with it a bit, it looks like the <Message> section is what breaks it, because the makeresults search parses fine when I remove it.