Good morning. I'm trying to use rex to extract a username from an MS Windows Application Event Log. The event shows a field called "EventData_Xml" and in there is the following (NOTE: I replaced the angle brackets with square brackets, since the forum was treating them as HTML and not displaying them properly):
[Data]kjewgjkewkj[/Data][Data] Reason: Could not find a login matching the name provided.[/Data][Data] [CLIENT: <local machine>][/Data][Binary]blah, blah, blah[/binary]"
The username is showing in between the first tags (in this case kjewgjkewkj). I put in a fake username and tried connecting to create a failed login event for MSSQL. So I was trying to use rex to grab the text between the first two data tags but I can't get it to work. Using the field extractor in Splunk seems overly complicated when looking at the search code it produces as it is using the _raw field. Is there a simpler way to do it using something like the following:
host=blah* source="WinEventLog:Application"|xmlkv|search EventID=18456|rex field=EventData_Xml "(?)"
Thanks for any help.
@SplunkLunk, you should post code using the code button (the one labeled 101010) so that it does not get escaped.
If you are ingesting Windows Event Logs in XML format, you can enable KV_MODE=xml in props.conf. Refer to the documentation (https://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf#Field_extraction_configuration). This enables search-time field extraction from XML data.
You can also use the spath command to parse XML data. Since your _raw data is XML, you can use | spath, which should extract the fields automatically. If you know the complete structure of the XML DOM, you can use the path argument of the spath command to extract only the field you need. https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath
The following is a run-anywhere search using spath with the sample data provided. Note, however, that the structure <CLIENT: <local machine>> does not seem to be valid XML:
| makeresults
| eval _raw="<Data>kjewgjkewkj</Data><Data> Reason: Could not find a login matching the name provided.</Data><Data><CLIENT: <local machine>></Data><Binary>blah, blah, blah</binary>"
| spath
| eval UserName=mvindex(Data,0)
If you want to stick with the rex command, you can use the following:
| makeresults
| eval _raw="<Data>kjewgjkewkj</Data><Data> Reason: Could not find a login matching the name provided.</Data><Data><CLIENT: <local machine>></Data><Binary>blah, blah, blah</binary>"
| rex "\<Data\>(?<UserName>[^\<]+)\<\/Data\>"
PS: By default, the max_match parameter of the rex command is set to 1, which means it will extract only the first occurrence of <Data>.
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Rex
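To see the first-match behaviour outside of Splunk, here is a rough Python sketch of the same pattern. It is only an illustration, not Splunk's implementation: Python writes named groups as (?P<name>...) where rex uses (?<name>...), and the sample string is a shortened copy of the event above.

```python
import re

# Shortened copy of the sample event from the thread (angle-bracket form).
event = (
    "<Data>kjewgjkewkj</Data>"
    "<Data> Reason: Could not find a login matching the name provided.</Data>"
)

# [^<]+ cannot cross a '<', so the capture ends at the first closing tag.
pattern = re.compile(r"<Data>(?P<UserName>[^<]+)</Data>")

# Comparable to rex's default max_match=1: keep only the first match.
first = pattern.search(event).group("UserName")

# Comparable to max_match=0 (unlimited): collect every match.
all_values = pattern.findall(event)

print(first)
print(all_values)
```

With the default of one match, only the username from the first Data element ends up in the field, which is what the original question asked for.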
Since the exact answer depends on the structure of your data, please re-post sample data using code button (101010).
Thanks. I used the main part of your rex statement to make it work. My search now looks like:
host=blah* source="WinEventLog:Application"|xmlkv|search EventID=18456 OR EventID=18453 |rex field=EventData_Xml "[Data](?[User][^<]+)[\/Data]"
This seems to get me the results I want (greater than signs switched for brackets).
@SplunkLunk, I have converted my comment to Answer. Please accept.
Hi SplunkLunk,
Did you try this rex?
host=blah* source="WinEventLog:Application"
|xmlkv
|search EventID=18456
|rex field=EventData_Xml "\[Data\](?<UserName>[^\[]*)\[\/Data\]"
You can test it at https://regex101.com/r/EdaRpI/1
Bye.
Giuseppe
Thanks. The above poster had a subtle difference in the rex expression which stopped it at the first occurrence of the html tag. That got me the results I wanted.
If you're satisfied, please accept or upvote it.
Bye.
Giuseppe
Thanks. That's closer but it's adding the text from the next html data tag. So now the username looks like:
"kjewgjkewkj[/Data][Data] Reason: Could not find a login matching the name provided."
Again, I've traded the greater than signs for brackets.
Add max_match=1 to the rex command:
host=blah* source="WinEventLog:Application"
|xmlkv
|search EventID=18456
|rex field=EventData_Xml "\[Data\](?<UserName>[^\[]*)\[\/Data\]" max_match=1
Bye.
Giuseppe
adding max_match=1 still didn't seem to work. The results for UserName still show:
kjewgjkewkj[/Data][Data] Reason: Could not find a login matching the name provided.
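One possible explanation for the overrun, offered as a guess since the thread never shows the exact pattern that was run: the real event uses angle brackets, so if the tags were swapped back to angle brackets but the character class still excluded a square bracket, the class would run past the first closing tag. It would stop only at the literal "[" in "[CLIENT:" and then backtrack to the last closing Data tag before it. The Python sketch below reproduces that under those assumptions (Python uses (?P<name>...) for named groups; the event string is the thread's sample with angle brackets restored).

```python
import re

# The thread's sample event with the angle brackets restored (an assumption
# about what the real EventData_Xml value looks like).
event = (
    "<Data>kjewgjkewkj</Data>"
    "<Data> Reason: Could not find a login matching the name provided.</Data>"
    "<Data> [CLIENT: <local machine>]</Data>"
)

# Class excludes '[' only: it runs through the '<' of later tags and stops at
# the '[' in "[CLIENT:", then backtracks to the nearest earlier </Data>.
overrun = re.search(r"<Data>(?P<UserName>[^\[]*)</Data>", event)

# Class excludes '<': the capture cannot leave the first element.
fixed = re.search(r"<Data>(?P<UserName>[^<]+)</Data>", event)

print(overrun.group("UserName"))
print(fixed.group("UserName"))
```

If that is what happened, max_match would not help, because the problem is the length of a single match rather than the number of matches. Excluding "<" in the character class, as in the accepted answer, is what stops the capture at the first closing tag.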