Hi, I have an XML-like (but not proper XML) feed that I need to parse.
A sample is below, and I need to parse out each field.
Not every field is present in every event, so I need a method that finds each field on its own, without depending on a preceding field or on the field's position within the event.
Can anyone help?
Apr 22 19:54:29 184.108.40.206 <STONEGATE_LOG><TIMESTAMP>2019-04-22 15:54:28</TIMESTAMP><LOGID>9999999</LOGID><NODEID>220.127.116.11</NODEID><FACILITY>Packet Filtering</FACILITY><TYPE>Notification</TYPE><EVENT>New connection</EVENT><ACTION>Allow</ACTION><SRC>18.104.22.168</SRC><DST>X.X.X.X</DST><SERVICE>HTTP</SERVICE><PROTOCOL>2</PROTOCOL><SPORT>12345</SPORT><DPORT>99</DPORT><RULEID>60732.1</RULEID><SRCIF>5</SRCIF><COMPID>some text here</COMPID><RECEPTIONTIME>2019-04-22 15:54:29</RECEPTIONTIME><SENDERTYPE>Firewall</SENDERTYPE><SITUATION>Connection_Allowed</SITUATION><EVENTID>99999999999</EVENTID></STONEGATE_LOG>
To extract the XML fields at search time, you can use the config below on the Search Head.
props.conf:
[yourSourcetype]
REPORT-test = xmlkv_alt

transforms.conf:
[xmlkv_alt]
FORMAT = $1::$2
REGEX = <([^>]*)>([^<]*)<\/\1>
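If you want to sanity-check the pattern offline before deploying it, here is a minimal Python sketch (an illustration only, not how Splunk runs the transform internally). It applies the same REGEX with `re.findall`, which scans the whole event, so each `<TAG>value</TAG>` pair is found regardless of its position and missing tags are simply absent. The function name `extract_fields` and the trimmed test event are hypothetical.

```python
from __future__ import annotations

import re

# Same pattern as the transforms.conf REGEX; "\/" is written as "/" here,
# which is equivalent in Python's re module.
# Group 1 = tag name, group 2 = text; \1 forces the closing tag to match the opening tag.
TAG_PAIR = re.compile(r"<([^>]*)>([^<]*)</\1>")


def extract_fields(event: str) -> dict[str, str]:
    """Hypothetical helper: return {tag: value} for each <TAG>value</TAG> pair in the event."""
    # findall scans the whole string, so field order does not matter and tags
    # missing from an event are simply absent. If a tag repeats, the last value
    # wins in this dict.
    return {name: value for name, value in TAG_PAIR.findall(event)}


if __name__ == "__main__":
    # Trimmed from the sample event; most fields are deliberately left out.
    event = ("Apr 22 19:54:29 184.108.40.206 <STONEGATE_LOG>"
             "<EVENT>New connection</EVENT><SERVICE>HTTP</SERVICE>"
             "<DPORT>99</DPORT></STONEGATE_LOG>")
    print(extract_fields(event))
```

The outer `<STONEGATE_LOG>` wrapper does not produce a field, because the backreference `\1` requires a matching close tag immediately after text that contains no `<`.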
EDIT: You can see the regex extraction working against the sample data at https://regex101.com/r/tJVD20/1
Thanks. I can see the fields matching on the regex site, but they don't show up as fields on the SH when I try it. Are there additional steps required?
Interesting, so the XML doesn't have to be well-formed, since the sample above isn't well-formed.
Amazing, because back then a similar solution for JSON was a big hit here: "How can we extract a json document within an event?"
We ended up with:
props.conf:
REPORT-extract = json_embedded

transforms.conf:
[json_embedded]
REGEX = "(\w+)"."(\S+?)"
FORMAT = $1::$2
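A quick way to see what that JSON regex captures is the Python sketch below (again an offline illustration, not Splunk itself). The `.` between the two quoted groups matches any single character, typically the `:`. Note that unquoted values (numbers, booleans) and values containing whitespace are not captured. The helper name and the test event are hypothetical.

```python
from __future__ import annotations

import re

# Same REGEX as the json_embedded stanza: a quoted key, any one separator
# character (".", normally ":"), then a quoted value containing no whitespace.
JSON_PAIR = re.compile(r'"(\w+)"."(\S+?)"')


def extract_json_pairs(event: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (key, value) pairs for simple "key":"value" fragments."""
    return JSON_PAIR.findall(event)


if __name__ == "__main__":
    event = 'Apr 22 19:54:29 host app: {"user":"alice","count":5,"action":"login"}'
    # "count" is skipped because its value is not quoted.
    print(extract_json_pairs(event))
```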
All these answers are missing this setting in transforms.conf:
MV_ADD = true
So the full stanza is:
[YourNameHere]
REGEX = <([^\/][^>]+)>(.*?)<\/[^>]+>
FORMAT = $1::$2
MV_ADD = true
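To illustrate what MV_ADD changes, here is a rough Python model of the two behaviours (an approximation for illustration, not Splunk's implementation). With MV_ADD = false, a field that appears again in the same event keeps only its first value; with MV_ADD = true, every value is kept as a multivalue field. The event with a repeated `<SRC>` tag and the function names are hypothetical. The test event has no outer wrapper tag, since the lazy `(.*?)` in this REGEX can span nested tags.

```python
from __future__ import annotations

import re
from collections import defaultdict

# REGEX from the [YourNameHere] stanza: an opening tag not starting with "/",
# a lazily matched value, then any closing tag.
PAIR = re.compile(r"<([^\/][^>]+)>(.*?)<\/[^>]+>")


def first_value_only(event: str) -> dict[str, str]:
    """Approximates MV_ADD = false: keep the first value seen for each field."""
    fields: dict[str, str] = {}
    for name, value in PAIR.findall(event):
        fields.setdefault(name, value)  # later duplicates are discarded
    return fields


def multivalue(event: str) -> dict[str, list[str]]:
    """Approximates MV_ADD = true: collect every value into a multivalue field."""
    fields: dict[str, list[str]] = defaultdict(list)
    for name, value in PAIR.findall(event):
        fields[name].append(value)
    return dict(fields)


if __name__ == "__main__":
    event = "<SRC>10.0.0.1</SRC><DST>10.0.0.3</DST><SRC>10.0.0.2</SRC>"
    print(first_value_only(event))
    print(multivalue(event))
```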
This will not work, because REPEAT_MATCH is only valid for index-time field extraction, and the solution I provided is for search-time extraction.