How to parse out fields

Champion

Hi, I have an XML-like (but not proper XML) feed that I need to parse.

A sample is below, and I need to parse out each field.

Each field will not necessarily be in each event, so I need a method that will find it, without depending upon a previous field or the location within the event itself.

Can anyone help?

Apr 22 19:54:29 138.126.78.80 <STONEGATE_LOG><TIMESTAMP>2019-04-22 15:54:28</TIMESTAMP><LOGID>9999999</LOGID><NODEID>1.2.3.4</NODEID><FACILITY>Packet Filtering</FACILITY><TYPE>Notification</TYPE><EVENT>New connection</EVENT><ACTION>Allow</ACTION><SRC>4.5.6.7</SRC><DST>X.X.X.X</DST><SERVICE>HTTP</SERVICE><PROTOCOL>2</PROTOCOL><SPORT>12345</SPORT><DPORT>99</DPORT><RULEID>60732.1</RULEID><SRCIF>5</SRCIF><COMPID>some text here</COMPID><RECEPTIONTIME>2019-04-22 15:54:29</RECEPTIONTIME><SENDERTYPE>Firewall</SENDERTYPE><SITUATION>Connection_Allowed</SITUATION><EVENTID>99999999999</EVENTID></STONEGATE_LOG>

Re: How to parse out fields

SplunkTrust

Hi,

To extract the XML data at search time, you can use the configuration below on the search head.

props.conf

[yourSourcetype]
REPORT-test = xmlkv_alt

transforms.conf

[xmlkv_alt]
FORMAT = $1::$2
REGEX = <([^>]*)>([^<]*)<\/\1>

EDIT: You can see the regex extraction run against the sample data at https://regex101.com/r/tJVD20/1
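To see outside Splunk what this transform would pull out, here is a minimal Python sketch that applies the same pattern to a shortened copy of the sample event. Python's `re` module stands in for Splunk's regex engine here, and `extract_fields` is a hypothetical helper that only imitates what `FORMAT = $1::$2` does; it is not Splunk code.

```python
import re

# Same pattern as the REGEX line in the [xmlkv_alt] stanza.
# \1 forces the closing tag to repeat the opening tag's name.
XMLKV = re.compile(r"<([^>]*)>([^<]*)<\/\1>")

# Shortened copy of the sample event from the question.
event = (
    "Apr 22 19:54:29 138.126.78.80 <STONEGATE_LOG>"
    "<TIMESTAMP>2019-04-22 15:54:28</TIMESTAMP>"
    "<LOGID>9999999</LOGID>"
    "<FACILITY>Packet Filtering</FACILITY>"
    "<ACTION>Allow</ACTION>"
    "<SRC>4.5.6.7</SRC>"
    "</STONEGATE_LOG>"
)


def extract_fields(text):
    """Imitate FORMAT = $1::$2: each match becomes a name -> value pair."""
    return {m.group(1): m.group(2) for m in XMLKV.finditer(text)}


fields = extract_fields(event)
for name, value in fields.items():
    print(f"{name} = {value}")
```

Because the value group is `[^<]*` and the closing tag must repeat the opening name, the outer `<STONEGATE_LOG>` wrapper never matches, so only the leaf elements become fields. Each tag is found on its own, regardless of order or of which other tags are present.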


Re: How to parse out fields

Champion

Thanks. I see them appearing on the regex site, but they don't appear as fields on the SH when I try that. Are there additional steps required?


Re: How to parse out fields

SplunkTrust

If you modified the config files directly, you need to restart the Splunk service, or you can use the /debug/refresh web endpoint.


Re: How to parse out fields

Champion

How will the fields appear? Will they automatically appear with the names?


Re: How to parse out fields

SplunkTrust

Yes, they will appear automatically. I have tested this config in my lab and it is working fine.


Re: How to parse out fields

Ultra Champion

Interesting, so the XML doesn't have to be well-formed, since the sample above isn't.

Amazing, because back then a similar solution for JSON was a big hit here: How can we extract a json document within an event?

We ended up with:

REPORT-extract = json_embedded


[json_embedded]
REGEX = "(\w+)"."(\S+?)"
FORMAT = $1::$2
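
As a rough illustration of what this JSON pattern picks up, the Python sketch below runs it over a made-up event. The event text and the `extract_json_pairs` helper are assumptions for the example, and Python's `re` module stands in for Splunk's regex engine.

```python
import re

# Same pattern as the REGEX line in the [json_embedded] stanza.
# The unescaped "." matches whatever single character separates
# the key from the value, normally the colon.
JSON_KV = re.compile(r'"(\w+)"."(\S+?)"')

# Hypothetical event with a JSON document embedded after a syslog header.
event = 'Apr 22 19:54:29 host app: {"user":"alice","action":"login","count":3}'


def extract_json_pairs(text):
    """Imitate FORMAT = $1::$2 for quoted key/value pairs."""
    return {m.group(1): m.group(2) for m in JSON_KV.finditer(text)}


fields = extract_json_pairs(event)
print(fields)
```

In this sketch only quoted string values without whitespace are captured: `user` and `action` come through, while the unquoted number `count` is skipped because the pattern expects a quote after the separator.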

Re: How to parse out fields

SplunkTrust

Yes, you can use regex for magic 😉


Re: How to parse out fields

Esteemed Legend

All these answers are missing this setting in transforms.conf:

MV_ADD = true

So the full stanza is:

[YourNameHere]
REGEX = <([^\/][^>]+)>(.*?)<\/[^>]+>
FORMAT = $1::$2
MV_ADD = true
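
To make the effect of `MV_ADD` concrete, here is a small Python sketch, not Splunk code, that imitates the documented behavior: with `MV_ADD = true` a repeated field name becomes a multivalue field, and without it the later value is discarded. The event text and the `extract` helper are hypothetical, and Python's `re` module stands in for Splunk's regex engine.

```python
import re

# Same pattern as the REGEX line in the stanza above.
TAG_PAIR = re.compile(r"<([^\/][^>]+)>(.*?)<\/[^>]+>")


def extract(text, mv_add):
    """Rough imitation of FORMAT = $1::$2 with and without MV_ADD."""
    fields = {}
    for name, value in TAG_PAIR.findall(text):
        if name not in fields:
            fields[name] = value
        elif mv_add:
            # Promote the field to a list (multivalue) and append.
            existing = fields[name]
            if not isinstance(existing, list):
                existing = [existing]
            fields[name] = existing + [value]
        # Without MV_ADD the later value is dropped.
    return fields


# Hypothetical event in which the same tag appears twice.
event = "<SRC>4.5.6.7</SRC><SRC>8.9.10.11</SRC><DPORT>99</DPORT>"

print(extract(event, mv_add=False))
print(extract(event, mv_add=True))
```

The sample event in the question has no repeated tags, so there `MV_ADD` would only matter if a tag could occur more than once. Note also that this pattern does not require the closing tag to match the opening one, so, at least in Python, on an event wrapped in `<STONEGATE_LOG>` the wrapper pairs with the first closing tag and the first inner element is not extracted as its own field.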

Re: How to parse out fields

SplunkTrust

This will not work, because REPEAT_MATCH is only valid for index-time field extraction, and the solution I provided is for search-time extraction.
