Solved: How to parse out fields

a212830 · ‎05-01-2019

Hi, I have an XML-like (but not proper XML) feed that I need to parse.

A sample is below, and I need to parse out each field.

Each field will not necessarily be in each event, so I need a method that will find it, without depending upon a previous field or the location within the event itself.

Can anyone help?

Apr 22 19:54:29 138.126.78.80 <STONEGATE_LOG><TIMESTAMP>2019-04-22 15:54:28</TIMESTAMP><LOGID>9999999</LOGID><NODEID>1.2.3.4</NODEID><FACILITY>Packet Filtering</FACILITY><TYPE>Notification</TYPE><EVENT>New connection</EVENT><ACTION>Allow</ACTION><SRC>4.5.6.7</SRC><DST>X.X.X.X</DST><SERVICE>HTTP</SERVICE><PROTOCOL>2</PROTOCOL><SPORT>12345</SPORT><DPORT>99</DPORT><RULEID>60732.1</RULEID><SRCIF>5</SRCIF><COMPID>some text here</COMPID><RECEPTIONTIME>2019-04-22 15:54:29</RECEPTIONTIME><SENDERTYPE>Firewall</SENDERTYPE><SITUATION>Connection_Allowed</SITUATION><EVENTID>99999999999</EVENTID></STONEGATE_LOG>

harsmarvania57 · ‎05-01-2019

Hi,

To extract XML data at search time, you can use below config on Search Head.

props.conf

[yourSourcetype]
REPORT-test = xmlkv_alt

transforms.conf

[xmlkv_alt]
FORMAT = $1::$2
REGEX = <([^>]*)>([^<]*)<\/\1>

EDIT: Please find regex extraction with sample data on https://regex101.com/r/tJVD20/1

View solution in original post

woodcock · ‎05-01-2019

All these answers are missing this setting in transforms.conf:

MV_ADD = true

So the full stanza is:

[YourNameHere]
REGEX = <([^\/][^>]+)>(.*?)<\/[^>]+>
FORMAT = $1::$2
MV_ADD = true

harsmarvania57 · ‎05-01-2019

This will not work because REPEAT_MATCH is only valid for Indexed-time field extraction and solution which I have provided is for search time extraction.

woodcock · ‎05-01-2019

Quite correct; I always get MV_ADD and REPEAT_MATCH confused. I have corrected my answer.

a212830 · ‎05-06-2019

Thanks. This works quite well. Is there anyway of forcing field names to be lowercase?

woodcock · ‎06-07-2019

You will have to stack a calculated field on top of this using lower(fieldname).

sloshburch · ‎06-07-2019

I expect that a props.conf entry for calculated field would work with eval's lower()

harsmarvania57 · ‎05-01-2019

Hi,

To extract XML data at search time, you can use below config on Search Head.

props.conf

[yourSourcetype]
REPORT-test = xmlkv_alt

transforms.conf

[xmlkv_alt]
FORMAT = $1::$2
REGEX = <([^>]*)>([^<]*)<\/\1>

EDIT: Please find regex extraction with sample data on https://regex101.com/r/tJVD20/1

ddrillic · ‎05-01-2019

Interesting, so the xml doesn't have to be well-formed, as the sample above isn't well-formed.

Amazing, because back-then, a similar solution for json was a big hit here - How can we extract a json document within an event?

We ended up with -

REPORT-extract = json_embedded


[json_embedded]
REGEX = "(\w+)"."(\S+?)"
FORMAT = $1::$2

harsmarvania57 · ‎05-02-2019

Yes you can use regex for magic 😉

a212830 · ‎05-01-2019

Thanks. I see them appearing on the regex site, but they don't appear as fields on the SH when I try that - are there additional steps requried?

harsmarvania57 · ‎05-01-2019

If you modified config file directly then you need to restart splunk service or you can use /debug/refresh web endpoint

a212830 · ‎05-01-2019

How will the fields appear? Will they automatically appear with the names?

harsmarvania57 · ‎05-01-2019

Yes it will automatically appear, I have tested this config in my lab and it is working fine.

How to parse out fields

Developer Spotlight with Paul Stout

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

Data-Driven Success: Splunk & Financial Services