How to parse out fields

a212830
Champion

Hi, I have an XML-like (but not proper XML) feed that I need to parse.

A sample is below, and I need to parse out each field.

Not every field will be present in every event, so I need a method that finds each field without depending on a previous field or on its position within the event.

Can anyone help?

Apr 22 19:54:29 138.126.78.80 <STONEGATE_LOG><TIMESTAMP>2019-04-22 15:54:28</TIMESTAMP><LOGID>9999999</LOGID><NODEID>1.2.3.4</NODEID><FACILITY>Packet Filtering</FACILITY><TYPE>Notification</TYPE><EVENT>New connection</EVENT><ACTION>Allow</ACTION><SRC>4.5.6.7</SRC><DST>X.X.X.X</DST><SERVICE>HTTP</SERVICE><PROTOCOL>2</PROTOCOL><SPORT>12345</SPORT><DPORT>99</DPORT><RULEID>60732.1</RULEID><SRCIF>5</SRCIF><COMPID>some text here</COMPID><RECEPTIONTIME>2019-04-22 15:54:29</RECEPTIONTIME><SENDERTYPE>Firewall</SENDERTYPE><SITUATION>Connection_Allowed</SITUATION><EVENTID>99999999999</EVENTID></STONEGATE_LOG>

harsmarvania57
Ultra Champion

Hi,

To extract XML data at search time, you can use the config below on the Search Head.

props.conf

[yourSourcetype]
REPORT-test = xmlkv_alt

transforms.conf

[xmlkv_alt]
FORMAT = $1::$2
REGEX = <([^>]*)>([^<]*)<\/\1>

EDIT: You can see the regex extraction against the sample data at https://regex101.com/r/tJVD20/1
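Outside Splunk, one quick way to sanity-check this REGEX is Python's re module, which should treat this simple pattern the same way Splunk's PCRE does (an assumption; the thread only tested it on regex101 and in Splunk). The sketch applies the pattern to a shortened copy of the sample event and builds name/value pairs the way FORMAT = $1::$2 does; extract_fields is a hypothetical helper. A field's position or absence does not matter, and the wrapping <STONEGATE_LOG> tag is skipped because \1 requires a matching closing tag right after a text-only value.

```python
import re

# The transforms.conf REGEX from the answer: \1 forces the closing tag
# to carry the same name as the opening tag.
XML_KV = re.compile(r"<([^>]*)>([^<]*)<\/\1>")


def extract_fields(event: str) -> dict[str, str]:
    """Hypothetical helper: rough model of FORMAT = $1::$2 for each match."""
    fields: dict[str, str] = {}
    for m in XML_KV.finditer(event):
        # If a tag repeats, keep the first value (assumed to mirror Splunk's
        # default when MV_ADD is not set).
        fields.setdefault(m.group(1), m.group(2))
    return fields


sample = (
    "Apr 22 19:54:29 138.126.78.80 <STONEGATE_LOG>"
    "<TIMESTAMP>2019-04-22 15:54:28</TIMESTAMP><LOGID>9999999</LOGID>"
    "<ACTION>Allow</ACTION><SRC>4.5.6.7</SRC><DPORT>99</DPORT>"
    "</STONEGATE_LOG>"
)
print(extract_fields(sample))
```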

woodcock
Esteemed Legend

All these answers are missing this setting in transforms.conf:

MV_ADD = true

So the full stanza is:

[YourNameHere]
REGEX = <([^\/][^>]+)>(.*?)<\/[^>]+>
FORMAT = $1::$2
MV_ADD = true
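To illustrate what MV_ADD changes, here is a small model in Python's re (assumed to behave like Splunk's PCRE for this pattern). It follows the transforms.conf spec as I understand it: with MV_ADD = true a repeated field name becomes multivalued, and otherwise later values for an existing field are discarded. The extract function and the repeated-SRC event are made up for illustration.

```python
import re
from collections import defaultdict

# woodcock's REGEX: an opening tag whose name does not start with "/",
# a lazily matched value, then the next closing tag of any name.
TAG_KV = re.compile(r"<([^\/][^>]+)>(.*?)<\/[^>]+>")


def extract(event: str, mv_add: bool) -> dict[str, list[str]]:
    """Hypothetical model of FORMAT = $1::$2 with and without MV_ADD."""
    fields: dict[str, list[str]] = defaultdict(list)
    for name, value in TAG_KV.findall(event):
        # mv_add=True: append every value; False: keep only the first one.
        if mv_add or not fields[name]:
            fields[name].append(value)
    return dict(fields)


repeated = "<SRC>1.1.1.1</SRC><SRC>2.2.2.2</SRC><DPORT>99</DPORT>"
print(extract(repeated, mv_add=True))   # SRC keeps both values
print(extract(repeated, mv_add=False))  # SRC keeps only the first value
```

One thing worth checking with this particular REGEX on the original sample: because (.*?) stops at the first closing tag of any name, the wrapping <STONEGATE_LOG> tag also matches, gets the value "<TIMESTAMP>2019-04-22 15:54:28", and TIMESTAMP itself is never extracted. The \1 backreference in the accepted answer's REGEX avoids this.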

harsmarvania57
Ultra Champion

This will not work, because REPEAT_MATCH is only valid for index-time field extraction, and the solution I provided is for search-time extraction.

woodcock
Esteemed Legend

Quite correct; I always get MV_ADD and REPEAT_MATCH confused. I have corrected my answer.

a212830
Champion

Thanks. This works quite well. Is there any way of forcing field names to be lowercase?

woodcock
Esteemed Legend

You will have to stack a calculated field on top of this using lower(fieldname).

sloshburch
Splunk Employee

I expect that a props.conf entry for a calculated field would work with eval's lower().
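One caveat: eval's lower() changes a field's value, not its name. A hedged alternative for lowercase names is one calculated field per tag that copies the value into a lowercase-named field, for example EVAL-action = 'ACTION' in props.conf (single quotes make eval read ACTION as a field name). With twenty-odd tags that is tedious to type, so the hypothetical helper below only generates those lines; it is a sketch, and the output is worth checking in your own Splunk environment before deploying.

```python
def lowercase_alias_lines(field_names: list[str]) -> list[str]:
    """Hypothetical helper: build props.conf EVAL- lines that copy each
    field into a lowercase-named field."""
    lines = []
    for name in field_names:
        lower = name.lower()
        if lower != name:  # names that are already lowercase need no copy
            lines.append(f"EVAL-{lower} = '{name}'")
    return lines


tags = ["TIMESTAMP", "LOGID", "NODEID", "SRC", "DST", "DPORT"]
print("\n".join(lowercase_alias_lines(tags)))
```

The generated lines would go under the same [yourSourcetype] stanza as the REPORT- setting.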

ddrillic
Ultra Champion

Interesting, so the XML doesn't have to be well-formed, since the sample above isn't well-formed.

Amazing, because back then a similar solution for JSON was a big hit here: "How can we extract a json document within an event?"

We ended up with:

props.conf

REPORT-extract = json_embedded

transforms.conf

[json_embedded]
REGEX = "(\w+)"."(\S+?)"
FORMAT = $1::$2
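To see what that REGEX picks up, here is a sketch using Python's re (assumed to behave like Splunk's PCRE for this simple pattern) on a made-up event. It shows the pattern only catches quoted, whitespace-free values: the unquoted count and the "two words" message are skipped.

```python
import re

# ddrillic's REGEX: a quoted key, any single separator character (normally
# ":"), then a quoted value without whitespace, matched lazily.
JSON_KV = re.compile(r'"(\w+)"."(\S+?)"')

# Hypothetical event with a JSON document embedded after some text.
event = 'prefix text {"user":"alice","role":"admin","count":5,"msg":"two words"}'

for key, value in JSON_KV.findall(event):
    print(f"{key}::{value}")  # same name::value shape as FORMAT = $1::$2
```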

harsmarvania57
Ultra Champion

Yes, you can use regex for magic 😉

a212830
Champion

Thanks. I see them appearing on the regex site, but they don't appear as fields on the SH when I try it. Are there additional steps required?

harsmarvania57
Ultra Champion

If you modified the config files directly, you need to restart the Splunk service, or you can use the /debug/refresh web endpoint.

a212830
Champion

How will the fields appear? Will they automatically appear with the names?

harsmarvania57
Ultra Champion

Yes, they will appear automatically. I have tested this config in my lab and it works fine.
