
How to parse out fields

a212830
Champion

Hi, I have an XML-like (but not proper XML) feed that I need to parse.

A sample is below, and I need to parse out each field.

Not every field will appear in every event, so I need a method that finds each field without depending on a preceding field or on its position within the event.

Can anyone help?

Apr 22 19:54:29 138.126.78.80 <STONEGATE_LOG><TIMESTAMP>2019-04-22 15:54:28</TIMESTAMP><LOGID>9999999</LOGID><NODEID>1.2.3.4</NODEID><FACILITY>Packet Filtering</FACILITY><TYPE>Notification</TYPE><EVENT>New connection</EVENT><ACTION>Allow</ACTION><SRC>4.5.6.7</SRC><DST>X.X.X.X</DST><SERVICE>HTTP</SERVICE><PROTOCOL>2</PROTOCOL><SPORT>12345</SPORT><DPORT>99</DPORT><RULEID>60732.1</RULEID><SRCIF>5</SRCIF><COMPID>some text here</COMPID><RECEPTIONTIME>2019-04-22 15:54:29</RECEPTIONTIME><SENDERTYPE>Firewall</SENDERTYPE><SITUATION>Connection_Allowed</SITUATION><EVENTID>99999999999</EVENTID></STONEGATE_LOG>

harsmarvania57
Ultra Champion

Hi,

To extract the XML data at search time, you can use the config below on the search head.

props.conf

[yourSourcetype]
REPORT-test = xmlkv_alt

transforms.conf

[xmlkv_alt]
FORMAT = $1::$2
REGEX = <([^>]*)>([^<]*)<\/\1>

EDIT: You can see the regex extraction applied to the sample data at https://regex101.com/r/tJVD20/1
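To check the REGEX outside Splunk, here is a minimal Python sketch that applies the same pattern with re.findall to the sample event from the question and builds a name-to-value dict, roughly what FORMAT = $1::$2 produces. It is an approximation (Python's re module rather than Splunk's own regex engine), and extract_fields is a hypothetical helper name, not part of Splunk.

```python
import re

# Sample event from the question: a syslog header followed by the XML-like payload.
EVENT = (
    "Apr 22 19:54:29 138.126.78.80 <STONEGATE_LOG>"
    "<TIMESTAMP>2019-04-22 15:54:28</TIMESTAMP><LOGID>9999999</LOGID>"
    "<NODEID>1.2.3.4</NODEID><FACILITY>Packet Filtering</FACILITY>"
    "<TYPE>Notification</TYPE><EVENT>New connection</EVENT><ACTION>Allow</ACTION>"
    "<SRC>4.5.6.7</SRC><DST>X.X.X.X</DST><SERVICE>HTTP</SERVICE>"
    "<PROTOCOL>2</PROTOCOL><SPORT>12345</SPORT><DPORT>99</DPORT>"
    "<RULEID>60732.1</RULEID><SRCIF>5</SRCIF><COMPID>some text here</COMPID>"
    "<RECEPTIONTIME>2019-04-22 15:54:29</RECEPTIONTIME><SENDERTYPE>Firewall</SENDERTYPE>"
    "<SITUATION>Connection_Allowed</SITUATION><EVENTID>99999999999</EVENTID>"
    "</STONEGATE_LOG>"
)

# Same REGEX as the xmlkv_alt stanza. Group 1 is the tag name, group 2 the text,
# and \1 requires the closing tag to repeat the opening tag's name.
XMLKV_ALT = re.compile(r"<([^>]*)>([^<]*)<\/\1>")


def extract_fields(event):
    """Approximate FORMAT = $1::$2: each match becomes name -> value."""
    fields = {}
    for name, value in XMLKV_ALT.findall(event):
        # If a tag repeats, keep the first value seen (no tag repeats in this sample).
        fields.setdefault(name, value)
    return fields


fields = extract_fields(EVENT)
print(fields["TIMESTAMP"], fields["COMPID"])
# An event carrying only some of the tags still works; missing tags are simply absent.
print(extract_fields("<STONEGATE_LOG><ACTION>Allow</ACTION></STONEGATE_LOG>"))
```

Because the pattern matches any <NAME>value</NAME> run wherever it occurs, nothing depends on field order or position. The <STONEGATE_LOG> wrapper yields no field, since [^<]* cannot span the nested tags.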


woodcock
Esteemed Legend

All these answers are missing this setting in transforms.conf:

MV_ADD = true

So the full stanza is:

[YourNameHere]
REGEX = <([^\/][^>]+)>([^<]*)<\/[^>]+>
FORMAT = $1::$2
MV_ADD = true
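MV_ADD only matters when the same field name is extracted more than once from one event. Below is a simplified Python model of the difference, assuming the behavior described in the transforms.conf spec: without MV_ADD a repeated name keeps its first value, and with MV_ADD = true later values are appended to a multivalue field. The event is hypothetical, a Python list stands in for a Splunk multivalue field, and the pattern is the xmlkv_alt REGEX from the accepted answer; this is not Splunk's implementation.

```python
import re

# xmlkv_alt pattern from the accepted answer.
XMLKV_ALT = re.compile(r"<([^>]*)>([^<]*)<\/\1>")


def extract(event, mv_add):
    """Simplified model of MV_ADD: first value wins unless mv_add is True."""
    fields = {}
    for name, value in XMLKV_ALT.findall(event):
        if name not in fields:
            fields[name] = value
        elif mv_add:
            current = fields[name]
            # Promote to a list (stand-in for a multivalue field) and append.
            fields[name] = (current if isinstance(current, list) else [current]) + [value]
        # Otherwise (MV_ADD = false) the later value is discarded.
    return fields


# Hypothetical event in which <SRC> appears twice.
sample = "<LOG><SRC>4.5.6.7</SRC><SRC>8.9.10.11</SRC><ACTION>Allow</ACTION></LOG>"
print(extract(sample, mv_add=False))  # {'SRC': '4.5.6.7', 'ACTION': 'Allow'}
print(extract(sample, mv_add=True))   # {'SRC': ['4.5.6.7', '8.9.10.11'], 'ACTION': 'Allow'}
```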

harsmarvania57
Ultra Champion

This will not work because REPEAT_MATCH is only valid for index-time field extraction, and the solution I provided is for search-time extraction.


woodcock
Esteemed Legend

Quite correct; I always get MV_ADD and REPEAT_MATCH confused. I have corrected my answer.


a212830
Champion

Thanks. This works quite well. Is there any way to force the field names to be lowercase?


woodcock
Esteemed Legend

You will have to stack a calculated field on top of this using lower(fieldname).


sloshburch
Ultra Champion

I expect a calculated-field entry in props.conf using eval's lower() would work.
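One point worth keeping in mind: eval's lower() lowercases a field's value, whereas the question asks for lowercase field names, which amounts to a rename. A minimal Python sketch of that rename over a dict of already-extracted pairs; the dict and the lowercase_names helper are illustrative stand-ins, not Splunk APIs.

```python
def lowercase_names(fields):
    """Return a copy of fields with every name lowercased; values are left as-is."""
    # If two names differ only by case (e.g. SRC and src), the later one wins.
    return {name.lower(): value for name, value in fields.items()}


extracted = {"TIMESTAMP": "2019-04-22 15:54:28", "ACTION": "Allow", "SRC": "4.5.6.7"}
print(lowercase_names(extracted))
# {'timestamp': '2019-04-22 15:54:28', 'action': 'Allow', 'src': '4.5.6.7'}
```

Note that the value "Allow" keeps its capital letter; only the names change.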


ddrillic
Ultra Champion

Interesting, so the XML doesn't have to be well-formed, since the sample above isn't.
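The regex copes because it never parses the event as a document. A small Python comparison, using a trimmed copy of the question's sample: xml.etree.ElementTree rejects the whole event because the syslog header precedes the root element, accepts the payload once that header is cut off, and the xmlkv_alt pattern extracts the pairs either way. This is an illustration in Python, not how Splunk processes events.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample: syslog header + XML-like payload.
EVENT = (
    "Apr 22 19:54:29 138.126.78.80 <STONEGATE_LOG>"
    "<TIMESTAMP>2019-04-22 15:54:28</TIMESTAMP><ACTION>Allow</ACTION>"
    "</STONEGATE_LOG>"
)

# A strict XML parser refuses text before the root element.
try:
    ET.fromstring(EVENT)
    whole_event_parses = True
except ET.ParseError as err:
    whole_event_parses = False
    print("ElementTree rejected the event:", err)

# The payload on its own, starting at the first "<", is well-formed here.
payload_root = ET.fromstring(EVENT[EVENT.index("<"):])
print(payload_root.tag, [child.tag for child in payload_root])

# The xmlkv_alt pattern only looks for <NAME>value</NAME> runs, so the header is irrelevant.
pairs = re.findall(r"<([^>]*)>([^<]*)<\/\1>", EVENT)
print(pairs)
```

So in this sample the obstacle for a real XML parser is the surrounding syslog text rather than the tags themselves; the regex only checks each flat <NAME>value</NAME> run locally.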

Amazing, because back then a similar solution for JSON was a big hit here: "How can we extract a json document within an event?"

We ended up with:

props.conf

REPORT-extract = json_embedded

transforms.conf

[json_embedded]
REGEX = "(\w+)"."(\S+?)"
FORMAT = $1::$2
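To see what the json_embedded pattern picks up, a minimal Python sketch with re.findall; the events are hypothetical and extract_json_pairs is an illustrative name. In "(\w+)"."(\S+?)" the unescaped . matches any one character, which in compact JSON is the colon between key and value.

```python
import re

# Pattern from the json_embedded stanza.
JSON_EMBEDDED = re.compile(r'"(\w+)"."(\S+?)"')


def extract_json_pairs(event):
    """Return (name, value) pairs, as FORMAT = $1::$2 would name them."""
    return JSON_EMBEDDED.findall(event)


# Hypothetical events, for illustration only.
print(extract_json_pairs('prefix {"user":"bob","id":"42"}'))
# A quoted value containing a space, and an unquoted number, are skipped.
print(extract_json_pairs('{"name":"Bob Smith","count":7,"id":"7"}'))
```

The second call shows the trade-off of such a short pattern: it suits flat, compact JSON with simple string values.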

harsmarvania57
Ultra Champion

Yes, you can use regex for magic 😉


a212830
Champion

Thanks. I see the fields matched on the regex site, but they don't appear as fields on the search head (SH) when I try it. Are any additional steps required?


harsmarvania57
Ultra Champion

If you modified the config files directly, you need to restart the Splunk service, or you can use the /debug/refresh web endpoint.


a212830
Champion

How will the fields appear? Will they automatically appear with the names?


harsmarvania57
Ultra Champion

Yes, they will appear automatically, named after their tags. I have tested this config in my lab and it works fine.
