How to configure Splunk to parse and extract fields from my pseudo-XML sample data?

DMohn
Motivator

Hi Splunkers,

I have a question about extracting fields from XML input (using inputs and transforms).
I have tried to follow the advice in this post:
https://answers.splunk.com/answers/683/xml-input-line-breaking-and-field-extraction-how.html
but have not been successful so far, since the XML structure of my data is somewhat different.

Here's the data:

<ClientStatistics refDate="2015-11-10T09:47:46.888+01:00"><RequestStatistics><Client created="2015-09-10T23:25:17.523+02:00" id="IDxxxx" lastPoll="2015-11-10T09:47:45.279+01:00" pollCount="3342838" pollThroughput="1563"/><Client created="2015-09-10T23:25:21.751+02:00" id="IDxxxx" lastPoll="2015-11-10T09:46:02.196+01:00" pollCount="45031" pollThroughput="116030"/><Client created="2015-09-10T23:25:30.007+02:00" id="IDxxxx" lastPoll="2015-11-10T09:47:46.850+01:00" pollCount="16640185" pollThroughput="314"/><Client created="2015-09-10T23:25:17.516+02:00" id="IDxxxx" lastPoll="2015-11-10T09:47:46.432+01:00" lastPush="2015-11-10T09:47:46.360+01:00" pollCount="40604184" pollThroughput="129" pushCount="11646891" pushThroughput="449"/><Client created="2015-09-17T11:13:03.268+02:00" id="IDxxxx" lastPoll="2015-09-17T11:29:03.415+02:00" pollCount="9" pollThroughput="120018"/><Client created="2015-09-17T11:16:03.552+02:00" id="IDxxxx" lastPoll="2015-11-09T08:02:02.497+01:00" pollCount="300" pollThroughput="15237597"/></RequestStatistics></ClientStatistics>

Yes, it's pretty unstructured, and it's not clean XML...

I have tried to put KV-MODE = xml in my inputs.conf, with no effect. Also, the other suggested settings, such as BREAK_ONLY_BEFORE or LINE_BREAKER, did not split my events.

I understand that it should be possible to extract the KV pairs inside the <Client> tags somehow, maybe with an additional transform. I figured it should be REGEX = (\w+)="([^"]+)" and FORMAT = $1::$2 inside transforms.conf, but I am missing the connection.

Can somebody please enlighten me?

richgalloway
SplunkTrust

At the risk of duplicating what you've already tried, try these props.conf settings.

SHOULD_LINEMERGE=false
LINE_BREAKER=(><)
TIME_PREFIX=Client created=
---
If this reply helps you, Karma would be appreciated.

DMohn
Motivator

Thanks a ton - this was a setting I actually hadn't tried yet 🙂

With one small modification (stripping the closing slash as well) it works perfectly!

 SHOULD_LINEMERGE=false
 LINE_BREAKER=(/><)
 TIME_PREFIX=refDate=
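
For readers wondering how the REGEX and FORMAT idea from the question connects to the settings above: a search-time transform is normally referenced from props.conf through a REPORT-<class> line that names a transforms.conf stanza. The following is a minimal sketch, not a tested configuration. The sourcetype name client_stats, the class name client_kv, and the stanza name client_attr_kv are placeholders that do not come from this thread. The line-breaking settings are the ones reported as working above, and the REGEX and FORMAT values are the ones proposed in the question.

```ini
# props.conf
# "client_stats" is a placeholder sourcetype name (assumption).
[client_stats]
# Settings reported as working in this thread.
SHOULD_LINEMERGE = false
LINE_BREAKER = (/><)
TIME_PREFIX = refDate=
# Search-time link to the transforms.conf stanza below.
# "client_kv" and "client_attr_kv" are placeholder names (assumption).
REPORT-client_kv = client_attr_kv

# transforms.conf
[client_attr_kv]
# Capture the attribute name and its quoted value, as proposed in the question.
REGEX = (\w+)="([^"]+)"
# $1 becomes the field name, $2 the field value.
FORMAT = $1::$2
```

The line-breaking and timestamp settings take effect at index time, so they only apply to data indexed after the change, while the REPORT extraction is applied at search time. Note also that KV_MODE is spelled with an underscore and is a props.conf setting, not an inputs.conf one.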

richgalloway
SplunkTrust

What values of BREAK_ONLY_BEFORE and LINE_BREAKER have you tried?

DMohn
Motivator

I have tried numerous versions of RegExes, started with a simple '<', '
