Splunk Search

How to configure Splunk to parse and extract fields from my pseudo-XML sample data?

Motivator

Hi Splunkers,

I have a question regarding the input extraction of XML fields (with inputs and transforms).
I have tried to follow the advice in this post:
https://answers.splunk.com/answers/683/xml-input-line-breaking-and-field-extraction-how.html
but have not been successful yet, because the XML structure of my data is somewhat different.

Here's the data:

<ClientStatistics refDate="2015-11-10T09:47:46.888+01:00"><RequestStatistics><Client created="2015-09-10T23:25:17.523+02:00" id="IDxxxx" lastPoll="2015-11-10T09:47:45.279+01:00" pollCount="3342838" pollThroughput="1563"/><Client created="2015-09-10T23:25:21.751+02:00" id="IDxxxx" lastPoll="2015-11-10T09:46:02.196+01:00" pollCount="45031" pollThroughput="116030"/><Client created="2015-09-10T23:25:30.007+02:00" id="IDxxxx" lastPoll="2015-11-10T09:47:46.850+01:00" pollCount="16640185" pollThroughput="314"/><Client created="2015-09-10T23:25:17.516+02:00" id="IDxxxx" lastPoll="2015-11-10T09:47:46.432+01:00" lastPush="2015-11-10T09:47:46.360+01:00" pollCount="40604184" pollThroughput="129" pushCount="11646891" pushThroughput="449"/><Client created="2015-09-17T11:13:03.268+02:00" id="IDxxxx" lastPoll="2015-09-17T11:29:03.415+02:00" pollCount="9" pollThroughput="120018"/><Client created="2015-09-17T11:16:03.552+02:00" id="IDxxxx" lastPoll="2015-11-09T08:02:02.497+01:00" pollCount="300" pollThroughput="15237597"/></RequestStatistics></ClientStatistics>

Yes, it's pretty unstructured, and it's not clean XML...

I have tried putting KV_MODE = xml in my inputs.conf, with no effect. The other suggested settings, such as BREAK_ONLY_BEFORE and LINE_BREAKER, did not split my events either.

I understand that it should be possible to extract the key-value pairs inside the <Client> tags somehow, maybe with an additional transform. I figured it should be REGEX = (\w+)="([^"]+)" and FORMAT = $1::$2 inside transforms.conf, but I am missing the connection.

Can somebody please enlighten me?

1 Solution

SplunkTrust

At the risk of duplicating what you've already tried, try these props.conf settings.

SHOULD_LINEMERGE=false
LINE_BREAKER=(><)
TIME_PREFIX=Client created=
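
To spell out what each line is meant to do, here is the same set with comments. This is only a sketch: the stanza name client_stats is a placeholder I made up, so substitute your own sourcetype. As far as I know, these are parse-time settings, so they only affect data indexed after the change.

```ini
# props.conf ("client_stats" is a hypothetical sourcetype name)
[client_stats]
# Do not re-merge lines; each chunk produced by LINE_BREAKER is one event
SHOULD_LINEMERGE = false
# Break wherever one tag closes and the next one opens.
# The text matched by the first capture group is discarded.
LINE_BREAKER = (><)
# Start looking for the timestamp right after this text
TIME_PREFIX = Client created=
```

Because the captured text is thrown away, each event loses the ">" that closed it and the next event loses its opening "<".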
---
If this reply helps you, an upvote would be appreciated.


Motivator

Thanks a ton - this was a setting I actually hadn't tried yet 🙂

With one small modification (stripping the closing slash as well) it works perfectly!

 SHOULD_LINEMERGE=false
 LINE_BREAKER=(/><)
 TIME_PREFIX=refDate=
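
To close the loop on the transform from the original question: as far as I understand it, the connection between the two files is a REPORT- line in props.conf that names a stanza in transforms.conf. A minimal sketch, assuming a sourcetype called client_stats and a transform called client_kv (both names are invented for illustration):

```ini
# props.conf ("client_stats" is a hypothetical sourcetype name)
[client_stats]
SHOULD_LINEMERGE = false
LINE_BREAKER = (/><)
TIME_PREFIX = refDate=
# Search-time extraction: refers to the transforms.conf stanza below
REPORT-client_kv = client_kv

# transforms.conf ("client_kv" is a hypothetical stanza name)
[client_kv]
# Match every name="value" pair in the event
REGEX = (\w+)="([^"]+)"
# Capture group 1 becomes the field name, group 2 its value
FORMAT = $1::$2
```

With this in place, each <Client> event should expose fields such as id, pollCount and pollThroughput at search time.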

SplunkTrust

What values of BREAK_ONLY_BEFORE and LINE_BREAKER have you tried?

---
If this reply helps you, an upvote would be appreciated.

Motivator

I have tried numerous versions of RegExes, started with a simple '<', '
