How to configure Splunk to parse and extract fields from my pseudo-XML sample data?

DMohn
Motivator

Hi Splunkers,

I have a question regarding the input extraction of XML fields (with inputs and transforms).
I have tried to follow the advice in this post:
https://answers.splunk.com/answers/683/xml-input-line-breaking-and-field-extraction-how.html
but have not been successful so far, since the XML structure of my data is somewhat different.

Here's the data:

<ClientStatistics refDate="2015-11-10T09:47:46.888+01:00"><RequestStatistics><Client created="2015-09-10T23:25:17.523+02:00" id="IDxxxx" lastPoll="2015-11-10T09:47:45.279+01:00" pollCount="3342838" pollThroughput="1563"/><Client created="2015-09-10T23:25:21.751+02:00" id="IDxxxx" lastPoll="2015-11-10T09:46:02.196+01:00" pollCount="45031" pollThroughput="116030"/><Client created="2015-09-10T23:25:30.007+02:00" id="IDxxxx" lastPoll="2015-11-10T09:47:46.850+01:00" pollCount="16640185" pollThroughput="314"/><Client created="2015-09-10T23:25:17.516+02:00" id="IDxxxx" lastPoll="2015-11-10T09:47:46.432+01:00" lastPush="2015-11-10T09:47:46.360+01:00" pollCount="40604184" pollThroughput="129" pushCount="11646891" pushThroughput="449"/><Client created="2015-09-17T11:13:03.268+02:00" id="IDxxxx" lastPoll="2015-09-17T11:29:03.415+02:00" pollCount="9" pollThroughput="120018"/><Client created="2015-09-17T11:16:03.552+02:00" id="IDxxxx" lastPoll="2015-11-09T08:02:02.497+01:00" pollCount="300" pollThroughput="15237597"/></RequestStatistics></ClientStatistics>

Yes, it's pretty unstructured, and it's not clean XML...

I have tried putting KV-MODE = xml in my inputs.conf, with no effect. The other suggested settings, such as BREAK_ONLY_BEFORE or LINE_BREAKER, did not split my events either.

I understand that it should be possible to extract the KV pairs inside the <Client> tags somehow, maybe with an additional transform. I figured it should be REGEX = (\w+)="([^"]+)" and FORMAT = $1::$2 inside transforms.conf, but I am missing the connection.

Can somebody please enlighten me?

richgalloway
SplunkTrust

At the risk of duplicating what you've already tried, try these props.conf settings.

SHOULD_LINEMERGE=false
LINE_BREAKER=(><)
TIME_PREFIX=Client created=
---
If this reply helps you, Karma would be appreciated.

DMohn
Motivator

Thanks a ton - this was a setting I actually didn't try yet 🙂

With one small modification (stripping the closing slash as well) it works perfectly!

 SHOULD_LINEMERGE=false
 LINE_BREAKER=(/><)
 TIME_PREFIX=refDate=
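
On the "missing the connection" part of the original question: KV_MODE (spelled with an underscore) is a search-time setting that belongs in props.conf rather than inputs.conf, and a transforms.conf stanza only takes effect once a REPORT- line in props.conf points at it. A minimal sketch of that wiring, assuming a hypothetical sourcetype named my_client_stats and a hypothetical transform name client_kv_pairs (neither name comes from this thread):

```ini
# props.conf
# The stanza name is a hypothetical sourcetype; substitute your own.
[my_client_stats]
# Line-breaking settings as posted above.
SHOULD_LINEMERGE = false
LINE_BREAKER = (/><)
TIME_PREFIX = refDate=
# Search-time: apply the transform named below to every event.
REPORT-client_kv = client_kv_pairs

# transforms.conf
# Hypothetical stanza name; it must match the REPORT- value in props.conf.
[client_kv_pairs]
# Capture each attribute name and its quoted value.
REGEX = (\w+)="([^"]+)"
# Use capture 1 as the field name and capture 2 as the field value.
FORMAT = $1::$2
```

The line-breaking settings need to live on the instance that parses the data (indexer or heavy forwarder), while the REPORT- line and the transforms.conf stanza are search-time and belong on the search head. With that in place, a search such as sourcetype=my_client_stats | table id pollCount pollThroughput should show the extracted fields.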

richgalloway
SplunkTrust

What values of BREAK_ONLY_BEFORE and LINE_BREAKER have you tried?


DMohn
Motivator

I have tried numerous versions of RegExes, started with a simple '<', '
