XML input line-breaking and field extraction - how?

Contributor

I am trying to index an XML file which looks like this:

 <?xml version="1.0" encoding="utf-8" ?> 
 <Posts2Votes>
  <row>
   <Id>1</Id> 
   <PostId>7</PostId> 
   <UserId>2</UserId> 
   <VoteTypeId>2</VoteTypeId> 
   <CreationDate>2009-11-06T02:22:37.063</CreationDate> 
   <TargetUserId>7</TargetUserId> 
   <TargetRepChange>10</TargetRepChange> 
   <IPAddress>64.127.105.60</IPAddress> 
  </row>
  <row>
   <Id>2</Id> 
   <PostId>6</PostId> 
   <UserId>2</UserId> 
   <VoteTypeId>2</VoteTypeId> 
   <CreationDate>2009-11-06T02:22:38.25</CreationDate> 
   <TargetUserId>31</TargetUserId> 
   <TargetRepChange>10</TargetRepChange> 
   <IPAddress>64.127.105.60</IPAddress> 
  </row>
  <!-- more "row" elements go here -->
 </Posts2Votes>

Splunk's default parser recognizes the timestamps correctly, but it does not split the events on each <row> element, and no fields are extracted by default. How can I extract these fields and break the events correctly? Any ideas?

Re: XML input line-breaking and field extraction - how?

Legend

props.conf

TIME_PREFIX = \<CreationDate\>
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
SHOULD_LINEMERGE = false
LINE_BREAKER = \>(\s*)(?=\<row\>)
REPORT-xmlext = xml-extr

transforms.conf

[xml-extr]
REGEX = \<(\w+)\>([^\>]*)\<\1\>
FORMAT = $1::$2
MV_ADD = true
REPEAT_MATCH = true

should do it.
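
To see how LINE_BREAKER splits the file, here is a small Python emulation. It assumes Splunk's documented rule that the first capturing group is the delimiter: the start of group 1 ends the previous event, the end of group 1 starts the next one, and the captured text is discarded. The pattern used here, >(\s*)(?=<row>), wraps the whitespace in a capturing group for that reason, and the sample is trimmed to two fields per row. This is a sketch of the breaking logic only, not Splunk's actual parser; the function name break_events is hypothetical.

```python
import re

# Sample from the question, trimmed to two fields per row.
RAW = """<?xml version="1.0" encoding="utf-8" ?>
<Posts2Votes>
  <row>
    <Id>1</Id>
    <CreationDate>2009-11-06T02:22:37.063</CreationDate>
  </row>
  <row>
    <Id>2</Id>
    <CreationDate>2009-11-06T02:22:38.25</CreationDate>
  </row>
</Posts2Votes>
"""

# Assumed breaker: group 1 captures the whitespace between a '>' and the next '<row>'.
BREAKER = re.compile(r">(\s*)(?=<row>)")


def break_events(raw: str, breaker: re.Pattern) -> list:
    """Emulate LINE_BREAKER semantics: the start of group 1 ends the previous
    event, the end of group 1 starts the next one, and group 1 is discarded."""
    events = []
    start = 0
    for m in breaker.finditer(raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    # Drop empty fragments (e.g. if the input ended right at a break).
    return [e for e in events if e.strip()]


for i, event in enumerate(break_events(RAW, BREAKER), 1):
    print(f"--- event {i} ---")
    print(event)
```

Under this emulation the XML declaration and the opening <Posts2Votes> tag come out as a separate first event, and the closing </Posts2Votes> stays attached to the last row, so expect one header-only event and a trailing closing tag on the final event.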

Re: XML input line-breaking and field extraction - how?

Contributor

Were you able to get this to work? I tried it, but it does not break the events apart cleanly.

My file also has nested data inside the top-level group: after each row group there is a subrow that contains data for that row, so that might be what is throwing things off.

Re: XML input line-breaking and field extraction - how?

Path Finder

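There is a small error in the regex above: the closing tag \<\1\> is missing its slash, and the value should stop at the next '<' rather than the next '>'. The correct one is: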

REGEX = \<(\w+)\>([^\<]*)\</\1\>
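
A quick way to see the difference between the two regexes is to run both over one event from the question. The Python sketch below uses findall and collects values into lists as a rough stand-in for FORMAT = $1::$2 with MV_ADD = true; the helper extract is hypothetical and only approximates Splunk's search-time extraction. Python's re treats \<, \> and the \1 backreference the same way PCRE does for these patterns.

```python
import re
from collections import defaultdict

# One event (a single <row>) from the question.
EVENT = """<row>
    <Id>1</Id>
    <PostId>7</PostId>
    <UserId>2</UserId>
    <VoteTypeId>2</VoteTypeId>
    <CreationDate>2009-11-06T02:22:37.063</CreationDate>
    <TargetUserId>7</TargetUserId>
    <TargetRepChange>10</TargetRepChange>
    <IPAddress>64.127.105.60</IPAddress>
  </row>"""

ORIGINAL = re.compile(r"\<(\w+)\>([^\>]*)\<\1\>")    # regex from the accepted answer
CORRECTED = re.compile(r"\<(\w+)\>([^\<]*)\</\1\>")  # regex from this reply


def extract(event: str, pattern: re.Pattern) -> dict:
    """Rough stand-in for FORMAT = $1::$2 with MV_ADD = true: every match
    appends group 2 as a value of the field named by group 1."""
    fields = defaultdict(list)
    for name, value in pattern.findall(event):
        fields[name].append(value)
    return dict(fields)


print(extract(EVENT, ORIGINAL))
print(extract(EVENT, CORRECTED))
```

The original pattern extracts nothing, because its closing tag lacks the slash and [^\>]* runs past the value into the closing tag. The corrected pattern yields one field per leaf element and skips the <row> wrapper, whose content begins with another tag.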

Re: XML input line-breaking and field extraction - how?

Esteemed Legend

This is tested and working:

REGEX = <([^>]+)>([^<]*)<\/\1>
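
Compared with the \w+ version, <([^>]+)> accepts any characters in the tag name except '>', so it also picks up elements whose names contain characters outside \w, such as hyphens or namespace colons. The sketch below checks this in Python against a hypothetical <vote-type> element, which does not appear in the question's data.

```python
import re

WORD_TAGS = re.compile(r"\<(\w+)\>([^\<]*)\</\1\>")  # \w+ tag names only
ANY_TAGS = re.compile(r"<([^>]+)>([^<]*)<\/\1>")     # any tag name without '>'

# Hypothetical event: <vote-type> is NOT in the question's data; it is only
# here to show a tag name containing a character outside \w.
EVENT = "<row><Id>1</Id><vote-type>2</vote-type></row>"

print(WORD_TAGS.findall(EVENT))
print(ANY_TAGS.findall(EVENT))
```

Neither pattern matches an element that carries attributes: the \w+ version fails at the space after the tag name, and the [^>]+ version would require a closing tag identical to the full opening-tag text.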

Re: XML input line-breaking and field extraction - how?

Explorer

This is a very helpful post. The documentation really should be a lot more newbie-friendly. Thanks.