I am trying to index an XML file which looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<Posts2Votes>
<row>
<Id>1</Id>
<PostId>7</PostId>
<UserId>2</UserId>
<VoteTypeId>2</VoteTypeId>
<CreationDate>2009-11-06T02:22:37.063</CreationDate>
<TargetUserId>7</TargetUserId>
<TargetRepChange>10</TargetRepChange>
<IPAddress>64.127.105.60</IPAddress>
</row>
<row>
<Id>2</Id>
<PostId>6</PostId>
<UserId>2</UserId>
<VoteTypeId>2</VoteTypeId>
<CreationDate>2009-11-06T02:22:38.25</CreationDate>
<TargetUserId>31</TargetUserId>
<TargetRepChange>10</TargetRepChange>
<IPAddress>64.127.105.60</IPAddress>
</row>
<!-- more "row" elements go here -->
</Posts2Votes>
Splunk's default parser recognizes the timestamps correctly, but it does not split the events on each <row> element, and no fields are extracted by default. Now I need to figure out how to extract these fields and break the lines correctly. Any ideas?
props.conf
TIME_PREFIX = \<CreationDate\>
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
SHOULD_LINEMERGE = false
LINE_BREAKER = \>(\s*)(?=\<row\>)
REPORT-xmlext = xml-extr
transforms.conf
[xml-extr]
REGEX = \<(\w+)\>([^\>]*)\<\1\>
FORMAT = $1::$2
MV_ADD = true
REPEAT_MATCH = true
should do it.
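The props.conf spec requires LINE_BREAKER to contain a capturing group: the start of group 1 ends the previous event, the end of group 1 starts the next one, and whatever group 1 matched is thrown away. The sketch below emulates that rule in Python with the pattern \>(\s*)(?=\<row\>) so you can see where the event boundaries fall. It assumes Python's re behaves like Splunk's PCRE for these simple constructs, uses an abbreviated copy of the sample file, and break_events is a hypothetical helper, not anything Splunk ships.

```python
import re

# Abbreviated copy of the sample file from the question (two <row> elements).
SAMPLE = """<?xml version="1.0" encoding="utf-8" ?>
<Posts2Votes>
<row>
<Id>1</Id>
<PostId>7</PostId>
</row>
<row>
<Id>2</Id>
<PostId>6</PostId>
</row>
</Posts2Votes>"""

# Break after a ">" when the next non-whitespace text is "<row>".
# The whitespace is wrapped in group 1, as the props.conf spec requires.
LINE_BREAKER = re.compile(r"\>(\s*)(?=\<row\>)")


def break_events(raw, breaker):
    """Hypothetical emulation of the documented LINE_BREAKER rule: the start of
    group 1 ends the previous event, the end of group 1 starts the next one,
    and the text matched by group 1 is discarded."""
    events = []
    start = 0
    for m in breaker.finditer(raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    return events


for event in break_events(SAMPLE, LINE_BREAKER):
    print(repr(event))
```

Note that the first "event" is just the XML declaration plus <Posts2Votes>, and the closing </Posts2Votes> stays attached to the last row; this sketch does not try to drop those fragments.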
Thanks, this is a very helpful post. The documentation really should be a lot more newbie-friendly.
This one has been tested and works:
REGEX = <([^>]+)>([^<]*)<\/\1>
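As a quick check outside Splunk, here is a minimal sketch that runs this regex with Python's re module (assumed to behave like Splunk's PCRE here; "\/" is just a literal "/" in both) against one abbreviated <row> event. It collects matches roughly the way FORMAT = $1::$2 with MV_ADD = true is meant to work: group 1 names the field, group 2 supplies the value, and repeated names become multivalue. extract_fields is a hypothetical helper for illustration only.

```python
import re
from collections import defaultdict

# The regex from this post: <name>value</name>, where value contains no "<".
FIELD_REGEX = re.compile(r"<([^>]+)>([^<]*)<\/\1>")

# One event as LINE_BREAKER would hand it to search-time extraction (abbreviated).
EVENT = """<row>
<Id>1</Id>
<PostId>7</PostId>
<UserId>2</UserId>
<CreationDate>2009-11-06T02:22:37.063</CreationDate>
<IPAddress>64.127.105.60</IPAddress>
</row>"""


def extract_fields(event):
    """Rough analogue of FORMAT = $1::$2 with MV_ADD = true:
    group 1 is the field name, group 2 the value; repeats are appended."""
    fields = defaultdict(list)
    for name, value in FIELD_REGEX.findall(event):
        fields[name].append(value)
    return dict(fields)


print(extract_fields(EVENT))
```

The wrapping <row> element produces no field, because its content contains child tags and [^<]* cannot span them; only leaf elements are extracted.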
There is a small error in the REGEX from the original answer; the correct one is:
REGEX = \<(\w+)\>([^\<]*)\</\1\>
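To see what the fix changes, this small sketch compares the original answer's REGEX with the corrected one on a single line of the sample data, again using Python's re as an assumed stand-in for Splunk's PCRE. The original pattern expects the closing tag to be <Id> instead of </Id>, and its [^\>]* value class lets the value run into the closing tag, so it finds nothing.

```python
import re

ORIGINAL = re.compile(r"\<(\w+)\>([^\>]*)\<\1\>")     # REGEX from the first answer
CORRECTED = re.compile(r"\<(\w+)\>([^\<]*)\</\1\>")   # REGEX from this reply

LINE = "<Id>1</Id><PostId>7</PostId>"

print(ORIGINAL.findall(LINE))    # no (name, value) pairs
print(CORRECTED.findall(LINE))   # one (name, value) pair per element
```

One design difference from the [^>]+ version above: (\w+) only matches plain tag names, so elements with attributes or namespace prefixes are skipped rather than captured with the attribute text in the field name.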
Were you able to get this to work? I tried it, but it does not break the events apart cleanly.
My data also has nested elements: inside each row group there is a subrow that holds data for that row group, so that might be what is throwing it off.