How to configure Splunk to read XML files correctly?

darlynna
Engager

I'm having a problem getting Splunk to read my XML files correctly.
Here is an example of one of my XML files:

http://imgur.com/RTlYiLy

I want Splunk to create one event for every row (the <row> element),
and every event should contain information about which
table it comes from. I've tried several different approaches,
but none of them seem to have any effect on Splunk.
My latest attempt was to edit props.conf:

[xml-too_small]
DATETIME_CONFIG = CURRENT
KV_MODE = xml
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = <row>
MUST_BREAK_AFTER = <table> | </row>
TRUNCATE = 0
FIELDALIAS-rootfields = table{@name} as Table table.row{@name} as Row table.row.value{@name} as Valuename table.row.value as Value

and to add queue = parsingQueue in inputs.conf.

Thank you!
Darlynna

1 Solution

lguinn2
Legend

You could do this in props.conf:

[myXML]
DATETIME_CONFIG = CURRENT
KV_MODE = xml
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = \<row>
TRUNCATE = 0

This will give you one event per row element. However, event breaking alone cannot add the table info to each event. A couple of tips:

If you specify BREAK_ONLY_BEFORE, then you shouldn't specify any other breaking criteria.

I escaped the < with a \ to be safe. On its own, < is not a metacharacter in the PCRE syntax Splunk uses (it only has special meaning inside constructs such as (?<name>...)), so the escape is harmless and Splunk does not require it.

Unless you have some compelling reason (which you would need to explain), you should not set queue = parsingQueue in inputs.conf.
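
To make the event breaking in the stanza above more concrete, here is a rough Python model of what SHOULD_LINEMERGE with BREAK_ONLY_BEFORE does: the input is split into lines, and lines are merged into one event until a line matches the pattern. This is only a sketch and not Splunk's actual implementation. The sample XML and the merge_lines helper are hypothetical, since the real file layout is only shown in the linked screenshot.

```python
import re

# Hypothetical sample data; the real layout is only known from the screenshot.
SAMPLE = """<table name="customers">
<row>
<value name="city">Oslo</value>
</row>
<row>
<value name="city">Bergen</value>
</row>
</table>
"""

# Same pattern as BREAK_ONLY_BEFORE in the props.conf stanza.
BREAK_ONLY_BEFORE = re.compile(r"\<row>")


def merge_lines(text, break_only_before):
    """Simplified model of line merging: start a new event before
    every line that matches the break_only_before pattern."""
    events, current = [], []
    for line in text.splitlines():
        if break_only_before.search(line) and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events


events = merge_lines(SAMPLE, BREAK_ONLY_BEFORE)
for number, event in enumerate(events, start=1):
    print(f"--- event {number} ---")
    print(event)
```

In this model only the first event contains the table name, which is why the row events cannot pick it up from event breaking alone. Also note that the pattern matches a bare <row> tag only; if the row elements in the real file carry attributes (for example <row name="...">), the pattern would have to be loosened, for instance to <row.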

If the source file name contains the name of the table, I would definitely use that. Keep the same props.conf as above, but add one more line:

TRANSFORMS-myxml=extract-table-name

and create transforms.conf like this

[extract-table-name]
SOURCE_KEY=MetaData:Source
REGEX=firstpartoffilename(\S+?)\.xml
FORMAT=table::$1
WRITE_META = true

Note that you will need to change the REGEX so that it picks up the actual name of the table from the filename. This creates an index-time field; although I usually dislike index-time fields, this is a case where it may be needed.
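
Before deploying the transform, it can help to check the adjusted REGEX against a few real source values. Below is a minimal Python sketch of such a check. The naming scheme export_<table>.xml, the sample paths, and the table_from_source helper are assumptions made for illustration. Python's re module is not identical to Splunk's regex engine, but it behaves the same for a simple pattern like this one.

```python
import re

# Hypothetical naming scheme: files are called export_<table>.xml.
# Replace "export_" with the real fixed part of your file names.
REGEX = re.compile(r"export_(\S+?)\.xml")


def table_from_source(source):
    """Return the captured table name ($1 in FORMAT), or None if no match."""
    match = REGEX.search(source)
    return match.group(1) if match else None


# The source metadata value carries a "source::" prefix.
for source in (
    "source::/data/xml/export_customers.xml",
    "source::/data/xml/export_orders.xml",
    "source::/data/xml/readme.txt",
):
    print(source, "->", table_from_source(source))
```

A source value that yields None here would not get a table field from the transform, so this is a quick way to spot file names that do not fit the pattern.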

darlynna
Engager

Huge thanks! :)

oflyt
New Member

Thank you, lguinn 😄

somesoni2
Revered Legend

At index time, splitting each row into a separate event will be easy, but adding the table name would be tough (at least I don't know a way to do that yet). Any chance you can drop this requirement?

oflyt
New Member

The file name also contains the name of the table; maybe it would be easier to make use of that? 😜
