Getting Data In

How to configure Splunk to read XML files correctly?

darlynna
Engager

I have a problem getting Splunk to read my XML files correctly.
Here is an example of one of my XML files:

http://imgur.com/RTlYiLy

I want Splunk to create an event for every row element,
and every event should contain information on which
table it's from. I've tried to do this in a few different ways,
but none of them seem to affect Splunk.
My latest attempt was to edit props.conf:

[xml-too_small]
DATETIME_CONFIG = CURRENT
KV_MODE = xml
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = <row>
MUST_BREAK_AFTER = <table>|</row>
TRUNCATE = 0
FIELDALIAS-rootfields = table{@name} as Table table.row{@name} as Row table.row.value{@name} as Valuename table.row.value as Value

and to add queue = parsingQueue in inputs.conf.

Thank you!
Darlynna

1 Solution

lguinn2
Legend

You could do this

[myXML]
DATETIME_CONFIG = CURRENT
KV_MODE = xml
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = \<row>
TRUNCATE = 0
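
For reference, here is a hypothetical sketch of the XML layout these settings assume. The actual structure is in the screenshot link above; the element and attribute names below are guesses based on the FIELDALIAS in the question:

```xml
<table name="customers">
  <row name="1">
    <value name="id">42</value>
    <value name="city">Oslo</value>
  </row>
  <row name="2">
    <value name="id">43</value>
    <value name="city">Bergen</value>
  </row>
</table>
```

With BREAK_ONLY_BEFORE = \<row>, Splunk starts a new event each time it sees an opening row tag, so each row element above becomes one event.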

This will give you one event per row element. However, there is no way to add in the table info. A couple of tips:

If you specify BREAK_ONLY_BEFORE, then you shouldn't specify any other breaking criteria.

The < is a special character in regular expressions. You should really escape it with a \ as I did, although I think Splunk may not require this.

Unless you have some compelling reason (which you need to explain), you should not specify the parsingQueue.

If the source file name contains the name of the table, I would definitely use that. Keep the same props.conf as above, but add one more line:

TRANSFORMS-myxml=extract-table-name

and create transforms.conf like this

[extract-table-name]
SOURCE_KEY=MetaData:Source
REGEX=firstpartoffilename(\S+?)\.xml
FORMAT=table::$1
WRITE_META = true

Note that you will need to change the REGEX so that it picks up the actual name of the table from the filename. This creates an index-time field; although I usually dislike index-time fields, this is a case where it may be needed.
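
To sanity-check the REGEX before restarting Splunk, you can try it against a sample source path. This is just an illustration: the path and the "firstpartoffilename" prefix below are placeholders, and you would substitute the real fixed portion of your filenames:

```python
import re

# Hypothetical source path; replace "firstpartoffilename" with the
# actual fixed prefix of your XML files
source = "/data/xml/firstpartoffilenamecustomers.xml"

# Same pattern as in transforms.conf: capture everything between the
# fixed prefix and the .xml extension as the table name
match = re.search(r"firstpartoffilename(\S+?)\.xml", source)
print(match.group(1))  # -> customers
```

If the printed value is the table name you expect, the same pattern should work in transforms.conf, where $1 in FORMAT refers to the captured group.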


darlynna
Engager

Huge thanks!:)


oflyt
New Member

Thank you lguinn 😄


somesoni2
SplunkTrust
SplunkTrust

At index time, splitting each row into a separate event will be easy, but adding the table name would be tough (at least I don't know a way to do that yet). Any chance you can drop this requirement?


oflyt
New Member

The file also has the name of the table; maybe it would be easier to make use of that? 😜
