Getting Data In

How to configure Splunk to read XML files correctly?

darlynna
Engager

I'm having a problem getting Splunk to read my XML files correctly.
Here is an example of one of my XML files:

http://imgur.com/RTlYiLy

I want Splunk to create an event for every row element,
and every event should contain the name of the
table it came from. I've tried to do this in a few different ways,
but none of them seem to affect Splunk.
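For illustration (in case the linked image does not render), a hypothetical file with the structure implied by the field aliases in the props.conf below, a table element containing row elements with value children, might look like:

```xml
<!-- Hypothetical example; the actual file is in the linked screenshot. -->
<table name="customers">
  <row name="1">
    <value name="first">Ada</value>
    <value name="last">Lovelace</value>
  </row>
  <row name="2">
    <value name="first">Alan</value>
    <value name="last">Turing</value>
  </row>
</table>
```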
My latest attempt was to edit props.conf:

[xml-too_small]
DATETIME_CONFIG = CURRENT
KV_MODE = xml
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = <row>
MUST_BREAK_AFTER = <table>|</row>
TRUNCATE = 0
FIELDALIAS-rootfields = table{@name} as Table table.row{@name} as Row table.row.value{@name} as Valuename table.row.value as Value

and to add queue = parsingQueue in inputs.conf.

Thank you!
Darlynna

1 Solution

lguinn2
Legend

You could do this:

[myXML]
DATETIME_CONFIG = CURRENT
KV_MODE = xml
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = \<row>
TRUNCATE = 0

This will give you one event per row element. However, there is no way to add in the table info. A couple of tips:

If you specify BREAK_ONLY_BEFORE, then you shouldn't specify any other breaking criteria.

The < is a special character in regular expressions. You should really escape it with a \ as I did, although I think Splunk may not require this.

Unless you have some compelling reason (which you need to explain), you should not specify the parsingQueue.
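To see why BREAK_ONLY_BEFORE alone is enough, here is a rough sketch (illustration only, not Splunk's actual implementation) of where a regex like \<row> would place event boundaries in a stream:

```python
import re

# Illustration only -- not Splunk's implementation. With
# SHOULD_LINEMERGE = True and BREAK_ONLY_BEFORE matching "<row",
# Splunk starts a new event at each match, so every row element
# opens its own event.
data = "<table name='t1'><row id='1'>a</row><row id='2'>b</row></table>"

starts = [m.start() for m in re.finditer(r"<row", data)]
events = [data[s:e] for s, e in zip(starts, starts[1:] + [len(data)])]

print(len(events))  # one event per row element
```

Note that whatever precedes the first match (here the opening table tag) and follows the last row is simply absorbed into the adjacent events, which is why no additional breaking criteria are needed.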

If the source file name contains the name of the table, I would definitely use that. Keep the same props.conf as above, but add one more line:

TRANSFORMS-myxml=extract-table-name

and create transforms.conf like this

[extract-table-name]
SOURCE_KEY=MetaData:Source
REGEX=firstpartoffilename(\S+?)\.xml
FORMAT=table::$1
WRITE_META = true

Note that you will need to change the REGEX so that it picks up the actual name of the table from the filename. This creates an index-time field; although I usually dislike index-time fields, this is a case where it may be needed.
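As a quick way to sanity-check the REGEX before deploying it, you can run it against sample source paths outside Splunk. The filenames and the `export_` prefix below are hypothetical; substitute your real naming scheme:

```python
import re

# Hypothetical filenames -- adjust the pattern to your real naming scheme.
# This mirrors REGEX = export_(\S+?)\.xml applied to MetaData:Source.
pattern = re.compile(r"export_(\S+?)\.xml")

for source in ["/data/export_orders.xml", "/data/export_customers.xml"]:
    match = pattern.search(source)
    if match:
        # FORMAT = table::$1 turns capture group 1 into the field value
        print(f"table::{match.group(1)}")
```

If the pattern captures the right substring here, the same expression should extract the table name in transforms.conf.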


darlynna
Engager

Huge thanks!:)


oflyt
New Member

Thank you lguinn2 😄


somesoni2
Revered Legend

At index time, splitting each row into a separate event will be easy, but adding the table name would be tough (at least, I don't know a way to do that yet). Any chance you can drop this requirement?


oflyt
New Member

The file name also contains the name of the table, so maybe it would be easier to make use of that? 😜
