
XML extraction: issues with field extractions using the props configuration file?

SplunkDash
Motivator

Hello,

When I extract fields from structured XML files using props.conf, no key/value pairs are extracted, and the header information comes in as an event. How can I eliminate the header so it doesn't show up as an event? And am I missing something that prevents the key/value pairs from being extracted?

I used:

[sourcename]
BREAK_ONLY_BEFORE=<DSMODEL>
CHARSET=UTF-8
KV_MODE=xml
LINE_BREAKER=([\r\n]*)<DSMODEL>
MAX_TIMESTAMP_LOOKAHEAD=24
MUST_BREAK_AFTER=\/DSMODEL>
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=false
TIME_FORMAT=%Y%m%d%H%M%S
TIME_PREFIX=<TIMESTAMP>
TRUNCATE=2500
category=Custom
disabled=false
pulldown_type=true

 

Any help will be highly appreciated. Thank you so much.

richgalloway
SplunkTrust

I've never heard of XML headers.  Please share some (sanitized) sample data so we can see what we need to process.

Also:

Don't use BREAK_ONLY_BEFORE, MUST_BREAK_AFTER, and LINE_BREAKER together.  Try to stick with LINE_BREAKER.

The value of TIME_PREFIX must be a valid regular expression.  Test it at regex101.com.
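
As a rough sketch of that advice, the stanza from the question could be reduced to LINE_BREAKER alone. This is untested against the real data; the `[\r\n]+` group and the `\s*` after the tag are assumptions based on the settings posted above, not a confirmed fix.

```ini
# props.conf (sketch only; stanza name copied from the question)
[sourcename]
# Break events with LINE_BREAKER alone. Text matched by the first capture
# group is discarded; <DSMODEL> itself starts the next event.
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)<DSMODEL>
# BREAK_ONLY_BEFORE and MUST_BREAK_AFTER are left out: they are line-merging
# settings and only apply when SHOULD_LINEMERGE = true.
# TIME_PREFIX is a regular expression; a literal tag is already a valid one.
TIME_PREFIX = <TIMESTAMP>\s*
TIME_FORMAT = %Y%m%d%H%M%S
MAX_TIMESTAMP_LOOKAHEAD = 24
# TRUNCATE cuts any event longer than this many bytes. If a DSMODEL block is
# larger, the event loses its closing tags, so the limit may need raising.
TRUNCATE = 2500
CHARSET = UTF-8
NO_BINARY_CHECK = true
KV_MODE = xml
```

Comments sit on their own lines because Splunk .conf files do not support trailing inline comments.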

---
If this reply helps you, Karma would be appreciated.

SplunkDash
Motivator

Hello,

Thank you so much for your quick response.

Regarding headers, every XML source file has one header line like "<xml version=1.0 encoding="ISO-88X-1>". This comes in as an event in my extraction. My other issue is field extraction: no key/value pairs are being extracted.

A sample event looks like this:

<xml version=1.0 encoding="ISO-88X-1>

<DSMODEL>

<TIMESTAMP> .......</TIMESTAMP>

........

...........

...........

</DSMODEL>

Thank you again!

richgalloway
SplunkTrust

Thanks for the clarification.

It appears that breaking before <DSMODEL> and after </DSMODEL> leaves the header between events so it becomes its own event.  If you use only LINE_BREAKER to break events then the header will become part of another event instead of on its own.

How many DSMODEL elements are in each XML?  If there's only one then breaking at the header should be enough.

I'm not sure why you're not getting any fields extracted.  Perhaps the XML is not well formatted.  Have you tried using the xmlkv command?
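
One possible way to drop the header with LINE_BREAKER alone, sketched here as an assumption rather than something tested in this thread: Splunk discards whatever the first capture group of LINE_BREAKER matches, so the header line can be placed inside that group. The `<\??xml[^>]*>` pattern is a guess based on the sanitized sample and would need adjusting to the real header.

```ini
# props.conf (sketch only; verify against real files before deploying)
[sourcename]
SHOULD_LINEMERGE = false
# First capture group = optional header line plus any surrounding whitespace.
# Everything the group matches is thrown away, so the header never becomes
# an event; <DSMODEL> after the group begins the next event.
LINE_BREAKER = ((?:<\??xml[^>]*>)?\s*)<DSMODEL>
```

Between two DSMODEL blocks the group matches only the blank lines, so the same pattern should also separate the five blocks in each file.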

SplunkDash
Motivator

Hello,

Thank you for your quick response. Regarding key/value pairs, I tested on my own local Splunk instance and got key/value pairs using exactly the same props.conf file. But when I implement it in the client environment, no key/value pairs are extracted. Each XML file has 5 <DSMODEL> elements. Should I use INDEXED_EXTRACTIONS=xml? Thank you again!

richgalloway
SplunkTrust

Using INDEXED_EXTRACTIONS=xml will increase the storage costs and slow down indexing.  It shouldn't make a difference.

That it works in your test environment, but not in production means we need to look at the differences between test and prod.  Have you run btool in prod to see what settings are there for the sourcetype?
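
One difference worth ruling out, offered as an assumption since the client topology is not described in the thread: KV_MODE is applied at search time, so it has to be present in props.conf on the search heads, while line breaking and timestamp settings are applied where the data is parsed (indexers or heavy forwarders). On a single-instance test install both sets live in the same file, which could explain why an identical stanza works locally. A minimal sketch of the split, shown as two separate files:

```ini
# --- props.conf on the parsing tier (indexers or heavy forwarders) ---
# Index-time settings: event breaking and timestamp recognition.
[sourcename]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)<DSMODEL>
TIME_PREFIX = <TIMESTAMP>
TIME_FORMAT = %Y%m%d%H%M%S

# --- props.conf on the search heads ---
# Search-time setting: automatic XML key/value extraction.
# A bare stanza name matches a sourcetype; to match on a file path instead,
# the stanza would need the source:: prefix.
[sourcename]
KV_MODE = xml
```

Comparing the btool output for the sourcetype on each tier should show whether KV_MODE is actually in effect where searches run.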

hrawat
Splunk Employee

XML is not supported for INDEXED_EXTRACTIONS. The supported types are:

INDEXED_EXTRACTIONS = <CSV|TSV|PSV|W3C|JSON|HEC>
* The type of file that Splunk software should expect for a given source type, and the extraction and/or parsing method that should be used on the file.
* The following values are valid for 'INDEXED_EXTRACTIONS':
CSV - Comma separated value format
TSV - Tab-separated value format
PSV - pipe ("|")-separated value format
W3C - World Wide Web Consortium (W3C) Extended Log File Format
JSON - JavaScript Object Notation format
HEC - Interpret file as a stream of JSON events in the same format as the HTTP Event Collector (HEC) input.



