Splunk Search

XML extraction: issues with field extractions using props configuration file?

SplunkDash
Motivator

Hello,

When I extract fields from structured XML files using props.conf, no key/value pairs are extracted, and the header info comes in as an event. How can I eliminate the header info so it won't show up as an event? And is there anything I am missing that prevents the key/value pairs from being extracted?

I used:

[sourcename]
BREAK_ONLY_BEFORE=<DSMODEL>
CHARSET=UTF-8
KV_MODE=xml
LINE_BREAKER=([\r\n]*)<DSMODEL>
MAX_TIMESTAMP_LOOKAHEAD=24
MUST_BREAK_AFTER=\/DSMODEL>
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=false
TIME_FORMAT=%Y%m%d%H%M%S
TIME_PREFIX=<TIMESTAMP>
TRUNCATE=2500
category=Custom
disabled=false
pulldown_type=true

 

Any help will be highly appreciated. Thank you so much.


richgalloway
SplunkTrust

I've never heard of XML headers.  Please share some (sanitized) sample data so we can see what we need to process.

Also:

Don't use BREAK_ONLY_BEFORE, MUST_BREAK_AFTER, and LINE_BREAKER together.  Try to stick with LINE_BREAKER.

The value of TIME_PREFIX must be a valid regular expression.  Test it at regex101.com.
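
As a rough illustration of the two points above, here is a minimal props.conf sketch that relies on LINE_BREAKER alone. It reuses the stanza name and the time settings from the original post. The `+` quantifier in LINE_BREAKER and the optional `\s*` after the opening tag are assumptions about the data, not something confirmed in this thread. Per the props.conf spec, BREAK_ONLY_BEFORE and MUST_BREAK_AFTER only take effect when SHOULD_LINEMERGE is true, so with SHOULD_LINEMERGE=false they can simply be dropped.

```ini
# props.conf (sketch): event breaking driven by LINE_BREAKER only
[sourcename]
SHOULD_LINEMERGE = false
# Break before every <DSMODEL>. Only the text matched by the first capture
# group (the newlines) is discarded; <DSMODEL> starts the next event.
LINE_BREAKER = ([\r\n]+)<DSMODEL>
# TIME_PREFIX is a regex. The \s* tolerates whitespace after the tag
# (assumption about the data).
TIME_PREFIX = <TIMESTAMP>\s*
TIME_FORMAT = %Y%m%d%H%M%S
MAX_TIMESTAMP_LOOKAHEAD = 24
CHARSET = UTF-8
NO_BINARY_CHECK = true
KV_MODE = xml
```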

---
If this reply helps you, Karma would be appreciated.

SplunkDash
Motivator

Hello,

Thank you so much for your quick response.

Regarding headers, every XML source file has one header line like "<xml version=1.0 encoding="ISO-88X-1>". This comes in as an event in my extraction. My other issue is field extraction: it's not extracting any key/value pairs.

Sample event:

<xml version=1.0 encoding="ISO-88X-1>

<DSMODEL>

<TIMESTAMP> .......</TIMESTAMP>

........

...........

...........

</DSMODEL>

Thank you again!


richgalloway
SplunkTrust

Thanks for the clarification.

It appears that breaking before <DSMODEL> and after </DSMODEL> leaves the header between events, so it becomes its own event.  If you use only LINE_BREAKER to break events, then the header will become part of another event instead of being an event on its own.

How many DSMODEL elements are in each XML?  If there's only one then breaking at the header should be enough.
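
If the header still shows up as a separate event, one commonly used way to discard it at index time is to route matching events to the nullQueue. The following is only a sketch: the transform name `drop_xml_header` is hypothetical, and the regex assumes the real header is a single line such as `<?xml ... ?>` (the `\??` also tolerates the sanitized form shown in this thread). These settings need to live on the instance that parses the data (indexer or heavy forwarder).

```ini
# props.conf (sketch): attach the transform to the same stanza
[sourcename]
TRANSFORMS-drop_xml_header = drop_xml_header

# transforms.conf (sketch): drop events that consist only of the XML header
[drop_xml_header]
# Matches an event whose entire raw text is the header line (assumed shape).
REGEX = ^\s*<\??xml\s[^>]*>\s*$
DEST_KEY = queue
FORMAT = nullQueue
```

Events sent to the nullQueue are not indexed, so it is worth checking the regex against a few real files first to make sure it cannot match a full DSMODEL event.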

I'm not sure why you're not getting any fields extracted.  Perhaps the XML is not well formatted.  Have you tried using the xmlkv command?


SplunkDash
Motivator

Hello,

Thank you for your quick response. Regarding key/value pairs, I tested in my own local Splunk instance and I do get key/value pairs using exactly the same props.conf file. But when I implement it in the client environment, I don't get any key/value pairs.  There are 5 <DSMODEL> elements in each XML file.  Should I use INDEXED_EXTRACTIONS=xml? Thank you again!


richgalloway
SplunkTrust

Using INDEXED_EXTRACTIONS=xml will increase the storage costs and slow down indexing.  It shouldn't make a difference.

That it works in your test environment but not in production means we need to look at the differences between test and prod.  Have you run btool in prod to see what settings are there for the sourcetype?
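
One difference worth checking, assuming production is a distributed deployment (the thread does not say so): KV_MODE is a search-time setting, so it has to be present on the search head, while line breaking and timestamp settings are applied where the data is parsed (indexer or heavy forwarder). On a single-instance test box, one props.conf covers both, which could explain why the same file behaves differently. A sketch of how the settings from the original post might be split:

```ini
# props.conf on the indexers or heavy forwarder (index-time parsing)
[sourcename]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)<DSMODEL>
TIME_PREFIX = <TIMESTAMP>
TIME_FORMAT = %Y%m%d%H%M%S
MAX_TIMESTAMP_LOOKAHEAD = 24

# props.conf on the search head (search-time field extraction)
[sourcename]
KV_MODE = xml
```

Running `splunk btool props list sourcename --debug` on each instance shows the effective settings for the stanza and which file each one comes from.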
