Getting Data In

Ingest XML files, fields not being created

ssaenger
Communicator

Hi,

I am trying to ingest XML files and split the elements into fields. My log files look like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?><SmartPanel xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" DocumentCreationDate="2019-07-09T10:18:04" DocumentVersion="5" PanID="15" LogCreationDate="2019-07-08T18:45:32" TvID="0" xmlns="urn:nds:dyn:pms:Smart:v1" xsi:schemaLocation="urn:nds:dyn:pms:Smart:v1 /apps/WEB-INF/amsXmlSchema.xsd"><Subscriber SubscriberID="126" DeviceID="2915"><SmartNoSubstitution EventTime="2019-07-08T18:45:53"><availId>175696022</availId><reason>0</reason><ServiceKey>4049</ServiceKey></SmartNoSubstitution><SmartNoSubstitution EventTime="2019-07-08T18:57:05"><availId>175696024</availId><reason>0</reason><ServKey>4049</ServKey></SmartNoSubstitution></Subscriber></SmartPanel>

and

 <?xml version="1.0" encoding="UTF-8" standalone="no"?><SmartPanel xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" DocumentCreationDate="2019-07-09T11:18:04" DocumentVersion="5" PanID="5" LogCreationDate="2019-07-08T19:45:32" TvID="0" xmlns="urn:nds:dyn:pms:Smart:v1" xsi:schemaLocation="urn:nds:dyn:pms:Smart:v1 /apps/WEB-INF/amsXmlSchema.xsd"><Subscriber SubscriberID="178" DeviceID="45615"></Subscriber></SmartPanel>

Based on other questions, my props.conf and transforms.conf are below.
props.conf

[pms]
TIME_PREFIX=EventTime
TIME_FORMAT=%Y-%m-%dT%H:%M:%S 
SHOULD_LINEMERGE=false
TRUNCATE=100000
LINE_BREAKER=\>\s*(?=\)
REPORT-xmlext=xml-extr

and
transforms.conf

[xml-extr]
REGEX=<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT=$1::$2 
MV_ADD=true
REPEAT_MATCH=true

However, only the second file is being ingested, and fields are only being created where there is an = (the attribute="value" pairs).

I have tried KV_MODE=xml, but this has not helped.

I have used regex101 to validate the regex:

Match 1
Full match  451-479 <availId>175696022</availId>
Group 1.    452-459 availId
Group 2.    460-469 175696022
Match 2
Full match  479-497 <reason>0</reason>
Group 1.    480-486 reason
Group 2.    487-488 0
Match 3
Full match  497-526 <ServiceKey>4049</ServiceKey>
Group 1.    498-508 ServiceKey
Group 2.    509-513 4049
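
For anyone who wants to reproduce the regex101 result outside Splunk, here is a minimal Python sketch. It applies the same pattern as the REGEX setting to a trimmed copy of the first sample and collects repeated element names into lists, which only roughly imitates what MV_ADD=true is meant to do. The helper name extract_fields is made up for illustration, and Python's re module is not the regex engine Splunk uses, so treat this as a sanity check of the pattern rather than proof of how Splunk will behave.

```python
import re
from collections import defaultdict

# Same pattern as REGEX in the [xml-extr] stanza.
PATTERN = re.compile(r'<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>')

# Trimmed copy of the first sample file (Subscriber element only).
SAMPLE = (
    '<Subscriber SubscriberID="126" DeviceID="2915">'
    '<SmartNoSubstitution EventTime="2019-07-08T18:45:53">'
    '<availId>175696022</availId><reason>0</reason><ServiceKey>4049</ServiceKey>'
    '</SmartNoSubstitution>'
    '<SmartNoSubstitution EventTime="2019-07-08T18:57:05">'
    '<availId>175696024</availId><reason>0</reason><ServKey>4049</ServKey>'
    '</SmartNoSubstitution></Subscriber>'
)


def extract_fields(event):
    """Hypothetical helper: map each element name to the list of its values."""
    fields = defaultdict(list)
    for name, value in PATTERN.findall(event):
        fields[name].append(value)
    return dict(fields)


fields = extract_fields(SAMPLE)
for name, values in fields.items():
    print(name, values)
```

Note that the pattern only matches leaf elements with text content. SmartNoSubstitution and Subscriber themselves never match because their content contains a <, and the EventTime attribute is not captured by this regex at all.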

Does anybody have any advice?

0 Karma

woodcock
Esteemed Legend

Your question is very unclear. The settings that you have will work correctly for the first case and KV_MODE=auto will work for the 2nd case. So what EXACTLY is your problem here? As far as LINE_BREAKER, we cannot help you unless you show us multiple events exactly the way that they are in the file (with all variations).

0 Karma

FrankVl
Ultra Champion

A few comments:

  • Why not set TIME_PREFIX=EventTime=" ? It probably also works with just TIME_PREFIX=EventTime, but it is better to be as specific as possible, I would say.
  • That LINE_BREAKER seems strange. Is there something missing? It doesn't include the mandatory capture group. Is your intention to break on every <SmartNoSubstitution?
  • If you don't want automatic key=value extraction to kick in, add KV_MODE = none in props.conf.
  • Where have you deployed this config? For the extractions to work, it must be on your search heads as well.
  • There is no point in specifying REPEAT_MATCH=true, since that setting only applies to index-time extractions.
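
To illustrate the first point, here is a small Python sketch showing where each prefix leaves the "cursor" in one of the sample events. The function parse_after_prefix is hypothetical and deliberately strict: it requires the timestamp to start immediately after the prefix. Splunk's own timestamp recognition is probably more forgiving than this, so the sketch only shows why the more specific prefix is the safer choice, not how Splunk parses timestamps internally.

```python
import re
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"   # same as TIME_FORMAT in props.conf
TIMESTAMP_LENGTH = 19               # length of e.g. 2019-07-08T18:45:53


def parse_after_prefix(event, time_prefix):
    """Hypothetical strict parser: timestamp must start right after the prefix."""
    match = re.search(time_prefix, event)
    if match is None:
        return None
    candidate = event[match.end():match.end() + TIMESTAMP_LENGTH]
    try:
        return datetime.strptime(candidate, TIME_FORMAT)
    except ValueError:
        return None


event = '<SmartNoSubstitution EventTime="2019-07-08T18:45:53"><availId>175696022</availId>'

# With the short prefix the next characters are =" rather than digits.
loose = parse_after_prefix(event, r'EventTime')
# With the specific prefix the cursor lands directly on the year.
exact = parse_after_prefix(event, r'EventTime="')

print("TIME_PREFIX=EventTime   ->", loose)
print('TIME_PREFIX=EventTime=" ->', exact)
```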
0 Karma

ssaenger
Communicator

Hi Frank,

Thanks for your comments. I have tried what you suggested, but the line breaking and field extraction still do not work.
I still get fields for anything written with an =, but nothing after SmartNoSubstitution is being extracted.

0 Karma

FrankVl
Ultra Champion

Which LINE_BREAKER are you using now? As I said, the one you posted doesn't make much sense to me, and I haven't suggested an alternative yet.
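
As a rough illustration of what the capture group in a LINE_BREAKER is for, the Python sketch below imitates the documented behaviour: the text matched by the first capture group is discarded, what comes before it ends the previous event, and what comes after it starts the next one. The pattern used here is purely an assumption about the intent (break before every <SmartNoSubstitution); it is not the poster's setting and has not been tested in Splunk. The function break_events is likewise made up for this example.

```python
import re

# Hypothetical breaker, assuming the goal is one event per SmartNoSubstitution.
# Group 1 holds the text to throw away between events (possibly empty).
BREAKER = re.compile(r'>(\s*)<SmartNoSubstitution\b')


def break_events(raw, breaker):
    """Split raw text at each match, discarding whatever group 1 captured."""
    events = []
    start = 0
    for m in breaker.finditer(raw):
        events.append(raw[start:m.start(1)])  # previous event ends before group 1
        start = m.end(1)                      # next event starts after group 1
    events.append(raw[start:])
    return events


raw = (
    '<Subscriber SubscriberID="126" DeviceID="2915">'
    '<SmartNoSubstitution EventTime="2019-07-08T18:45:53"><reason>0</reason></SmartNoSubstitution>\n'
    '<SmartNoSubstitution EventTime="2019-07-08T18:57:05"><reason>0</reason></SmartNoSubstitution>'
    '</Subscriber>'
)

events = break_events(raw, BREAKER)
for number, text in enumerate(events):
    print(number, text)
```

With a breaker like this, the header part of the file (everything up to the first SmartNoSubstitution) becomes its own event without an EventTime, which is worth keeping in mind when looking at what actually got indexed.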

Then I guess the first thing to do is some troubleshooting to confirm whether Splunk is really using the configuration at all.

Check (e.g. using btool) that the indexers / heavy forwarders have the configuration for the index time things (line breaking, timestamping). Have you restarted them after making the changes? Make sure when testing that you are actually looking at freshly ingested events, otherwise you're not going to see the effect of any changes to index time config.

Check the Search Heads have the field extraction config (e.g. confirm it is present from the GUI Settings -> Fields and has appropriate permission and sharing settings to make the config available in the app where you run the search).

0 Karma