Getting Data In

Ingest XML files, fields not being created

ssaenger
Communicator

Hi,

I am trying to ingest XML files and split the elements into fields. My log files are:

<?xml version="1.0" encoding="UTF-8" standalone="no"?><SmartPanel xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" DocumentCreationDate="2019-07-09T10:18:04" DocumentVersion="5" PanID="15" LogCreationDate="2019-07-08T18:45:32" TvID="0" xmlns="urn:nds:dyn:pms:Smart:v1" xsi:schemaLocation="urn:nds:dyn:pms:Smart:v1 /apps/WEB-INF/amsXmlSchema.xsd"><Subscriber SubscriberID="126" DeviceID="2915"><SmartNoSubstitution EventTime="2019-07-08T18:45:53"><availId>175696022</availId><reason>0</reason><ServiceKey>4049</ServiceKey></SmartNoSubstitution><SmartNoSubstitution EventTime="2019-07-08T18:57:05"><availId>175696024</availId><reason>0</reason><ServKey>4049</ServKey></SmartNoSubstitution></Subscriber></SmartPanel>

and

 <?xml version="1.0" encoding="UTF-8" standalone="no"?><SmartPanel xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" DocumentCreationDate="2019-07-09T11:18:04" DocumentVersion="5" PanID="5" LogCreationDate="2019-07-08T19:45:32" TvID="0" xmlns="urn:nds:dyn:pms:Smart:v1" xsi:schemaLocation="urn:nds:dyn:pms:Smart:v1 /apps/WEB-INF/amsXmlSchema.xsd"><Subscriber SubscriberID="178" DeviceID="45615"></Subscriber></SmartPanel>

Based on other questions, my props.conf and transforms.conf are below.
props.conf

[pms]
TIME_PREFIX=EventTime
TIME_FORMAT=%Y-%m-%dT%H:%M:%S 
SHOULD_LINEMERGE=false
TRUNCATE=100000
LINE_BREAKER=\>\s*(?=\)
REPORT-xmlext=xml-extr

and
transforms.conf

[xml-extr]
REGEX=<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT=$1::$2 
MV_ADD=true
REPEAT_MATCH=true

However, only the second file is being ingested, and it only gives fields where there is an =.

I have tried using KV_MODE=xml, but this has not helped.

I have used regex101 to validate the regex:

Match 1
Full match  451-479 <availId>175696022</availId>
Group 1.    452-459 availId
Group 2.    460-469 175696022
Match 2
Full match  479-497 <reason>0</reason>
Group 1.    480-486 reason
Group 2.    487-488 0
Match 3
Full match  497-526 <ServiceKey>4049</ServiceKey>
Group 1.    498-508 ServiceKey
Group 2.    509-513 4049
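The regex101 result above can also be reproduced locally. The following is a minimal sketch in Python, assuming Python's re module treats this pattern the same way as the PCRE engine Splunk uses (the escaped \> and \< are written as plain > and <). The names LEAF_ELEMENT and sample are made up for the illustration, and the sample is a shortened fragment of the first log file. It suggests the regex only captures leaf elements with text content, so attributes such as EventTime would have to come from another extraction.

```python
import re

# Same pattern as REGEX in the [xml-extr] stanza (assumption: Python's re
# behaves like Splunk's PCRE for this particular pattern).
LEAF_ELEMENT = re.compile(r"<([^\s>]*)[^>]*>([^<]*)</\1>")

# Shortened fragment of the first sample file from the question.
sample = (
    '<Subscriber SubscriberID="126" DeviceID="2915">'
    '<SmartNoSubstitution EventTime="2019-07-08T18:45:53">'
    "<availId>175696022</availId><reason>0</reason>"
    "<ServiceKey>4049</ServiceKey></SmartNoSubstitution></Subscriber>"
)

# findall returns one (element name, text value) tuple per match.
pairs = LEAF_ELEMENT.findall(sample)
for name, value in pairs:
    print(f"{name} = {value}")
```

Container elements such as Subscriber and SmartNoSubstitution are skipped because the backreference \1 requires the closing tag to follow the text directly.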

Does anybody have any advice?

0 Karma

woodcock
Esteemed Legend

Your question is very unclear. The settings that you have will work correctly for the first case, and KV_MODE=auto will work for the second case. So what EXACTLY is your problem here? As for LINE_BREAKER, we cannot help you unless you show us multiple events exactly as they appear in the file (with all variations).

0 Karma

FrankVl
Ultra Champion

A few comments:

  • Why not set TIME_PREFIX=EventTime=" instead? It probably also works with just TIME_PREFIX=EventTime, but I would say it is better to be as specific as possible.
  • That LINE_BREAKER seems strange. Is there something missing? It doesn't include the mandatory capture group. Is your intention to break on every <SmartNoSubstitution?
  • If you don't want automatic key=value extraction to kick in, add KV_MODE = none in props.conf.
  • Where have you deployed this config? For the extractions to work, it must be on your search heads as well.
  • There is no point in specifying REPEAT_MATCH=true, since that setting only applies to index-time extractions.
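
To illustrate the points about the capture group and the time prefix, here is a rough simulation in Python. It is not Splunk's implementation. It assumes the documented LINE_BREAKER rule that the previous event ends where the first capture group starts and the next event begins where that group ends. The breaker pattern, the helper names split_events and event_time, and the shortened sample are all hypothetical and untested against a real Splunk instance.

```python
import re
from datetime import datetime

# Hypothetical breaker: group 1 is the text discarded between two events.
LINE_BREAKER = r"(\s*)<SmartNoSubstitution\b"
TIME_PREFIX = 'EventTime="'
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"

def split_events(raw, line_breaker):
    """Split raw text the way the first capture group rule describes."""
    events, start = [], 0
    for match in re.finditer(line_breaker, raw):
        events.append(raw[start:match.start(1)])  # previous event ends here
        start = match.end(1)                      # next event starts here
    events.append(raw[start:])
    return [event for event in events if event]

def event_time(event):
    """Parse the 19-character timestamp that follows TIME_PREFIX, if any."""
    pos = event.find(TIME_PREFIX)
    if pos < 0:
        return None
    begin = pos + len(TIME_PREFIX)
    return datetime.strptime(event[begin:begin + 19], TIME_FORMAT)

# Shortened fragment of the first sample file from the question.
raw = (
    '<Subscriber SubscriberID="126" DeviceID="2915">'
    '<SmartNoSubstitution EventTime="2019-07-08T18:45:53">'
    "<availId>175696022</availId></SmartNoSubstitution>"
    '<SmartNoSubstitution EventTime="2019-07-08T18:57:05">'
    "<availId>175696024</availId></SmartNoSubstitution></Subscriber>"
)

events = split_events(raw, LINE_BREAKER)
times = [event_time(event) for event in events]
```

In this sketch the leading Subscriber header becomes its own event without an EventTime, which is one reason to check how such header-only events (and files like the second sample) should be timestamped.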
0 Karma

ssaenger
Communicator

Hi Frank,

Thanks for your comments. I have tried what you suggested; however, the line breaking and field extraction still do not work.
I still have fields that are based on elements with an =, but anything after SmartNoSubstitution is not being extracted.

0 Karma

FrankVl
Ultra Champion

What LINE_BREAKER are you using now? As I said, what you have doesn't make much sense to me, and I haven't suggested an alternative yet.

Then I guess the first thing to do is some troubleshooting to confirm whether Splunk is really using the configuration at all.

Check (e.g. using btool) that the indexers / heavy forwarders have the configuration for the index-time settings (line breaking, timestamping). Have you restarted them after making the changes? When testing, make sure that you are actually looking at freshly ingested events; otherwise you are not going to see the effect of any changes to index-time config.

Check the Search Heads have the field extraction config (e.g. confirm it is present from the GUI Settings -> Fields and has appropriate permission and sharing settings to make the config available in the app where you run the search).

0 Karma