Splunk Search

Index time field extraction for XML data?

ebaileytu
Communicator

We have a use case where index-time extraction for XML data makes a lot of sense, yet I do not see an easy way to make it happen. I see that common formats like CSV and JSON are well supported, but there is nothing for XML. Any ideas?

I see some creative workarounds but would prefer something more common.

The XML events are very large, so search-time xmlkv is very slow. We have the indexer resources to support index-time extraction.

Thanks!

0 Karma

kamlesh_vaghela
SplunkTrust

Hi

Can you please check the answer at the link below?

https://answers.splunk.com/answers/133533/xml-extraction.html?utm_source=typeahead&utm_medium=newque...

I hope it will be useful to you.

Thanks
Happy Splunking

bmunson_splunk
Splunk Employee

I would question why you want it done at index time. It rarely improves performance (in fact, it more often makes things worse) and it takes more disk space.

But if you are sure you want to try this on your development system, use the answer linked above, but replace the REPORT-xyz in props.conf with TRANSFORMS-xyz and add WRITE_META = true to the transforms.conf stanza.
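
A rough sketch of what that change could look like, not taken from the linked answer: the sourcetype name, transform name, field name, and regex are all hypothetical placeholders to adapt to your data. The fields.conf entry is an extra step, described in Splunk's documentation for indexed fields, so that the search head treats the field as indexed.

```
# props.conf (on the indexers or heavy forwarders that parse the data)
[my_xml_sourcetype]
# TRANSFORMS- (rather than REPORT-) applies the transform at index time
TRANSFORMS-xyz = xml_order_id

# transforms.conf
[xml_order_id]
# Hypothetical regex: captures the text of a single <orderId> element
REGEX = <orderId>([^<]+)</orderId>
FORMAT = order_id::$1
WRITE_META = true

# fields.conf (on the search heads)
[order_id]
INDEXED = true
```

Each indexed field needs its own transform along these lines, so this is probably only practical for a handful of fields per event.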

0 Karma

ebaileytu
Communicator

This is a search-time extraction, so it has the same issue I mentioned above.

0 Karma

somesoni2
Revered Legend

Did you explore the KV_MODE = xml option in props.conf (on search heads) for automatic search-time field extraction? That would eliminate the need for the inline xmlkv search command.

KV_MODE = [none|auto|auto_escaped|multi|json|xml]
* Used for search-time field extractions only.
* Specifies the field/value extraction mode for the data.
* Set KV_MODE to one of the following:
  * none: if you want no field/value extraction to take place.
  * auto: extracts field/value pairs separated by equal signs.
  * auto_escaped: extracts fields/value pairs separated by equal signs and
                  honors \" and \\ as escaped sequences within quoted
                  values, e.g field="value with \"nested\" quotes"
  * multi: invokes the multikv search command to expand a tabular event into
           multiple events.
  * xml : automatically extracts fields from XML data.
  * json: automatically extracts fields from JSON data.
* Setting to 'none' can ensure that one or more user-created regexes are not
  overridden by automatic field/value extraction for a particular host,
  source, or source type, and also increases search performance.
* Defaults to auto.
* The 'xml' and 'json' modes will not extract any fields when used on data
  that isn't of the correct format (JSON or XML).
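
For reference, a minimal stanza using this setting might look like the following; the sourcetype name is a hypothetical placeholder.

```
# props.conf (on the search heads)
[my_xml_sourcetype]
KV_MODE = xml
```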
0 Karma

ebaileytu
Communicator

Sure, but that still produces a ton of overhead because it is search time. The events are between 500k and 1 million bytes each.

0 Karma