Handling breaking and field extraction from Header section plus Repeated sections in XML

rickferrante
Explorer

Hi, 
We need to forward XML documents from a UF to indexers. The documents have key fields both in a one-time header section and in a section that can repeat up to 100,000 times. For example, a file could look like this:

<PUBS>
<HEADER><Identifier>93234</Identifier></HEADER>
<REPEATSECTION><Balance>8751.23</Balance></REPEATSECTION>
<REPEATSECTION><Balance>943.43</Balance></REPEATSECTION>
... note: repeats up to 100,000 times, with many more fields than shown here. Total file size >= 300 MB ...
<REPEATSECTION><Balance>123.233</Balance></REPEATSECTION>
</PUBS>

If the UF breaks events before each <REPEATSECTION>, we could have one Splunk event per REPEATSECTION, but the fields in the HEADER would not be available in those events.

If the UF sends the whole 300 MB file to an indexer, is there a props/transforms configuration on the indexer that can create one Splunk event per REPEATSECTION and also pick up the fields from the HEADER section?

I'm trying to ask a good question here as best I can. Does my question make sense to anyone?

Thanks!

1 Solution

PickleRick
SplunkTrust

The question is understandable. The answer, however, is that you can't do that reliably with Splunk's built-in functionality. Splunk processes one event at a time and doesn't keep any state that could be carried from one event to the next. You can sometimes work some magic by cloning events and cutting different parts out of each copy, but that hack is ugly, non-scalable, and inefficient.
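
Since the state (the HEADER fields) can't be carried across events inside Splunk, one possible workaround is to flatten the file before the UF ever picks it up. The following is only a rough Python sketch of that idea, not a Splunk feature and not something tested at the 300 MB scale. It assumes the file is well-formed XML, that the HEADER element is closed before the first REPEATSECTION, and that the fields are simple child elements as in the sample. The names flatten_repeat_sections, to_json_lines and SAMPLE_XML are made up for illustration.

```python
import io
import json
import xml.etree.ElementTree as ET

# Tiny stand-in for the real file (assumption: HEADER is closed
# before the first REPEATSECTION).
SAMPLE_XML = (
    b"<PUBS>"
    b"<HEADER><Identifier>93234</Identifier></HEADER>"
    b"<REPEATSECTION><Balance>8751.23</Balance></REPEATSECTION>"
    b"<REPEATSECTION><Balance>943.43</Balance></REPEATSECTION>"
    b"</PUBS>"
)


def flatten_repeat_sections(source, header_tag="HEADER", repeat_tag="REPEATSECTION"):
    """Yield one dict per repeated section, merged with the header fields.

    `source` is a filename or a binary file object. The document is parsed
    incrementally, so the whole file is never held as one string.
    """
    header_fields = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == header_tag:
            # Remember the one-time header fields for all later sections.
            header_fields = {child.tag: child.text for child in elem}
            elem.clear()
        elif elem.tag == repeat_tag:
            record = dict(header_fields)
            record.update({child.tag: child.text for child in elem})
            yield record
            # Drop the section's children once they have been emitted.
            elem.clear()


def to_json_lines(source):
    """Return one JSON string per repeated section."""
    return [json.dumps(record) for record in flatten_repeat_sections(source)]
```

Calling to_json_lines(io.BytesIO(SAMPLE_XML)), or passing the path of the real file, gives one JSON object per REPEATSECTION with the Identifier from the HEADER copied into each. Writing those lines to a file the UF monitors would let Splunk treat each line as its own event, at the cost of an extra processing step outside Splunk.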

