Community Blog
Get the latest updates on the Splunk Community, including member experiences, product education, events, and more!

[Puzzles] Solve, Learn, Repeat: Reprocessing XML into Fixed-Length Events

ITWhisperer
SplunkTrust
SplunkTrust

This challenge was first posted on Slack #puzzles channel

For a previous puzzle, I needed a set of fixed-length pipe-delimited events, so I took a public domain data set, which happened to be in XML format and converted it to the required format. The data set I chose came from Montgomery County, Maryland, specifically, the Troubled Properties Analysis. You can retrieve an export from here. There are currently just under 700 rows in the data set. This puzzle has been split into multiple parts. This final challenge is to bring the techniques used in the earlier parts to create a single SPL search to convert the XML events into a fixed-length, pipe-delimited format, whilst maintaining the order of the fields. (You should ignore (filter out) extraneous elements and attributes.) Note that numeric fields need to be right-justified, and non-numerics are left-justified. Also, the width of the fields is just wide enough to hold the widest value for that field in the complete data set. For example:

<row _id="row-57xu.68ky~aurc" ><hlrslicensenumber>17550</hlrslicensenumber><communityname>Quebec Terrace, 1015</communityname><streetaddress>1015 QUEBEC TER</streetaddress><city>SILVER SPRING</city><zipcode>20903</zipcode><casenumber>191048</casenumber><longitude>-76.988449097</longitude><latitude>39.000106812</latitude><firstinspectiondate>2025-08-20T00:00:00</firstinspectiondate><nextinspectiondate>2026-03-20T00:00:00</nextinspectiondate><inspectionfrequency>1</inspectionfrequency><compliantind>1</compliantind><unitcount>4</unitcount><unitsinspected>5</unitsinspected><averageviolationsperunit>4.4</averageviolationsperunit><severityindex>2.22</severityindex><noviolationsobserved>0</noviolationsobserved><infestedunitspercentage>0.2</infestedunitspercentage><units_with_mold>0</units_with_mold><rating>Troubled</rating></row>

Should become something like:

| 17550|Quebec Terrace, 1015     |1015 QUEBEC TER    |SILVER SPRING|20903|191048|-76.988449097|39.000106812|2025-08-20T00:00:00|2026-03-20T00:00:00|1|1| 4|5|4.4|2.22|0|0.2|0|Troubled |

Your solution should work with a copy of the full dataset.

This article contains spoilers!

In fact, most of this article is a spoiler as it contains partial solutions to the puzzle. If you are trying to solve the puzzle yourself and just want some pointers to get you started, stop reading when you have enough, and return if you get stuck again, or just want to compare your solution to mine!

Pulling it together

This puzzle has been split into multiple parts. This final part is about creating a single search which converts the XML format data from the Montgomery County Troubled Properties Analysis data set into fixed-length (and pipe-delimited) events. Previous parts have looked at finding processes to determine a template and preparing the data so that it can be used to convert the XML data into the required fixed-length format. This final challenge is to bring the techniques used in the earlier parts to create a single SPL search to convert the XML events into a fixed-length, pipe-delimited format, whilst maintaining the order of the fields. (You should ignore (filter out) extraneous elements and attributes.) The Montgomery County Troubled Properties Analysis data set is occasionally updated, so the data set you retrieve may be slightly different to the one I used. In fact, I have two versions of the data set with slightly different but similar formats.

Field templates

Starting with the process found in part one, by changing the data used to be that from the Montgomery County Troubled Properties Analysis data set, we may find that we do not get the answer we were expecting. Working our way down the search, we soon find that the chart command needs to be modified to not use OTHER and to increase the limit of fields used. Making the following change seems to solve the problem, although the number of repetitions required for this data set may be lower than that required for the planets data set:

``` Chart the fields used in the fieldname sequences ```
| eval minimum=0
| chart max(minimum) by fieldnames fields useother=f limit=0

Switching to the process found in part two, by changing the data used to be that from the Montgomery County Troubled Properties Analysis data set, we may find that we do get the answer we were expecting, even without changing the chart command. In fact, depending on the data set you use, you may find that none of the field sequence alignment process is necessary, you may simply need to find the longest sequence and check that it contains all of the fields possible from the data set. This depends on the actual data set you are using so needs to be checked not assumed.

Reprocessing events

Both the field template discovery processes replace the data set with the template, so, how can the template be merged with the fixed-length data? For example, your fixed-length data search might look something close to this:

index="montgomery" sourcetype="troubled_properties"
| spath
| fields - row.*{@*} row.:-*
| fields row.*
| foreach row.*
    [| eval len_<<FIELD>>=len(<<FIELD>>)
    | eval padding_<<FIELD>>=if(isnum(<<FIELD>>),"","-")]
| eventstats max(len_*) as len_* min(padding_*) as padding_*
| foreach len_*
    [| eval <<FIELD>>="%".padding_<<MATCHSEG1>>.<<FIELD>>."s"
    | eval <<MATCHSEG1>>=printf(<<FIELD>>,<<MATCHSEG1>>)]

Having got the data into the required format, we now need to merge it with the template, without losing the data. Probably, the simplest way to do this is to use the appendpipe command.

| appendpipe
    [
    ``` Find all relevant field names ```
    | rex max_match=0 "\<(?<_fields>\w+)\>"
    ``` Create a tilde-delimited template based on all the events ```
    ``` I will leave you to determine what works for your data set ```
    ``` Create a pipe-delimited template field with all fieldnames in correct order ```
    | eval row="|".mvjoin(split(_sequence,"~"),"|")."|"
    ]

You should now have an additional event with the template. This template should be copied to all the events in the actual data:

``` Copy template to all data rows ```
| eventstats max(row) as row

Remove the appended event::

``` Remove appended row as no longer required ```
| where isnull(_sequence)

Now, populate the template with the field values for each event in the data set:

``` For each data field, replace fieldname in template with corresponding formatted data value ```
| foreach row.*
    [| eval row=if(match(row,"\b<<MATCHSEG1>>\b"),replace(row,"\b<<MATCHSEG1>>\b",'<<FIELD>>'),row)]

Field row now has the converted data in fixed-length and justified format:

| table row

Summary

In summary, there were a number of techniques demonstrated in this solution, reverse dereferencing, nested loops, dynamic field formatting, and reprocessing events. For me, the key to solving this puzzle, as with a lot of puzzles, was to break it down into manageable steps, before bringing the parts together for a complete solution.

Have questions or thoughts? Comment on this article or in Slack #puzzles channel. Whichever you prefer.

Contributors
Get Updates on the Splunk Community!

Painting a Clearer Picture: Creating Cross-Domain Visibility with AI Canvas

    Thursday, June 25, 2026  |  11AM PDT / 2PM EDT  Duration: 1 Hour (Includes live Q&A) Register to ...

Analytics Workspace deprecation

As of Splunk Cloud Platform 10.4.2604 and Splunk Enterprise 10.4, Analytics Workspace is now deprecated. ...

Splunk Developer Day Recap: Building, Publishing, and Growing on the Splunk Platform

Splunk Developer Day brought the Splunk developer community together for a practical look at what it means to ...