Community Blog
Get the latest updates on the Splunk Community, including member experiences, product education, events, and more!

[Puzzles] Solve, Learn, Repeat: Dereferencing XML to Fixed-length events

ITWhisperer
SplunkTrust
SplunkTrust

This challenge was first posted on Slack #puzzles channel

For a previous puzzle, I needed a set of fixed-length pipe-delimited events, so I took a public domain data set, which happened to be in XML format and converted it to the required format. The overall aim of this puzzle is to convert XML event to fixed-length events, and it has been split into multiple parts. This first part is about preparing the field template so that it can be used to place the data in the correct order in the fixed-length (and pipe-delimited) events. To that end, the challenge for this part is to take some XML events and determine the correct order that the fields appear in. The approach requires determining, for each event, where each field exists in the sequence and comparing it with the position of every other field by dereferencing the field names to find their positions. An example would be the following set of events:

<row num="1600"><Mercury>0</Mercury><Venus>0</Venus><Earth>1</Earth></row>
<row num="1625"><Jupiter>97</Jupiter></row>
<row num="1675"><Saturn>274</Saturn></row>
<row num="1800"><Saturn>274</Saturn><Uranus>29</Uranus></row>
<row num="1850"><Saturn>274</Saturn><Neptune>16</Neptune></row>
<row num="1875"><Uranus>29</Uranus></row>
<row num="1900"><Earth>1</Earth><Mars>2</Mars><Jupiter>97</Jupiter><Saturn>274</Saturn></row>
<row num="1950"><Jupiter>97</Jupiter><Uranus>29</Uranus><Neptune>16</Neptune></row>
<row num="1975"><Jupiter>97</Jupiter><Saturn>274</Saturn></row>
<row num="2000"><Jupiter>97</Jupiter><Saturn>274</Saturn><Uranus>29</Uranus><Neptune>16</Neptune></row>

For example, Mercury has a lower index in the sequence, i.e. is to the left of Venus, which has a lower index in the sequence, i.e. is to the left of Earth, and, in a different sequence, Jupiter has a lower index, i.e. is to the left of Saturn.

Create a tilde-delimited template for all the fields in this set of XML events.

For a partial example

Mercury~Venus~Earth

An optional bonus question is to explain what the data represents.

This article contains spoilers!

In fact, most of this article is a spoiler as it contains partial solutions to the puzzle. If you are trying to solve the puzzle yourself and just want some pointers to get you started, stop reading when you have enough, and return if you get stuck again, or just want to compare your solution to mine!

Preparing the field template

This puzzle has been split into multiple parts. This first part is about preparing the field template  so that it can be used to place the data in the correct order in the fixed-length (and pipe-delimited) events. To that end, the challenge for this part is to take some XML events and determine the correct order that the fields appear in. A simple example would be the following set of events:

<row num="1"><john>L</john><paul>M</paul><ringo>S</ringo></row>
<row num="2"><john>L</john><george>H</george><ringo>S</ringo></row>
<row num="3"><john>L</john><paul>M</paul><george>H</george></row>

Here, john has a lower index in the sequence, i.e. is to the left of paul, and paul has a lower index in the sequence, i.e. is to the left of ringo.

A more complex example would be the following set of events:

<row num="1600"><Mercury>0</Mercury><Venus>0</Venus><Earth>1</Earth></row>
<row num="1625"><Jupiter>97</Jupiter></row>
<row num="1675"><Saturn>274</Saturn></row>
<row num="1800"><Saturn>274</Saturn><Uranus>29</Uranus></row>
<row num="1850"><Saturn>274</Saturn><Neptune>16</Neptune></row>
<row num="1875"><Uranus>29</Uranus></row>
<row num="1900"><Earth>1</Earth><Mars>2</Mars><Jupiter>97</Jupiter><Saturn>274</Saturn></row>
<row num="1950"><Jupiter>97</Jupiter><Uranus>29</Uranus><Neptune>16</Neptune></row>
<row num="1975"><Jupiter>97</Jupiter><Saturn>274</Saturn></row>
<row num="2000"><Jupiter>97</Jupiter><Saturn>274</Saturn><Uranus>29</Uranus><Neptune>16</Neptune></row>

For example, Mercury has a lower index in the sequence, i.e. is to the left of Venus which has a lower index in the sequence, i.e. is to the left of Earth, and Jupiter has a lower index in the sequence, i.e. is to the left of Saturn. Create a tilde-delimited template for the fields in these sets of XML events.

Field names

By inspecting the data set, we can see that all the first level fields that we are interested in have simple word character names. This means that they can be extracted using a regular expression.

| rex max_match=0 "\<(?<fields>\w+)\>"

This creates a multi-value field (fields) with the fieldnames in the order they are found in the event. By joining these values (using a neutral delimiter, I chose a tilde (~) so it does not cause complications with regular expressions further down the line), we can create an initial template for each event.

| eval fieldnames=mvjoin(fields,"~")

Looking at these fieldname templates, you can see that they are not all the same. This is because not all fields are present in all the events.

Fortunately, fieldnames are not repeated in any single template, and there are enough examples of different templates to be able to determine the full and correct order. This is not always guaranteed as duplicated fieldnames and/or a smaller data set could present problems. For example, consider the following two events:

<row num="1"><john>L</john><paul>M</paul><ringo>S</ringo></row>
<row num="2"><john>L</john><george>H</george><ringo>S</ringo></row>

These would have the following templates:

john~paul~ringo
john~george~ringo

Trying to combine these sequence, which of these would represent the full template?

john~paul~george~ringo
john~george~paul~ringo

With these two events, there is insufficient information to determine this. But add a further event example:

<row num="3"><john>L</john><paul>M</paul><george>H</george></row>

With the following template:

john~paul~george

The full template can now be determined, even when none of the individual events has this complete template:

john~paul~george~ringo

Planets in order

Turning to Splunk now, by taking the planet data as an example, we can create a template for the fields present in each event:

| makeresults format=csv data="row
<row num=\"1600\"><Mercury>0</Mercury><Venus>0</Venus><Earth>1</Earth></row>
<row num=\"1625\"><Jupiter>97</Jupiter></row>
<row num=\"1675\"><Saturn>274</Saturn></row>
<row num=\"1800\"><Saturn>274</Saturn><Uranus>29</Uranus></row>
<row num=\"1850\"><Saturn>274</Saturn><Neptune>16</Neptune></row>
<row num=\"1875\"><Uranus>29</Uranus></row>
<row num=\"1900\"><Earth>1</Earth><Mars>2</Mars><Jupiter>97</Jupiter><Saturn>274</Saturn></row>
<row num=\"1950\"><Jupiter>97</Jupiter><Uranus>29</Uranus><Neptune>16</Neptune></row>
<row num=\"1975\"><Jupiter>97</Jupiter><Saturn>274</Saturn></row>
<row num=\"2000\"><Jupiter>97</Jupiter><Saturn>274</Saturn><Uranus>29</Uranus><Neptune>16</Neptune></row>"
``` Find all relevant field names ```
| rex field=row max_match=0 "\<(?<fields>\w+)\>"
``` Create a sequence of fields used in the event ```
| eval fieldnames=mvjoin(fields,"~")

This contains the following information:

fieldnames

fields

Mercury~Venus~Earth

Mercury
Venus
Earth

Jupiter

Jupiter

Saturn

Saturn

Saturn~Uranus

Saturn
Uranus

Saturn~Neptune

Saturn
Neptune

Uranus

Uranus

Earth~Mars~Jupiter~Saturn

Earth
Mars
Jupiter
Saturn

Jupiter~Uranus~Neptune

Jupiter
Uranus
Neptune

Jupiter~Saturn

Jupiter
Saturn

Jupiter~Saturn~Uranus~Neptune

Jupiter
Saturn
Uranus
Neptune

To get the names of the planets as fields, place this information in a chart with an initial value of zero:

``` Chart the fields used in the fieldname sequences ```
| eval minimum=0
| chart max(minimum) by fieldnames fields

fieldnames

Earth

Jupiter

Mars

Mercury

Neptune

Saturn

Uranus

Venus

Earth~Mars~Jupiter~Saturn

0

0

0

  

0

  

Jupiter

 

0

      

Jupiter~Saturn

 

0

   

0

  

Jupiter~Saturn~Uranus~Neptune

 

0

  

0

0

0

 

Jupiter~Uranus~Neptune

 

0

  

0

 

0

 

Mercury~Venus~Earth

0

  

0

   

0

Saturn

     

0

  

Saturn~Neptune

    

0

0

  

Saturn~Uranus

     

0

0

 

Uranus

      

0

 

Find the index of each named field in each field sequence:

``` Find the index of each field named in each sequence ```
| eval fields=split(fieldnames,"~")
| rename fieldnames as _fieldnames, fields as _fields
| foreach *
    [| eval <<FIELD>>=mvfind(_fields,"<<FIELD>>")]

Note the rename of fieldnames and fields, so that they do not get picked up by the * on the foreach command.

Earth

Jupiter

Mars

Mercury

Neptune

Saturn

Uranus

Venus

0

2

1

  

3

  
 

0

      
 

0

   

1

  
 

0

  

3

1

2

 
 

0

  

2

 

1

 

2

  

0

   

1

     

0

  
    

1

0

  
     

0

  
      

0

 

Notice that some planets have multiple entries and that not all the entries for a planet are the same. This is because the planet appears in more than one sequence and that there may be different numbers of planets appearing before it in the sequence. So, which is right? In fact, nearly all of them are wrong. How do we find the correct order?

Finding the right order

To find the right order, we can start with the maximum value for each planet, i.e. the one representing where it is furthest to the right in any sequence.

``` Find the maximum index of each field named in each sequence i.e. the furthest to the right that the field appears in any sequence ```
| eventstats max(*) as *

Earth

Jupiter

Mars

Mercury

Neptune

Saturn

Uranus

Venus

2

2

1

0

3

3

2

1

Rows removed from tables to save space.

So far, we have been dealing with the sequences as a whole; now we need to deal with each field in each sequence separately. What we would really like to do is find the value of the index for the first field named in the sequence, and add 1 to it for each of the remaining fields listed in the sequence.

Reverse dereferencing

Unfortunately, SPL does not have a direct way to dereference the field name to find the corresponding field value. So, we will have to try a different approach. Start by expanding out the field names used in each sequence so we can process them separately:

``` Split out the fieldnames used in each sequence ```
| rename _fieldnames as fieldnames, _fields as fields
| mvexpand fields

fieldnames

Earth

Jupiter

Mars

Mercury

Neptune

Saturn

Uranus

Venus

fields

Earth~Mars~Jupiter~Saturn

2

2

1

0

3

3

2

1

Earth

Earth~Mars~Jupiter~Saturn

2

2

1

0

3

3

2

1

Mars

Earth~Mars~Jupiter~Saturn

2

2

1

0

3

3

2

1

Jupiter

Earth~Mars~Jupiter~Saturn

2

2

1

0

3

3

2

1

Saturn

Find the position of each field in its sequence (as an index):

``` Find the current position of the field in its sequence ```
| streamstats count as _position by fieldnames
``` Convert to index ```
| eval _position=_position-1

Earth

Jupiter

Mars

Mercury

Neptune

Saturn

Uranus

Venus

fields

position

2

2

1

0

3

3

2

1

Earth

0

2

2

1

0

3

3

2

1

Mars

1

2

2

1

0

3

3

2

1

Jupiter

2

2

2

1

0

3

3

2

1

Saturn

3

For each sequence, find the current rightmost position of the starting field (effectively, dereferencing the first named planet field):

``` For each sequence, find the current rightmost position of the start ```
| rename fieldnames as _fieldnames, fields as _fields
| fields - _start
| foreach *
    [| eval _start=if("<<FIELD>>"=_fields AND _position=0,<<FIELD>>,_start)]
| eventstats min(_start) as _start by _fieldnames

Earth

Jupiter

Mars

Mercury

Neptune

Saturn

Uranus

Venus

fields

position

start

2

2

1

0

3

3

2

1

Earth

0

2

2

2

1

0

3

3

2

1

Mars

1

2

2

2

1

0

3

3

2

1

Jupiter

2

2

2

2

1

0

3

3

2

1

Saturn

3

2

Calculate the new position based on the relative position in the sequence and the rightmost position of the first field

``` Calculate the new position based on the relative position in the sequence and the rightmost position of the first field ```
| eval _new_position=_start+_position

Earth

Jupiter

Mars

Mercury

Neptune

Saturn

Uranus

Venus

fields

new position

position

start

2

2

1

0

3

3

2

1

Earth

2

0

2

2

2

1

0

3

3

2

1

Mars

3

1

2

2

2

1

0

3

3

2

1

Jupiter

4

2

2

2

2

1

0

3

3

2

1

Saturn

5

3

2

When the new position is greater than the field's current rightmost position, update the field's position:

``` When the new position is greater than the field's current rightmost position, update the field's position ```
| foreach *
    [| eval <<FIELD>>=if("<<FIELD>>"=_fields AND _new_position><<FIELD>>,_new_position,<<FIELD>>)]

Earth

Jupiter

Mars

Mercury

Neptune

Saturn

Uranus

Venus

fields

new position

position

start

2

2

1

0

3

3

2

1

Earth

2

0

2

2

2

3

0

3

3

2

1

Mars

3

1

2

2

4

1

0

3

3

2

1

Jupiter

4

2

2

2

2

1

0

3

5

2

1

Saturn

5

3

2

Find the new maximum index of each field named in each sequence

``` Find the new maximum index of each field named in each sequence ```
| eventstats max(*) as *

Earth

Jupiter

Mars

Mercury

Neptune

Saturn

Uranus

Venus

fields

new position

position

start

2

4

3

0

5

5

4

1

Earth

2

0

2

2

4

3

0

5

5

4

1

Mars

3

1

2

2

4

3

0

5

5

4

1

Jupiter

4

2

2

2

4

3

0

5

5

4

1

Saturn

5

3

2

Are we done yet?

There are a couple of ways to tell whether we have found the correct order yet (which can be done visually, or as a stretch exercise). Firstly, more than one planet has the same position as another, and secondly, the maximum index position is less than 7. Since we do not appear to have finished, I will leave you to work out which steps need to be repeated, and by how many times.

Generating the final template?

Once all the necessary repetitions have been done, we need to convert the information into a template. This can be done by finding the final position for each field and sorting them.

``` Find the final position for each field and sort them ```
| rename _fieldnames as fieldnames
| untable fieldnames field position
| stats values(position) as position by field
| sort 0 position

Create a template from the sorted list of fields

``` Create a template from the sorted list of fields ```
| stats list(field) as fields
| eval fields=mvjoin(fields,"~")

You should now have a process for generating the template required for the planets data set.

Have questions or thoughts? Comment on this article or in Slack #puzzles channel. Whichever you prefer.

Contributors
Get Updates on the Splunk Community!

Accelerating Observability as Code with the Splunk AI Assistant

We’ve seen in previous posts what Observability as Code (OaC) is and how it’s now essential for managing ...

Integrating Splunk Search API and Quarto to Create Reproducible Investigation ...

 Splunk is More Than Just the Web Console For Digital Forensics and Incident Response (DFIR) practitioners, ...

Congratulations to the 2025-2026 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...