This challenge was first posted on the Slack #puzzles channel.

For a previous puzzle, I needed a set of fixed-length pipe-delimited events, so I took a public domain data set, which happened to be in XML format, and converted it to the required format. The overall aim of this puzzle is to convert XML events to fixed-length events, and it has been split into multiple parts.

The first part was about preparing the field template by dereferencing the field names, so that their positions could be compared. This second part is about an alternative approach to the field template process. To that end, the challenge for this part is to take some XML events and determine the correct order in which the fields appear, by using nested loops to process each sequence segment against all the other sequences, and merging or joining the sequence segments until the whole sequence is determined.

Using the same example set of events from the previous part:

<row num="1600"><Mercury>0</Mercury><Venus>0</Venus><Earth>1</Earth></row>
<row num="1625"><Jupiter>97</Jupiter></row>
<row num="1675"><Saturn>274</Saturn></row>
<row num="1800"><Saturn>274</Saturn><Uranus>29</Uranus></row>
<row num="1850"><Saturn>274</Saturn><Neptune>16</Neptune></row>
<row num="1875"><Uranus>29</Uranus></row>
<row num="1900"><Earth>1</Earth><Mars>2</Mars><Jupiter>97</Jupiter><Saturn>274</Saturn></row>
<row num="1950"><Jupiter>97</Jupiter><Uranus>29</Uranus><Neptune>16</Neptune></row>
<row num="1975"><Jupiter>97</Jupiter><Saturn>274</Saturn></row>
<row num="2000"><Jupiter>97</Jupiter><Saturn>274</Saturn><Uranus>29</Uranus><Neptune>16</Neptune></row>

Develop a process to join sequences where one ends and another starts with the same planet, and to expand sequences where another sequence has one or more planets between a pair of planets which are consecutive in the sequence. Create a tilde-delimited template for the fields in this set of XML events.

This article contains spoilers! In fact, most of this article is a spoiler as it contains partial solutions to the puzzle. If you are trying to solve the puzzle yourself and just want some pointers to get you started, stop reading when you have enough, and return if you get stuck again, or just want to compare your solution to mine!

Field sequences

Looking at the example events above, we can see that row 1600 ends with Earth, and row 1900 starts with Earth. These can be combined to give Mercury -> Venus -> Earth -> Mars -> Jupiter -> Saturn. Furthermore, row 1850 shows Saturn -> Neptune; this can be expanded to Saturn -> Uranus -> Neptune, because row 2000 shows Jupiter -> Saturn -> Uranus -> Neptune.

To that end, the challenge for this part is to find all the possible sequences of the field names without compromising the overall integrity of the series of sequences, and to combine and expand overlaps.

Planets in order

As we did in the previous part, using the planet data, we can create a template for the fields present in each event:

| makeresults format=csv data="row
<row num=\"1600\"><Mercury>0</Mercury><Venus>0</Venus><Earth>1</Earth></row>
<row num=\"1625\"><Jupiter>97</Jupiter></row>
<row num=\"1675\"><Saturn>274</Saturn></row>
<row num=\"1800\"><Saturn>274</Saturn><Uranus>29</Uranus></row>
<row num=\"1850\"><Saturn>274</Saturn><Neptune>16</Neptune></row>
<row num=\"1875\"><Uranus>29</Uranus></row>
<row num=\"1900\"><Earth>1</Earth><Mars>2</Mars><Jupiter>97</Jupiter><Saturn>274</Saturn></row>
<row num=\"1950\"><Jupiter>97</Jupiter><Uranus>29</Uranus><Neptune>16</Neptune></row>
<row num=\"1975\"><Jupiter>97</Jupiter><Saturn>274</Saturn></row>
<row num=\"2000\"><Jupiter>97</Jupiter><Saturn>274</Saturn><Uranus>29</Uranus><Neptune>16</Neptune></row>"
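``` Create a sequence of fields used in the event ```
``` Assumption: the step that extracts the multivalue "fields" field is not shown in this article; a rex along these lines would produce it ```
| rex field=row max_match=0 "<(?<fields>\w+)>"
| eval fieldnames=mvjoin(fields,"~")

Field combinations

Ideally, we would like to process each field name (planet) combined with every other field name (planet) to check if this combination is found in the data. If this was written in pseudo-code, it might look something like this:

For x in planets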
    For y in planets
        Does x followed by y exist in the data

The closest thing to this (without writing a custom search command) is to use the foreach command. This command requires a field list, so let us generate a set of fields for the planets. Perhaps the simplest way to do this is with the chart command:

``` Create a set of fields for the planets ```
| chart values(eval(0)) by fieldnames fields

This gives us a zero every time the planet represented by the field is used in the field name sequence but, more importantly, it gives us a field named for each planet. Now we can try to process each planet against every other planet:

``` Create a cross-product of the planets ```
| foreach *
[| eval planet_one="<<FIELD>>"
| foreach *
[| eval planet_two="<<FIELD>>"
| eval cross_product=mvappend(cross_product,planet_one."~".planet_two)]
]

As you can see, this does not work as we would have liked. There are two problems: firstly, the fieldnames field has been processed along with the planet fields; secondly, and more frustratingly, the nested foreach does not override the value of the <<FIELD>> variable (you might call this scope-bleed, or globalisation of the <<FIELD>> variable). The following SPL is a simple test to demonstrate what is going on:

``` Create a cross-product of the planets ```
| foreach *
[| eval planet_one="<<FIELD>>"
| foreach 1 2 3
[| eval planet_two="<<FIELD>>"
| eval cross_product=mvappend(cross_product,planet_one."~".planet_two)]
]

There are three repetitions for each of the fields, with planet_one and planet_two being identical in every case (both use the value of the <<FIELD>> variable from the outer foreach).

Nested loops

Since we want to process all the other planets, perhaps it would be better to replace the zero with a complete list of the planets:

``` Create a set of fields for the planets ```
| eventstats values(fields) as field
| chart values(field) by fieldnames fields

This gives us a list of all the planets every time the planet represented by the field is used in the field name sequence. Now we can use foreach in multivalue mode to process each field against all the other fields.

``` Rename fieldnames so it is not picked up by foreach ```
| rename fieldnames as _fieldnames
``` For each field ... ```
| foreach *
[
``` (Nested) for each value in the multivalue field ... ```
| foreach mode=multivalue <<FIELD>>
[

Now we can check for sequences of field names and add them to a list of sequences if the field names follow each other, either directly or indirectly.

``` Build a list of field name sequences present in the current sequence with the following conditions:
i) outer fieldname does not match inner fieldname, and
a) outer fieldname is directly followed by inner fieldname (add direct pairing to the list), or
b) outer fieldname is indirectly followed by inner fieldname (add all intervening fieldnames) ```
| eval sequence=if("<<FIELD>>"=<<ITEM>>, sequence, if(match(_fieldnames,"<<FIELD>>"."~".<<ITEM>>), mvappend(sequence,"<<FIELD>>"."~".<<ITEM>>), if(match(_fieldnames,"<<FIELD>>"."~[\w~]+~".<<ITEM>>), mvappend(sequence,mvjoin(mvindex(split(_fieldnames,"~"),mvfind(split(_fieldnames,"~"),"<<FIELD>>"),mvfind(split(_fieldnames,"~"),<<ITEM>>)),"~")), sequence)))
]
]

Note that, as we discovered earlier, the variable <<FIELD>> in the inner foreach refers to the field from the outer foreach. Also note that the <<ITEM>> variable is already a string value, whereas <<FIELD>> is replaced by the field name, which eval would treat as a reference to the field unless it is double-quoted to use it as a string. For clarity, this could be rewritten as:

``` For each field ... ```
| foreach *
[
| eval field_one="<<FIELD>>"
``` (Nested) for each value in the multivalue field ... ```
| foreach mode=multivalue <<FIELD>>
[
``` Build a list of field name sequences present in the current sequence with the following conditions:
i) outer fieldname does not match inner fieldname, and
a) outer fieldname is directly followed by inner fieldname (add direct pairing to the list), or
b) outer fieldname is indirectly followed by inner fieldname (add all intervening fieldnames) ```
| eval field_two=<<ITEM>>, sequence=if(field_one=field_two,sequence,if(match(_fieldnames,field_one."~".field_two),mvappend(sequence,field_one."~".field_two),if(match(_fieldnames,field_one."~[\w~]+~".field_two),mvappend(sequence,mvjoin(mvindex(split(_fieldnames,"~"),mvfind(split(_fieldnames,"~"),field_one),mvfind(split(_fieldnames,"~"),field_two)),"~")),sequence)))
]
]

Note that foreach mode=multivalue only allows a single command to be used; however, multiple fields can be evaluated in a single eval command by separating them with commas.

Overlaps and expansions

Now that we have a list of field sequences, we need to look for opportunities to expand sequences where they end where another one starts, or start where another one ends, or where they contain a consecutive pair which matches the start and end of another sequence.

``` Create a list of unique sequences ```
| stats count by sequence
``` Make complete list available to each sequence ```
| eventstats values(sequence) as sequences

Move the current sequence out of the way so we can create a new set of sequences:

``` Move existing sequence out of the way so we can create a new list ```
| rename sequence as _sequence

For each of the sequences, create a new list of sequences containing the current sequence, together with expanded sequences where the start of the other sequence matches the end of the current sequence, or where the end of the other sequence matches the start of the current sequence, or where the start and end of the other sequence match a consecutive pair in the current sequence.

``` For each sequence (event), compare all the other sequences ```
| foreach mode=multivalue sequences
[
| eval sequence=if(_sequence=<<ITEM>>,mvappend(sequence,_sequence),if(mvindex(split(<<ITEM>>,"~"),0)=mvindex(split(_sequence,"~"),-1),mvappend(sequence,_sequence."~".mvjoin(mvindex(split(<<ITEM>>,"~"),1,-1),"~")),if(mvindex(split(<<ITEM>>,"~"),-1)=mvindex(split(_sequence,"~"),0),mvappend(sequence,mvjoin(mvindex(split(<<ITEM>>,"~"),0,-2),"~")."~"._sequence),if(match(_sequence,mvindex(split(<<ITEM>>,"~"),0)."~".mvindex(split(<<ITEM>>,"~"),-1)),mvappend(sequence,replace(_sequence,mvindex(split(<<ITEM>>,"~"),0)."~".mvindex(split(<<ITEM>>,"~"),-1),<<ITEM>>)),sequence))))
]

Since we do not appear to have found the full sequence yet, I will leave you to work out which steps need to be repeated, and how many times, until the complete sequence is discovered.

Have questions or thoughts? Comment on this article or in the Slack #puzzles channel, whichever you prefer.
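For readers who want to check their final template outside Splunk, the join and expand rules described in this article can be sketched in Python. This is only an illustration under stated assumptions, not a translation of the SPL: the names segments, merge_once and build_template are hypothetical, the field names are assumed to have been extracted from each event already, and the greedy strategy of growing one sequence until nothing changes is a simplification chosen for the sketch.

```python
from typing import List

# Field names per event, in the order they appear in the example rows.
EVENTS: List[List[str]] = [
    ["Mercury", "Venus", "Earth"],
    ["Jupiter"],
    ["Saturn"],
    ["Saturn", "Uranus"],
    ["Saturn", "Neptune"],
    ["Uranus"],
    ["Earth", "Mars", "Jupiter", "Saturn"],
    ["Jupiter", "Uranus", "Neptune"],
    ["Jupiter", "Saturn"],
    ["Jupiter", "Saturn", "Uranus", "Neptune"],
]


def segments(event: List[str]) -> List[List[str]]:
    """Return every contiguous run of two or more field names in one event."""
    return [event[i:j]
            for i in range(len(event))
            for j in range(i + 2, len(event) + 1)]


def merge_once(current: List[str], other: List[str]) -> List[str]:
    """Apply the first join or expansion rule that fits, else return current."""
    if other[0] == current[-1]:
        # Join: other starts where current ends.
        return current + other[1:]
    if other[-1] == current[0]:
        # Join: other ends where current starts.
        return other[:-1] + current
    if len(other) > 2:
        for i in range(len(current) - 1):
            # Expand: other's first and last names are a consecutive pair.
            if current[i] == other[0] and current[i + 1] == other[-1]:
                return current[:i] + other + current[i + 2:]
    return current


def build_template(events: List[List[str]]) -> str:
    """Grow the longest segment until no rule changes it (greedy assumption)."""
    pool = [seg for event in events for seg in segments(event)]
    if not pool:
        return ""
    current = max(pool, key=len)
    changed = True
    while changed:
        changed = False
        for other in pool:
            merged = merge_once(current, other)
            # Reject merges that repeat a name, so conflicting data cannot loop.
            if merged != current and len(set(merged)) == len(merged):
                current, changed = merged, True
    return "~".join(current)


print(build_template(EVENTS))
# prints Mercury~Venus~Earth~Mars~Jupiter~Saturn~Uranus~Neptune
```

The outer while loop is the part that corresponds to repeating the comparison step: one pass over the segments is not guaranteed to be enough, so the sketch keeps passing until a full pass makes no change. A field that never shares an event with any other field cannot be placed by these rules, so it would be missing from the result.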