Community Blog
Get the latest updates on the Splunk Community, including member experiences, product education, events, and more!

[Puzzles] Solve, Learn, Repeat: Word Search

ITWhisperer
SplunkTrust
SplunkTrust

This challenge was first posted on Slack #puzzles channel

This puzzle is based on a letter grid containing tangled words.

ITWhisperer_0-1769088277767.png
The diagram was created in draw.io which can use XML documents to export and import diagrams.

The challenge is to process the XML document using SPL to find the tangled words.

This article contains spoilers!

In fact, the whole article is a spoiler as it contains solutions to the puzzle. If you are trying to solve the puzzle yourself and just want some pointers to get you started, stop reading when you have enough, and return if you get stuck again, or just want to compare your solution to mine!

Diagram source

There are two main types of nodes in the XML documents. Some example nodes look like this:

        <mxCell id="f1ab-4f63" edge="1" parent="1" source="f9ed-7857" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;fillColor=#88B8B8;strokeColor=#88B8B8;startArrow=none;startFill=0;endArrow=classic;endFill=1;" target="813f-6d2d" value="">
          <mxGeometry relative="1" as="geometry" />
        </mxCell>
        <mxCell id="f69a-54d7" edge="1" parent="1" source="813f-6d2d" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;fillColor=#88B8B8;strokeColor=#88B8B8;startArrow=none;startFill=0;endArrow=classic;endFill=1;" target="bdcf-6cdb" value="">
          <mxGeometry relative="1" as="geometry" />
        </mxCell>
        <mxCell id="04b0-70ca" parent="1" style="whiteSpace=wrap;html=1;aspect=fixed;fontSize=30;" value="P" vertex="1">
          <mxGeometry height="80" width="80" x="640" y="440" as="geometry" />
        </mxCell>
        <mxCell id="0cd5-73fa" parent="1" style="whiteSpace=wrap;html=1;aspect=fixed;fontSize=30;" value="S" vertex="1">
          <mxGeometry height="80" width="80" x="720" y="440" as="geometry" />
        </mxCell>
        <mxCell id="2151-6ffe" parent="1" style="whiteSpace=wrap;html=1;aspect=fixed;fontSize=30;" value="O" vertex="1">
          <mxGeometry height="80" width="80" x="720" y="360" as="geometry" />
        </mxCell>

The full document can be found here.

Parsing XML

Taking the full document as a single event, the first thing to do is to split it into separate mxCell events:

``` Use makeresults to create a copy of the whole XML document in _raw ```
``` Convert single line mxCell nodes to have closing tags ```
| rex mode=sed max_match=0 "s/(?P<mxCell>\<mxCell )(?P<attr>[^\/\>]+)\/\>/\1\2><\/mxCell>/g"
``` Extract all the mxCell nodes into separate events ```
| rex max_match=0 "(?ms)(?<mxCell>\<mxCell .+?\<\/mxCell\>)"
| fields - _raw
| mvexpand mxCell

Extract the cell id attribute and drop the header nodes (ids 0 and 1)

| spath input=mxCell output=id path=mxCell{@id}
| where len(id) > 1

Extract the source, target and value attributes and drop the mxCell field

| spath input=mxCell output=source path=mxCell{@source}
| spath input=mxCell output=target path=mxCell{@target}
| spath input=mxCell output=value path=mxCell{@value}
| fields - mxCell

This gives us a set of events where some events have source and target ids and some events have values associated with ids.

Non-lookup-self-lookup

Using a technique I like to call "non-lookup-self-lookup", we need to find the values associated with the source and target ids of each "connection" event. (We could have saved the second part of the data to a lookup file and used the lookup command to retrieve the values, but this potentially requires additional permissions and capabilities.)

The heart of this technique is to triplicate the events so that the three ids can be stored in the same field for the three different events.

| eval row=mvrange(0,3)
| mvexpand row
| eval name=case(row=0, id, row=1, source, row=2, target)

Now we can use eventstats to gather the list of values associated with the source and target names

| eventstats list(eval(if(value != "",value, null()))) as letters by name

Index-based processing loop

We can use eventstats again to gather the list of letters associated with each connection, drop the redundant rows, and create a 2-letter word fragment

| eventstats list(letters) as letters by id
| where row=0
| eval letters=if(mvcount(letters)>1,mvjoin(letters,""),null())

We use stats to create lists of word fragments and their associated source and target. We use list() rather than values() to preserve order and keep any duplicates.

| stats list(letters) as word list(source) as source list(target) as target

Next, we want to process each word fragment in turn and determine if it can be bound to another word fragment. We do this by first creating a multivalue field with indexes for all the word fragments.

| eval binds=mvrange(0,mvcount(source))

Then, for each word fragment, we want to find another word fragment which has a target which is equal to the current source. This indicates that the two word fragments can be bound together. The next step is a little involved, so I will go through it step-by-step.

| foreach mode=multivalue binds
    [
    | eval joining=mvfind(target,mvindex(source,<<ITEM>>)),

Note the comma at the end of the line. This allows us to extend the eval command with more assignments without breaking the restriction of only one command in the foreach subsearch when in multivalue mode.

Create a joined word (fragment) from the found word fragment and the current word fragment (minus its initial letter).

joined=if(isnotnull(joining),mvjoin(mvappend(split(mvindex(word,joining),""),mvindex(split(mvindex(word,<<ITEM>>),""),1,-1)),""),null()),

Create a new word list, replacing the current word fragment with an empty string

new_word=if(isnotnull(joining),mvappend(if(<<ITEM>>=0,null(),mvindex(word,0,<<ITEM>>-1)),"",mvindex(word,<<ITEM>>+1,-1)),new_word),

Update the word list, replacing the found word fragment with the new word fragment

word=if(isnotnull(joining),mvappend(if(joining=0,null(),mvindex(new_word,0,joining-1)),joined,mvindex(new_word,joining+1,-1)),word),

Update the new source list, replacing the current source with an empty string

new_source=if(isnotnull(joining),mvappend(if(<<ITEM>>=0,null(),mvindex(source,0,<<ITEM>>-1)),"",mvindex(source,<<ITEM>>+1,-1)),source),
source=new_source,

Create a new target  list, replacing the found target with the current target

new_target=if(isnotnull(joining),mvappend(if(joining=0,null(),mvindex(target,0,joining-1)),mvindex(target,<<ITEM>>),mvindex(target,joining+1,-1)),target),

Update the target  list, replacing the current target with  an empty string

target=if(isnotnull(joining),mvappend(if(<<ITEM>>=0,null(),mvindex(new_target,0,<<ITEM>>-1)),"",mvindex(new_target,<<ITEM>>+1,-1)),new_target)
]

Remove the empty strings

| mvexpand word
| where word!=""
| table word

Job done?

Looking at the resulting list of words, something is not quite right. There are a couple of words which make sense (SPLUNK, BORE, XML), but others do not (OCSIC is CISCO spelt backwards!). So, what is going on?

Perhaps there are some clues in the diagram. If you save the XML to a file and open it in draw.io, you can drag the letter tiles around and follow the arrows to see what the words should be.

ITWhisperer_1-1769088354911.png

Taking a closer look

Looking at the XML more closely, you can see that some of the connection mxCells have slightly different style attributes to the others. For example, some have style attributes including:

startArrow=none;startFill=0;endArrow=classic;endFill=1;

Whereas others have style attributes including:

startArrow=classic;startFill=1;endArrow=none;endFill=0;

In the first instance, the style for the connection indicates that the arrowhead is closer to the target, and in the second instance, the style for the connection indicates that the arrowhead is closer to the source. This means that values for source and target should be switched in the second instance.

Another little nugget you may have noticed from a closer look at the style attributes (and indeed the draw.io diagram) is that some of the connections have different stroke and fill colours. These can actually be used to group the connections into word groups.

| stats list(letters) as word list(source) as source list(target) as target by strokeColor

I will leave you to work out how to extract and apply this information to your solution to get the correct list of words.

As a final “Easter egg”, you may or may not have noticed that the ids for the letter mxCell nodes contain the ASCII code for the lowercase version of the corresponding letter, e.g. “0cd5-73fa” contains “73” which is the ASCII code for lowercase “s”, and “2151-6ffe” contains “6f” which is the ASCII code for lowercase “o”. If you had worked this out, you could have perhaps avoided the non-lookup-self-lookup step!

Summary

The techniques outlined in this solution, non-lookup-self-lookups and index-based processing loops, can be applied in multiple usecases and may provide solutions to some more complex scenarios.

Have questions or thoughts? Comment on this article or in Slack #puzzles channel. Whichever you prefer.

Contributors
Get Updates on the Splunk Community!

Announcing Modern Navigation: A New Era of Splunk User Experience

We are excited to introduce the Modern Navigation feature in the Splunk Platform, available to both cloud and ...

Modernize your Splunk Apps – Introducing Python 3.13 in Splunk

We are excited to announce that the upcoming releases of Splunk Enterprise 10.2.x and Splunk Cloud Platform ...

Step into “Hunt the Insider: An Splunk ES Premier Mystery” to catch a cybercriminal ...

After a whole week of being on call, you fell asleep on your keyboard, and you hit a sequence of buttons that ...