Advent of Code

In order to participate in these challenges, you will need to register with the Advent of Code site, so that you can retrieve your own datasets for the puzzles. When you get an answer, you will need to submit it to the Advent of Code site to determine whether the answer is correct. I have already completed the 2025 set using Python, so I will know when my SPL generates the correct result.

Day 4

Each day's puzzle is split into two parts; part one is usually easier than part two, and you cannot normally reach part two until you have successfully completed part one. Day 4 is about removing characters from a pattern depending on the surrounding locations. Please visit the website for full details of the puzzle.

This article contains spoilers! In fact, the whole article is a spoiler as it contains solutions to the puzzle. If you are trying to solve the puzzle yourself and just want some pointers to get you started, stop reading when you have enough, and return if you get stuck again, or just want to compare your solution to mine!

Part One

As with all the Advent of Code puzzles, the description of the problem is aided by some example input; in this case, the input is a grid showing where rolls of paper are placed. The aim of the puzzle is to determine, for your own dataset, how many rolls of paper can be removed in the first instance.

Initialising your data

The first thing to do is initialise your data. One way to do this is to save it to a csv file and use inputlookup to load it. Alternatively, you could just use makeresults (as I have done here), and set a field to the data:

| makeresults
| fields - _time
| eval _raw="..@@.@@@@.
@@@.@.@.@@
@@@@@.@.@@
@.@@@@..@.
@@.@@@@.@@
.@@@@@@@.@
.@.@.@.@@@
@.@@@.@@@@
.@@@@@@@@.
@.@.@@@.@."

The next step is to break the data up into separate events:

| rex max_match=0 "(?<row>\S+)"
| mvexpand row

Interpreting the data

Each position in the row represents either a roll of paper, designated by '@', or an empty space, designated by '.'. To determine whether the paper can be removed, the 8 surrounding positions have to be checked, so use streamstats to gather the rows above and below.

| append
[| stats count as row
| eval row=null()
]
| streamstats current=f last(row) as real_row
| streamstats current=f last(real_row) as previous_row
| rename row as next_row
| rename real_row as row
| where isnotnull(row)
| table previous_row row next_row

Note, adding an extra event with append and removing the first event (which does not have a prior row) keeps the number of events correct. At this point, it is very similar to the start of Gabriel's solution.

Solving Part One

Part One is looking for the number of rolls of paper that can be removed, i.e. those with fewer than four rolls in the eight adjacent positions (orthogonally and diagonally). Start by determining how many positions there are in each row, and initialise the total.

| eval max_y=len(row)
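Since I first solved the puzzles in Python, the removal rule is easy to state there before we continue building it in SPL. The sketch below is an illustrative restatement of the logic only (the function name and the tiny test grid are mine, not from the puzzle):

```python
def solve_part_one(grid):
    # grid is a list of equal-length strings: '@' is a roll, '.' a space.
    height, width = len(grid), len(grid[0])
    total = 0
    for x in range(height):
        for y in range(width):
            if grid[x][y] != "@":
                continue
            # Count rolls in the eight surrounding positions,
            # staying inside the grid at the edges.
            count = 0
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    if dx == 0 and dy == 0:
                        continue
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < height and 0 <= ny < width and grid[nx][ny] == "@":
                        count += 1
            # A roll with fewer than four occupied neighbours can be removed.
            if count < 4:
                total += 1
    return total
```

For example, on a small hand-checked grid, solve_part_one(["@@.", "@@@", ".@."]) returns 3: the top-left, middle-right and bottom rolls each have only three occupied neighbours.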
| eval y=mvrange(1,max_y+1)
| eval total=0

For each position in the row, initialise the count.

| foreach mode=multivalue y
[
| eval count=0,

Note the use of multivalue mode with the foreach command - there is a restriction in this mode that only one command is allowed, hence the use of the eval command, which can have multiple field assignments, separated by commas. The field being used has been set up to hold an index into the strings for each of the rows under consideration.

If there is a prior row, check the positions before, equal and after the current position.

count=if(isnotnull(previous_row), if(substr(row,<<ITEM>>,1)="@",if(<<ITEM>>>1, if(substr(previous_row,<<ITEM>>-1,1)="@",count+1, count), count), count), count),
count=if(isnotnull(previous_row), if(substr(row,<<ITEM>>,1)="@",if(substr(previous_row,<<ITEM>>,1)="@",count+1, count), count), count),
count=if(isnotnull(previous_row), if(substr(row,<<ITEM>>,1)="@",if(<<ITEM>><max_y, if(substr(previous_row,<<ITEM>>+1,1)="@",count+1, count), count), count), count),

Check the positions before and after the current position on the current row.

count=if(substr(row,<<ITEM>>,1)="@",if(<<ITEM>>>1, if(substr(row,<<ITEM>>-1,1)="@",count+1, count), count), count),
count=if(substr(row,<<ITEM>>,1)="@",if(<<ITEM>><max_y, if(substr(row,<<ITEM>>+1,1)="@",count+1, count), count), count),

If there is a subsequent row, check the positions before, equal and after the current position.

count=if(isnotnull(next_row), if(substr(row,<<ITEM>>,1)="@",if(<<ITEM>>>1, if(substr(next_row,<<ITEM>>-1,1)="@",count+1, count), count), count), count),
count=if(isnotnull(next_row), if(substr(row,<<ITEM>>,1)="@",if(substr(next_row,<<ITEM>>,1)="@",count+1, count), count), count),
count=if(isnotnull(next_row), if(substr(row,<<ITEM>>,1)="@",if(<<ITEM>><max_y, if(substr(next_row,<<ITEM>>+1,1)="@",count+1, count), count), count), count),

If the count from all these positions is less than 4, this paper roll can be removed, so increment the total.

total=if(substr(row,<<ITEM>>,1)="@" and count < 4, total+1, total)
]

Now, simply sum the totals from all the rows.

| stats sum(total) as total

Part Two

The second part of the puzzle requires that we again determine, for your own dataset, how many paper rolls can be removed; additionally, once a paper roll has been removed, this may free up other paper rolls, so that they too can be removed. We have to keep going until no more paper rolls can be removed.

Solving Part Two

One approach to this is to copy and paste the code from Part One multiple times, adjusting the grid representing where the remaining rolls of paper are until no more paper rolls have been removed. But how many times would you need to do this? For the test data, there would be 9 iterations of the code, but who knows how many times your own dataset would require. Not only would this be a laborious process, your search might breach some limits (e.g. time or memory) and/or seriously compromise your Splunk instance.

An alternative is to reconstruct the search so that it can be iterated using a foreach command. A consequence of this is that only streaming commands can be used.

Iterating with the foreach command

Can we just apply a foreach iteration loop around the Part One solution? If we try something like this:

| foreach 1 2 3 4 5 6 7 8 9
[
``` create previous and next rows for current rows ```
...

We hit an immediate problem because streamstats is not allowed in a foreach subsearch. OK, can we create the previous and next rows before the foreach iteration loop? Something like this:

``` create previous and next rows for current row ```
| foreach 1 2 3 4 5 6 7 8 9
[
``` count neighbours for each roll in current row ```
``` remove roll if neighbours less than 4 ```
...

This looks a little more promising, except what happens if we remove a roll from row 1? This is simple enough to do for the current row (row 1), but row 1 is also the previous row for row 2, and, as it stands, we have no way to change this as it is in a different event. This is the crux of the problem with using foreach to iterate over the solution to Part One. We need to go back to treating the problem as a single event.

Start from the beginning

The first thing to do is create a multi-value field with one value for each row of the grid. (If you have read your data in from a csv file, you may need to use a stats command with the list() aggregate function to gather the rows of the grid into a multi-value field.)

| rex max_match=0 "(?<rows>\S+)"
| fields - _raw

Now, we need a way to reference each row and column of the grid. Starting with the rows: in order to iterate over the rows, we can use field names. We could just count the rows and set up a hard-coded iteration over an appropriate number of field names (much like shown earlier for iterating the Part One solution). However, it would be better to dynamically create the appropriate number of fields, so we can iterate over them. So, start by determining the maximum value in this direction (I have called this the x-direction).

| eval max_x=mvcount(rows)

Now, using mvrange(), we can create a multi-value field with enough entries for each of the rows.

| eval list=mvrange(0,max_x)

We can now use this field in the by clause of a chart command to create the required fields.

| chart values(eval(0)) as zero by list

OK, not quite what we wanted, because the list has to be the second field in the by clause to be used to generate field names.

| eval name="day_4"
| chart values(eval(0)) as zero by name list useother=f limit=0

An alternative syntax for the same outcome could look like this:

| eval name="day_4"
| chart values(eval(0)) as zero over name by list useother=f limit=0

Note that over takes the first field listed in the previous by clause and the by clause now just has one field listed. We have a set of field names which we could use to iterate over the rows of data, but we no longer have the data. A simple way to get the field names without losing the data entirely is to use appendpipe to generate the field names in a subsearch.

| eval max_x=mvcount(rows)
| appendpipe
[
| eval list=mvrange(0,max_x)
| eval name="day_4"
| chart values(eval(0)) as zero by name list useother=f limit=0
| fields - name
]
| where isnotnull(max_x)

Remember to drop the name field (it is only required for the chart command) and drop the appended event (the generated fields still exist even when they have no data in them). Now, we can list the fields in a foreach command. Since they are all just numbers this is not so easy, but we can solve this by tweaking the appendpipe subsearch to prefix each number with a known value (I have used 'f' in this instance).

| appendpipe
[
| eval list=mvrange(0,max_x)
| mvexpand list
| eval list="f".list
| eval name="day_4"
| chart values(eval(0)) as zero by name list useother=f limit=0
| fields - name
]
| where isnotnull(max_x)

Now we can use a wildcard on the foreach command to list all the relevant fields.

| foreach f*

We can do a similar thing for the fields to iterate over the columns, but since there are the same number of columns as rows, we could reuse the same set of field names.

| eval max_y=len(mvindex(rows,0))

Using the list of fields, we can process each row, finding the previous and next rows.

| foreach f*
[
| eval x=substr("<<FIELD>>",2,len("<<FIELD>>")-1)
| eval row=mvindex(rows,x)
| eval previous_row=if(x > 0, mvindex(rows, x - 1), null())
| eval next_row=if(x < max_x-1, mvindex(rows, x + 1), null())
]

Now, we can try to iterate over the columns in the rows using the same set of fields. Let us start by converting each character to either a zero (0) if it is a dot (.), or a one (1) if it is a roll of paper (@).

| foreach f*
[
| eval x=substr("<<FIELD>>",2,len("<<FIELD>>")-1)
| eval row=mvindex(rows,x)
| eval previous_row=if(x > 0, mvindex(rows, x - 1), null())
| eval next_row=if(x < max_x-1, mvindex(rows, x + 1), null())
| eval new_row=""
| foreach f*
[
| eval y=substr("<<FIELD>>",2,len("<<FIELD>>")-1)
| eval char=substr(row,y+1,1)
| eval new_row=new_row.if(char=".","0","1")
]
| eval new_rows=mvappend(new_rows,new_row)
]

That did not turn out as intended:

0000000000
1111111111
1111111111
1111111111
1111111111
1111111111
0000000000
1111111111
1111111111
0000000000

Each (new) row is made up of the same repeated digit, depending on what the n-th character of the original row was, i.e. the first character of the first row, the second character of the second row, etc. This shows that, in the inner foreach, <<FIELD>> is not being changed for each field in the list. One way to get around this is to determine the index into the row independently of the field name.

| foreach f*
[
| eval x=substr("<<FIELD>>",2,len("<<FIELD>>")-1)
| eval row=mvindex(rows,x)
| eval previous_row=if(x > 0, mvindex(rows, x - 1), null())
| eval next_row=if(x < max_x-1, mvindex(rows, x + 1), null())
| eval new_row=""
| eval y=0
| foreach f*
[
| eval y=y+1
| eval char=substr(row,y,1)
| eval new_row=new_row.if(char=".","0","1")
]
| eval new_rows=mvappend(new_rows,new_row)
]

We can use the same approach for the row index (x) so that, when we come to iterating multiple times, we are not dependent on the field name in <<FIELD>>.

| eval x=0
| foreach f*
[
| eval row=mvindex(rows,x)
| eval previous_row=if(x > 0, mvindex(rows, x - 1), null())
| eval next_row=if(x < max_x-1, mvindex(rows, x + 1), null())
| eval new_row=""
| eval y=0
| foreach f*
[
| eval y=y+1
| eval char=substr(row,y,1)
| eval new_row=new_row.if(char=".","0","1")
]
| eval new_rows=mvappend(new_rows,new_row)
| eval x=x+1
]

Now we can iterate over the set of rows multiple times until no more rolls can be removed.

| eval total=0
| foreach 1 2 3 4 5 6 7 8 9
[
| eval rows=new_rows
| eval new_rows=null()
| eval x=0
| foreach f*
[
| eval row=mvindex(rows,x)
| eval previous_row=if(x > 0, mvindex(rows, x - 1), null())
| eval next_row=if(x < max_x-1, mvindex(rows, x + 1), null())
| eval new_row=""
| eval y=0
| foreach f*
[
| eval y=y+1
| eval char=substr(row,y,1)
| eval count=if(char="1",if(isnotnull(previous_row) and y > 1, tonumber(substr(previous_row,y - 1, 1)), 0), 0)
| eval count=if(char="1",if(isnotnull(previous_row), count + tonumber(substr(previous_row,y, 1)), count), count)
| eval count=if(char="1",if(isnotnull(previous_row) and y < max_y, count + tonumber(substr(previous_row,y + 1, 1)), count), count)
| eval count=if(char="1",if(y > 1, count + tonumber(substr(row,y - 1, 1)), count), count)
| eval count=if(char="1",if(y < max_y, count + tonumber(substr(row,y + 1, 1)), count), count)
| eval count=if(char="1",if(isnotnull(next_row) and y > 1, count + tonumber(substr(next_row,y - 1, 1)), count), count)
| eval count=if(char="1",if(isnotnull(next_row), count + tonumber(substr(next_row,y, 1)), count), count)
| eval count=if(char="1",if(isnotnull(next_row) and y < max_y, count + tonumber(substr(next_row,y + 1, 1)), count), count)
| eval total=if(char="1" and count < 4, total + 1, total)
| eval new_row=new_row.if(char="1" and count < 4,"0", char)
]
| eval new_rows=mvappend(new_rows,new_row)
| eval x=x+1
]
]

Warning! Approach with caution!

This approach works for the test data, but with the much larger dataset, if yours is anything like mine, even doubling up to 18 iterations breaks Splunk. Do not try this yourself!

| eval total=0
| foreach 1 2
[
| foreach 1 2 3 4 5 6 7 8 9
[

Finding a better way

If you had run the process on your dataset just a few times, e.g. the 9 times in the foreach, you may have noticed that the number of paper rolls left rapidly diminishes at the start, so we are left with an increasingly sparsely populated grid. This gives us a clue to a different approach. Rather than keeping the whole grid throughout the process, we only need to keep track of where the remaining paper rolls are.

The coordinates of a paper roll can be represented by its row and column. Given that the number of columns is less than a thousand (1000), we can multiply the row number by 1000 and add the column number to give a value that can easily be interpreted both visually and mathematically. For example, row 1 has a paper roll in column 3, therefore this can be represented by 1003, and row 10 has a paper roll in the first column, therefore this can be represented by 10001, etc. Using this process, we can build a string of all the coordinates of the paper rolls.

| rex max_match=0 "(?<rows>\S+)"
| fields - _raw
| mvexpand rows
| eval columns=mvrange(1,len(rows)+1)
| streamstats count as row
| eval paper=""
| foreach mode=multivalue columns
[
| eval paper=paper.if(substr(rows,<<ITEM>>,1)=="@",",".(row*1000+<<ITEM>>),"")
]
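The encoding is easy to cross-check outside Splunk. This Python sketch (the function name is mine) builds the equivalent comma-delimited string, under the same assumptions of 1-based rows and columns and fewer than 1000 columns:

```python
def encode_rolls(grid):
    # Encode each roll's position as row * 1000 + column (1-based),
    # assuming fewer than 1000 columns, and join the values into a
    # comma-delimited string. Leading and trailing commas make later
    # substring matches unambiguous.
    coords = [str(r * 1000 + c)
              for r, row in enumerate(grid, start=1)
              for c, ch in enumerate(row, start=1)
              if ch == "@"]
    return "," + ",".join(coords) + ","
```

For instance, encode_rolls(["..@", "@.."]) gives ",1003,2001," - the roll in row 1, column 3 becoming 1003, as described above.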
| stats values(paper) as paper max(row) as max_x values(eval(len(rows))) as max_y
| eval paper=mvjoin(paper,"").","

Now, to check the surrounding locations, we can generate the coordinates of the neighbouring positions and check whether they exist in the list.

| eval paper_list=split(trim(paper,","),",")
| foreach mode=multivalue paper_list
[
| eval count=0,
x=floor(<<ITEM>>/1000),
y=<<ITEM>>%1000,
other_x=if(x > 1, x-1, null()),
other_y=if(isnotnull(other_x), if(y > 1, y-1, null()), null()),
count=if(isnotnull(other_x) and isnotnull(other_y) and match(paper, ",".(other_x*1000+other_y).","), count+1, count),
other_x=if(x > 1, x-1, null()),
other_y=if(isnotnull(other_x), y, null()),
count=if(isnotnull(other_x) and isnotnull(other_y) and match(paper, ",".(other_x*1000+other_y).","), count+1, count),

I have just shown how to check a couple of locations; I will leave the rest for you to complete. Note that I have included the delimiting commas (,) on either side of the coordinate in the match to ensure there are no false positives. Having counted the neighbours, we can either increase the total or add the location to a new list ready for the next iteration.

| eval total=0
| eval paper_list=split(trim(paper,","),",")
| eval new_paper=","
| foreach mode=multivalue paper_list
[
| eval count=0,
x=floor(<<ITEM>>/1000),
y=<<ITEM>>%1000,
other_x=if(x > 1, x-1, null()),
other_y=if(isnotnull(other_x), if(y > 1, y-1, null()), null()),
count=if(isnotnull(other_x) and isnotnull(other_y) and match(paper, ",".(other_x*1000+other_y).","), count+1, count),
other_x=if(x > 1, x-1, null()),
other_y=if(isnotnull(other_x), y, null()),
count=if(isnotnull(other_x) and isnotnull(other_y) and match(paper, ",".(other_x*1000+other_y).","), count+1, count),
```...```
new_paper=if(count < 4, new_paper, new_paper.<<ITEM>>.","),
total=if(count < 4, total+1, total)
]
| eval paper=new_paper

Good enough?

Putting this new approach into nested foreach commands, can we iterate enough times to get an answer without breaking Splunk? For my data and environment, this still caused problems. So, how else could the process be optimised?

If we consider the end of the process, we will have a number of paper rolls which cannot be removed; the only thing that would change this situation is if one of the rolls could be removed, and removing a roll can only affect the neighbouring locations. This means that, rather than processing a list of all the locations of paper rolls, we only need to process the list of locations which are neighbours of a paper roll that was removed in the previous iteration. (We still need to keep a list of all the current locations in order for us to be able to check (and count) them.)

Now, the process involves keeping track of the locations where rolls have been removed, which locations were included in the count when a roll was removed, compiling a list of considered locations, then filtering out any location where a roll had been removed.

| eval total=0
| eval iterations=0
| eval original_paper = paper
| eval paper_list=split(trim(paper,","),",")
| eval new_paper=","
| eval removed_paper=","
| foreach mode=multivalue paper_list
[
| eval count=0,
x=floor(<<ITEM>>/1000),
y=<<ITEM>>%1000,
considered_paper="",
other_x=if(x > 1, x-1, null()),
other_y=if(isnotnull(other_x), if(y > 1, y-1, null()), null()),
count=if(isnotnull(other_x) and isnotnull(other_y) and match(original_paper, ",".(other_x*1000+other_y).","), count+1, count),
considered_paper=if(isnotnull(other_x) and isnotnull(other_y) and match(original_paper, ",".(other_x*1000+other_y).","), considered_paper.(other_x*1000+other_y).",", considered_paper),
other_x=if(x > 1, x-1, null()),
other_y=if(isnotnull(other_x), y, null()),
count=if(isnotnull(other_x) and isnotnull(other_y) and match(original_paper, ",".(other_x*1000+other_y).","), count+1, count),
considered_paper=if(isnotnull(other_x) and isnotnull(other_y) and match(original_paper, ",".(other_x*1000+other_y).","), considered_paper.(other_x*1000+other_y).",", considered_paper),
```...```
next_paper=if(count < 4, new_paper.considered_paper, new_paper),
new_paper=if(count < 4 and len(next_paper) > 1, ",".mvjoin(mvdedup(split(trim(next_paper, ","), ",")), ",").",", next_paper),
removed_paper=if(count < 4, removed_paper.<<ITEM>>.",", removed_paper),
total=if(count < 4, total+1, total)
]
| eval paper_list=split(trim(new_paper, ","), ",")
| eval paper=","
| foreach mode=multivalue paper_list
[
| eval paper=if(match(removed_paper, ",".<<ITEM>>.","), paper, paper.<<ITEM>>.",")
]
| eval paper_list=split(trim(original_paper, ","), ",")
| eval original_paper=","
| foreach mode=multivalue paper_list
[
| eval original_paper=if(match(removed_paper, ",".<<ITEM>>.","), original_paper, original_paper.<<ITEM>>.",")
]

Putting this re-optimised approach into nested foreach commands allowed me to iterate enough times to get an answer without breaking Splunk. It is worth noting that using strings to maintain the list, rather than a multi-value field, also helped optimise the process.

Summary

With multiple iterations, searches with large field values and/or large multi-value fields can become Splunk-killers; in order to avoid compromising the Splunk server instance, searches may need to be optimised to reduce these impacts, even if the resulting search looks more complicated.

Have questions or thoughts? Comment on this article or in the Slack #puzzles channel. Whichever you prefer.
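As a footnote, the whole Part Two process - encoded coordinates, neighbour counting, and re-checking only the neighbours of removed rolls - can be restated as a short Python sketch. This is my own illustration of the algorithm, not the search itself; it assumes well under 1000 columns and, like the SPL, judges every removal in a pass against the set of rolls as it stood at the start of that pass:

```python
def solve_part_two(grid):
    # Encode each roll's position as row * 1000 + column (1-based),
    # assuming well under 1000 columns so encodings cannot collide.
    rolls = {r * 1000 + c
             for r, row in enumerate(grid, start=1)
             for c, ch in enumerate(row, start=1)
             if ch == "@"}
    # Offsets to the eight neighbouring encoded positions.
    offsets = (-1001, -1000, -999, -1, 1, 999, 1000, 1001)
    total = 0
    frontier = set(rolls)  # positions worth (re)checking this pass
    while frontier:
        # Judge every candidate against the set as it stood at the
        # start of the pass (removals within a pass are simultaneous).
        removed = {p for p in frontier
                   if sum(p + d in rolls for d in offsets) < 4}
        rolls -= removed
        total += len(removed)
        # Only the neighbours of a removed roll can change status,
        # so only they need re-checking next pass.
        frontier = {p + d for p in removed for d in offsets} & rolls
    return total
```

The shrinking frontier is the same effect the re-optimised SPL exploits: once the first few passes are done, each pass examines only a handful of locations rather than the whole grid.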