Community Blog
Get the latest updates on the Splunk Community, including member experiences, product education, events, and more!

[Puzzles] Solve, Learn, Repeat: Advent of Code - Day 2

ITWhisperer
SplunkTrust

Advent of Code

In order to participate in these challenges, you will need to register with the Advent of Code site, so that you can retrieve your own datasets for the puzzles. When you get an answer, you will need to submit it to the Advent of Code site to determine whether the answer is correct. I have already completed the 2025 set using Python, so I will know when my SPL generates the correct result.

Day 2

Each day's puzzle is split into two parts; part one is usually easier than part two, and you cannot normally reach part two until you have successfully completed part one. Day 2 is about finding numbers with repeating sequences of digits. Please visit the website for full details of the puzzle.

This article contains spoilers!

In fact, the whole article is a spoiler as it contains solutions to the puzzle. If you are trying to solve the puzzle yourself and just want some pointers to get you started, stop reading when you have enough, and return if you get stuck again, or just want to compare your solution to mine!

Part One

As with all the Advent of Code puzzles, the description of the problem is aided by some example input; in this case, the input is a series of number ranges which represent product ids. The aim of the puzzle is to determine, for your own dataset, the sum of the invalid ids that appear in the number ranges. For part one, an invalid id is a number that, when split into two halves, has both halves made up of the same sequence of digits (for example, 11, 6464 or 123123).
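The definition can be pinned down with a minimal brute-force check. This is an illustrative Python sketch, not the author's own Python solution, and the name `is_invalid_part_one` is hypothetical:

```python
def is_invalid_part_one(n: int) -> bool:
    """Return True if n's digits are one sequence repeated exactly twice."""
    s = str(n)
    half = len(s) // 2
    # An odd number of digits can never split into two equal halves.
    return len(s) % 2 == 0 and s[:half] == s[half:]


# Brute force over a small range: fine for the example, slow for large ranges.
print([n for n in range(95, 116) if is_invalid_part_one(n)])  # [99]
```

Testing every number like this is the approach the rest of the article avoids, but it is handy for cross-checking small ranges.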

Initialising your data

The first thing to do is initialise your data. One way to do this is to save it to a csv file and use inputlookup to load it. Alternatively, you could just use makeresults (as I have done here) and assign the data to a field:

| makeresults
| fields - _time
| eval _raw="11-22,95-115,998-1012,1188511880-1188511890,222220-222224,1698522-1698528,446443-446449,38593856-38593862,565653-565659,824824821-824824827,2121212118-2121212124"

The next step is to break the data up into separate events, one per range:

| rex max_match=0 "(?<range>\d+\-\d+)"
| mvexpand range

Interpreting the data

Using our good friend rex, parse the data to get the start and end of the range.

| rex field=range "(?<start>\d+)\-(?<end>\d+)"
| table start end
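For comparison outside Splunk, the two `rex` steps can be mirrored in Python with `re.findall`. This is a minimal sketch; `DATA` and `parse_ranges` are hypothetical names:

```python
import re

# The example input, as used in the makeresults search above.
DATA = ("11-22,95-115,998-1012,1188511880-1188511890,222220-222224,"
        "1698522-1698528,446443-446449,38593856-38593862,565653-565659,"
        "824824821-824824827,2121212118-2121212124")


def parse_ranges(raw: str) -> list[tuple[int, int]]:
    """Split the raw input into (start, end) integer pairs."""
    # Plays the role of: rex max_match=0 ... | mvexpand range | rex field=range ...
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)-(\d+)", raw)]


ranges = parse_ranges(DATA)
print(len(ranges), ranges[:3])
```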

Solving Part One

Rather than checking each number in each range to see whether it is invalid, my basic approach was to work from the other side: construct numbers made up of a repeated sequence of digits, and check whether each one is in the required range. An invalid id is a sequence repeated twice, so it must have an even number of digits. A range can therefore only contain invalid ids if it includes numbers with an even number of digits. That is the case if the start has an even number of digits, or the end does, or both, or if the start and end differ in length by more than one digit. For example, in 1-101 both the start and the end have an odd number of digits, but the range still includes the two-digit numbers 10-99.

| eval start_len=len(start)
| eval end_len=len(end)
| where start_len%2 = 0 or end_len%2 = 0 or abs(start_len-end_len)>1
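The same filter as a minimal Python sketch. The helper name `can_contain_even_length` is hypothetical; the condition is a direct translation of the `where` clause:

```python
def can_contain_even_length(start: int, end: int) -> bool:
    """Mirror of: where start_len%2 = 0 or end_len%2 = 0 or abs(start_len-end_len)>1"""
    start_len, end_len = len(str(start)), len(str(end))
    return start_len % 2 == 0 or end_len % 2 == 0 or abs(start_len - end_len) > 1


print(can_contain_even_length(1698522, 1698528))  # False: 7 digits to 7 digits
print(can_contain_even_length(95, 115))           # True: 95 has 2 digits
print(can_contain_even_length(1, 101))            # True: spans 10-99
```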

Having removed the ranges that cannot contain an invalid id, we can reset the start and end of the remaining ranges. There are three possibilities (assuming, as in the example input, that the start and end differ by at most one digit):

  1. The start and end have different numbers of digits, and the start has an even number of digits
  2. The start and end have different numbers of digits, and the start has an odd number of digits
  3. The start and end have the same number of digits (labelled 0 in the code below)

| eval case=case(start_len != end_len and start_len%2 == 0, 1, start_len != end_len and start_len%2 == 1, 2, true(), 0)

For the first case, we can reset the end of the range to be the largest number with the same number of digits as the start.

| eval end=if(case = 1, pow(10, start_len) - 1, end)

For the second case, we can reset the start of the range to be the lowest number with the same number of digits as the end. Better still, since we know this will have an even number of digits, we can construct the first invalid id in this range directly (for example, 1010 when the start has three digits).

| eval start=if(case = 2, pow(10, start_len) + pow(10, floor(start_len / 2)), start)
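The three cases and the two resets can be sketched in Python as follows. This is an illustration only: `reset_range` is a hypothetical name, and like the SPL it assumes the start and end differ by at most one digit and that the even-length filter has already been applied.

```python
def reset_range(start: int, end: int) -> tuple[int, int]:
    """Narrow a range so that start and end have the same, even, number of digits.

    Assumes len(str(end)) - len(str(start)) <= 1, as the SPL does.
    """
    start_len, end_len = len(str(start)), len(str(end))
    if start_len != end_len and start_len % 2 == 0:
        # Case 1: keep the even-length start; end becomes e.g. 99 or 9999.
        end = 10 ** start_len - 1
    elif start_len != end_len:
        # Case 2: odd-length start; jump to the first invalid id with one
        # more digit, e.g. 11 or 1010.
        start = 10 ** start_len + 10 ** (start_len // 2)
    # Case 0: same number of digits, nothing to change.
    return start, end


print(reset_range(95, 115))    # (95, 99)
print(reset_range(998, 1012))  # (1010, 1012)
```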

Recalculate the length of the start and end of each range.

| eval start_len=len(start)
| eval end_len=len(end)

Determine the first half of the digits of start and end. This is done by dividing the number by 10 to the power of half the length e.g. if the number is a four digit number (between 1000 and 9999), dividing by 100 (10 to the power of 2) will give the first two digits.

| eval half_start=floor(start/pow(10,start_len/2))
| eval half_end=floor(end/pow(10,end_len/2))
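In Python, integer floor division gives the same leading half with no floating-point step. This is a minimal sketch with a hypothetical helper name, `first_half`:

```python
def first_half(n: int) -> int:
    """Leading half of the digits of an even-length number, e.g. 4567 -> 45."""
    half_len = len(str(n)) // 2
    # Dividing by 10**half_len drops the trailing half of the digits.
    return n // 10 ** half_len


print(first_half(4567))        # 45
print(first_half(1188511880))  # 11885
```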

From this, we can generate a range of possible first halves of the invalid ids.

| eval range=mvrange(half_start, half_end+1)
| mvexpand range

For each possible first half, generate a possible id and validate it is still in the original range.

| eval possible=range.range
| where possible<=end and possible>=start

Now, simply add all the invalid ids.

| stats sum(possible) as invalid
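To check the logic outside Splunk, here is a minimal Python sketch that chains the Part One steps together on the example input. It re-expresses the SPL above rather than reproducing the author's own Python solution. Like the SPL, it assumes the start and end of each range differ by at most one digit. `DATA` and `part_one` are hypothetical names.

```python
import re

DATA = ("11-22,95-115,998-1012,1188511880-1188511890,222220-222224,"
        "1698522-1698528,446443-446449,38593856-38593862,565653-565659,"
        "824824821-824824827,2121212118-2121212124")


def part_one(raw: str) -> int:
    total = 0
    for a, b in re.findall(r"(\d+)-(\d+)", raw):
        start, end = int(a), int(b)
        start_len, end_len = len(a), len(b)
        # Skip ranges that cannot contain a number with an even number of digits.
        if not (start_len % 2 == 0 or end_len % 2 == 0 or abs(start_len - end_len) > 1):
            continue
        # Reset the range (cases 1 and 2); afterwards start and end share a length.
        if start_len != end_len and start_len % 2 == 0:
            end = 10 ** start_len - 1
        elif start_len != end_len:
            start = 10 ** start_len + 10 ** (start_len // 2)
        half_len = len(str(start)) // 2
        # Candidate first halves, like mvrange(half_start, half_end+1).
        for half in range(start // 10 ** half_len, end // 10 ** half_len + 1):
            possible = int(str(half) * 2)  # e.g. 446 -> 446446
            if start <= possible <= end:
                total += possible
    return total


print(part_one(DATA))  # 1227775554
```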

Part Two

The second part of the puzzle again requires us to determine, for your own dataset, the sum of the invalid ids that appear in the number ranges, only this time the definition of an invalid id has changed. Instead of being a number made up of a sequence of digits repeated exactly twice, an invalid id is now a number made up of a sequence of digits repeated at least twice (for example, 111 or 565656).
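As before, a brute-force Python check makes the new definition concrete. This is an illustrative sketch, and `is_invalid_part_two` is a hypothetical name:

```python
def is_invalid_part_two(n: int) -> bool:
    """True if n's digits are some sequence repeated at least twice."""
    s = str(n)
    # Try every sequence length that divides len(s) and repeats at least twice.
    return any(
        len(s) % size == 0 and s[:size] * (len(s) // size) == s
        for size in range(1, len(s) // 2 + 1)
    )


print([n for n in range(95, 116) if is_invalid_part_two(n)])  # [99, 111]
```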

Start the same way

Use the same approach as for Part One, but this time do not filter out the ranges without an even number of digits, since numbers such as 111 can now be invalid too.

| rex max_match=0 "(?<range>\d+\-\d+)"
| mvexpand range
| rex field=range "(?<start>\d+)\-(?<end>\d+)"
| table start end
| eval start_len=len(start)
| eval end_len=len(end)

Now we can determine the range of lengths (numbers of digits) covered by each number range, and create one event per length.

| eval length=mvrange(start_len, end_len+1)
| mvexpand length
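In Python, `range` plays the part of `mvrange` and a loop would play the part of `mvexpand`. This minimal sketch just lists the lengths covered by one range; `lengths_in_range` is a hypothetical name:

```python
def lengths_in_range(start: int, end: int) -> list[int]:
    """Every number-of-digits value from start to end, like mvrange(start_len, end_len+1)."""
    return list(range(len(str(start)), len(str(end)) + 1))


print(lengths_in_range(95, 115))  # [2, 3]
print(lengths_in_range(11, 22))   # [2]
```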

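For each length, we want a list of the possible lengths of the repeating sequence, that is, the factors of the length. The simplest way to do this is to generate the numbers from 1 to half the length and eliminate any that do not divide the length with a zero remainder. Nothing above half is a factor except the length itself, and that is excluded anyway because the sequence must be repeated at least twice.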

| eval log_range=mvrange(1,floor(length/2)+1)
| mvexpand log_range
| where length%log_range == 0
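The same factor search as a minimal Python sketch (`repeat_lengths` is a hypothetical name). A one-digit length yields no repeat lengths, so single-digit numbers can never be invalid:

```python
def repeat_lengths(length: int) -> list[int]:
    """Divisors of length from 1 to length // 2 (the length itself is excluded,
    because the sequence must repeat at least twice)."""
    return [size for size in range(1, length // 2 + 1) if length % size == 0]


print(repeat_lengths(6))   # [1, 2, 3]
print(repeat_lengths(7))   # [1]
print(repeat_lengths(1))   # []
```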

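Reset the start and end of the range for each possible length in a similar manner to that used in Part One: if the start has fewer digits than the current length, move it up to the smallest number with that many digits, and if the end has more digits, move it down to the largest.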

| eval start=if(start_len<length, pow(10, length-1), start)
| eval end=if(end_len>length, pow(10, length) - 1, end)
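A minimal Python sketch of this per-length reset. It uses `max`/`min` instead of the `if` tests, which gives the same bounds. `clamp_to_length` is a hypothetical name:

```python
def clamp_to_length(start: int, end: int, length: int) -> tuple[int, int]:
    """Restrict [start, end] to the numbers with exactly `length` digits."""
    low = max(start, 10 ** (length - 1))  # smallest number with `length` digits
    high = min(end, 10 ** length - 1)     # largest number with `length` digits
    return low, high


print(clamp_to_length(95, 115, 2))  # (95, 99)
print(clamp_to_length(95, 115, 3))  # (100, 115)
```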

Using the length of the repeating sequence (log_range), take that many leading digits from the start and end to find the range of possible repeating sequences.

| eval possible_start=substr(start,1,log_range)
| eval possible_end=substr(end,1,log_range)
| eval possibles=mvrange(possible_start, possible_end+1)
| mvexpand possibles

Determine the number of times the sequence is repeated, construct a possible invalid id by concatenating the sequence that many times, and check that it is still within the range.

| eval reps=floor(length/log_range)
| eval rep_range=mvrange(0,reps)
| eval new_possible=""
| foreach mode=multivalue rep_range
    [| eval new_possible=new_possible.possibles]
| where new_possible<=end and new_possible>=start
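The candidate generation and range check can be sketched in Python for a single length and sequence size. This is illustrative only: `repeated_ids` is a hypothetical name, and it assumes start and end have already been reset to exactly `length` digits.

```python
def repeated_ids(start: int, end: int, length: int, size: int) -> list[int]:
    """Ids of `length` digits made of one `size`-digit sequence repeated, within [start, end].

    Assumes start and end both have exactly `length` digits (after the reset step).
    """
    reps = length // size
    # Leading `size` digits of start and end, like substr(start,1,log_range).
    first, last = int(str(start)[:size]), int(str(end)[:size])
    found = []
    for seq in range(first, last + 1):
        candidate = int(str(seq) * reps)  # e.g. 56 repeated 3 times -> 565656
        if start <= candidate <= end:
            found.append(candidate)
    return found


print(repeated_ids(565653, 565659, 6, 2))  # [565656]
print(repeated_ids(565653, 565659, 6, 3))  # []
```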

Now we can simply sum the invalid ids.

| stats sum(new_possible) as invalid

Job done?

Well, not quite. It turns out that the total is too high. So, what is going on? Which ids are we double-counting?

Suppose that the original range is 3428 to 4981. Within this range is 4444. Now, 4444 can be generated as four 4's or two 44's, hence the double-counting.
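The double-counting is easy to reproduce. This Python sketch uses hypothetical variable names and mirrors the SPL steps for a single length. It generates the candidates for 3428 to 4981 with repeat lengths 1 and 2, then counts how often each id appears:

```python
from collections import Counter

start, end, length = 3428, 4981, 4
generated = []
for size in (1, 2):  # the factors of 4 up to half its length
    reps = length // size
    for seq in range(int(str(start)[:size]), int(str(end)[:size]) + 1):
        candidate = int(str(seq) * reps)
        if start <= candidate <= end:
            generated.append(candidate)

counts = Counter(generated)
print(counts[4444])                          # 2
print(sum(generated) - sum(set(generated)))  # 4444
```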

To eliminate these duplicates, we can simply gather the distinct possible invalid ids for each number range (values() discards duplicates) before summing them.

| stats values(new_possible) as possible by start end
| stats sum(possible) as invalid
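For comparison, here is a self-contained Python sketch of the whole of Part Two on the example input. Again, it re-expresses the SPL rather than reproducing the author's code. A Python `set` per input range plays the role of `values()`, and the per-length bounds use `max`/`min`. `DATA` and `part_two` are hypothetical names.

```python
import re

DATA = ("11-22,95-115,998-1012,1188511880-1188511890,222220-222224,"
        "1698522-1698528,446443-446449,38593856-38593862,565653-565659,"
        "824824821-824824827,2121212118-2121212124")


def part_two(raw: str) -> int:
    total = 0
    for a, b in re.findall(r"(\d+)-(\d+)", raw):
        start, end = int(a), int(b)
        invalid = set()  # removes ids generated by more than one repeat length
        for length in range(len(a), len(b) + 1):
            low = max(start, 10 ** (length - 1))
            high = min(end, 10 ** length - 1)
            for size in range(1, length // 2 + 1):
                if length % size:
                    continue  # size does not divide the length
                reps = length // size
                for seq in range(int(str(low)[:size]), int(str(high)[:size]) + 1):
                    candidate = int(str(seq) * reps)
                    if low <= candidate <= high:
                        invalid.add(candidate)
        total += sum(invalid)
    return total


print(part_two(DATA))  # 4174379265
```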

Summary

By inverting the logic, we can build ranges of possible invalid ids before checking whether they are in range, thereby greatly reducing the number of ids we need to check.

Have questions or thoughts? Comment on this article or in the #puzzles channel on Slack, whichever you prefer.

 
