This challenge was first posted on Slack #regex channel https://splunkcommunity.slack.com/archives/C3WFE5V5G/p1759522087595439
For BORE at .conf24, we had a puzzle question which was to find integers which were multiples of 3. Rather than providing spoilers (in case we run BORE again and allow previous questions to be answered), I have devised another puzzle on similar lines. Find the integers which are multiple of 9, just using a single regular expression (rex command). Using the following SPL to generate some numbers, complete the regex to return matches (where the match field is equal to the number field) when modulus-9 is 0 (zero).
| makeresults count=20
| eval number=(9*(random()%1000000))+((random()%4)*(random()%4))
| eval modulus=number%9
| rex field=number "(?<match>)"
| table number modulus match
#justforfun
This article contains spoilers!
Starting with some school-level maths, we were taught that, for numbers that are multiples of nine (9), if you sum the digits of the number, and keep summing the digits of the result, you will eventually get a sum of nine. This is called the digital root. There is probably a mathematical proof for this but a simple examination of some examples will show this to appear to be true. For example:
Number | Divide by 9 | Sum digits |
18 | 2 | 9 |
36 | 4 | 9 |
45 | 5 | 9 |
240642 | 26738 | 18 |
7459506 | 828834 | 36 |
123456789 | 13717421 | 45 |
In more general terms, if you take the digital root of a number, extending the number by adding a 9 or a group of digits with a digit root of 9 anywhere in the sequence of digits, the resulting number has the same digit root as the original number. Similarly, if you remove a sequence of digits with a digital root of 9. The digits don’t even have to be inserted or removed sequentially! For example:
Number | Digital Root | Extended | New digital root |
42 | 6 (4+2) | 429 | 6 (4+2+9=15, 1+5=6) |
7506 | 9 (7+5+0+6=18, 1+8=0) | 7459506 | 9 (7+4+5+9+5+0+6=36, 3+6=9) |
123456789 | 9 (see above) | 13689 | 9 (1+3+6+8+9=27, 2+7=9) |
So much for the maths; unfortunately, regular expressions don’t really do maths, they match patterns, so how can we solve this puzzle using patterns?
Starting with the simplest patterns for digital roots of 9, we have only two single digit numbers which are multiple of 9, 9 and 0 (zero). In regex terms, we can define a subroutine (called “nine”) to match to digital root 9:
(?’nine’9|0)
This can be extended to include the eight 2-digit numbers, 18, 27, 36, 45, 54, 63, 72 and 81.
(?’nine’9|0|18|27|36|45|54|63|72|81)
This can be demonstrated in a Splunk search like so:
| makeresults format=csv data="number
0
9
18
27
36
45
54
63
72
81"
| rex field=number "(?<match>(?'nine'9|0|18|27|36|45|54|63|72|81))"
| table number match
In regex101.com, this would look like this https://regex101.com/r/zJyXu7/1
As already discussed, inserting digits with a digit root of nine doesn’t affect the digital root; for example, 108, 297, 3456, 4572 and 8271 all have digital roots of nine. In regex terms, they all match to:
(?'nine'9|0|1(?&nine)*8|2(?&nine)*7|3(?&nine)*6|4(?&nine)*5|5(?&nine)*4|6(?&nine)*3|7(?&nine)*2|8(?&nine)*1)+
Note that word wrapping has to be removed when used in SPL (and regex101 https://regex101.com/r/zJyXu7/2)
Breaking this expression down, it means the following:
(?'nine' | Start a subroutine definition |
9| | Alternative 9 |
0| | Alternative 0 |
1(?&nine)*8| | Alternative 1, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by 8 |
2(?&nine)*7| | Alternative 2, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by 7 |
3(?&nine)*6| | Alternative 3, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by 6 |
4(?&nine)*5| | Alternative 4, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by 5 |
5(?&nine)*4| | Alternative 5, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by 4 |
6(?&nine)*3| | Alternative 6, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by 3 |
7(?&nine)*2| | Alternative 7, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by 2 |
8(?&nine)*1 | Alternative 8, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by1 |
)+ | End of subroutine definition, with subroutine executed at least once |
Cracked it? Well, not quite. Remember that the digits could be inserted anywhere, not necessarily together, so, for example, 12348765 is a multiple of 9 which does not match the current regular expression. https://regex101.com/r/zJyXu7/3
In general terms, the 2-digit alternatives should be a literal digit followed by digits with the correct digital root. (Note that a literal digit is required first, so that at least one character is consumed by the subroutine, which avoids problems such as infinite recursion.) For example:
(?'nine'
9|
8(?&one)|
7(?&two)|
6(?&three)|
5(?&four)|
4(?&five)|
3(?&six)|
2(?&seven)|
1(?&eight)|
0
)+
Now, these other digital root subroutines need to be defined. In regex, subroutines can be called before they are defined, so long as they are defined somewhere in the regular expression. So, expanding just the first couple would look like this:
(?'nine'
9|
8(?'one'(?&nine)*(
8(?&two)|
7(?&three)|
6(?&four)|
5(?&five)|
4(?&six)|
3(?&seven)|
2(?&eight)|
1))|
7(?'two'(?&nine)*(
8(?&three)|
7(?&four)|
6(?&five)|
5(?&six)|
4(?&seven)|
3(?&eight)|
2|
1(?&one)))|
Note that each of the inner routines start with a recursive call to the outer ‘nine’ routine. I will leave the others for you to complete!
You should now have a regular expression which will match to (positive) integers which are multiples of 9. Try it out in Splunk as shown in the original problem definition.
How would you generalise the expression to handle negative integers, lists of integers, thousands separators, but ignore date and time formats?
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.