Fun with Regular Expression - multiples of nine
This challenge was first posted on Slack #regex channel https://splunkcommunity.slack.com/archives/C3WFE5V5G/p1759522087595439
For BORE at .conf24, we had a puzzle question which was to find integers which were multiples of 3. Rather than providing spoilers (in case we run BORE again and allow previous questions to be answered), I have devised another puzzle on similar lines. Find the integers which are multiple of 9, just using a single regular expression (rex command). Using the following SPL to generate some numbers, complete the regex to return matches (where the match field is equal to the number field) when modulus-9 is 0 (zero).
| makeresults count=20
| eval number=(9*(random()%1000000))+((random()%4)*(random()%4))
| eval modulus=number%9
| rex field=number "(?<match>)"
| table number modulus match
#justforfun
This article contains spoilers!
Some school-level mathematics
Starting with some school-level maths, we were taught that, for numbers that are multiples of nine (9), if you sum the digits of the number, and keep summing the digits of the result, you will eventually get a sum of nine. This is called the digital root. There is probably a mathematical proof for this but a simple examination of some examples will show this to appear to be true. For example:
Number
Divide by 9
Sum digits
18
2
9
36
4
9
45
5
9
240642
26738
18
7459506
828834
36
123456789
13717421
45
In more general terms, if you take the digital root of a number, extending the number by adding a 9 or a group of digits with a digit root of 9 anywhere in the sequence of digits, the resulting number has the same digit root as the original number. Similarly, if you remove a sequence of digits with a digital root of 9. The digits don’t even have to be inserted or removed sequentially! For example:
Number
Digital Root
Extended
New digital root
42
6 (4+2)
429
6 (4+2+9=15, 1+5=6)
7506
9 (7+5+0+6=18, 1+8=9)
7459506
9 (7+4+5+9+5+0+6=36, 3+6=9)
123456789
9 (see above)
13689
9 (1+3+6+8+9=27, 2+7=9)
Patterns in the numbers
So much for the maths; unfortunately, regular expressions don’t really do maths, they match patterns, so how can we solve this puzzle using patterns?
Starting with the simplest patterns for digital roots of 9, we have only two single digit numbers which are multiple of 9, 9 and 0 (zero). In regex terms, we can define a subroutine (called “nine”) to match to digital root 9:
(?’nine’9|0)
This can be extended to include the eight 2-digit numbers, 18, 27, 36, 45, 54, 63, 72 and 81.
(?’nine’9|0|18|27|36|45|54|63|72|81)
This can be demonstrated in a Splunk search like so:
| makeresults format=csv data="number
0
9
18
27
36
45
54
63
72
81"
| rex field=number "(?<match>(?'nine'9|0|18|27|36|45|54|63|72|81))"
| table number match
In regex101.com, this would look like this https://regex101.com/r/zJyXu7/1
As already discussed, inserting digits with a digit root of nine doesn’t affect the digital root; for example, 108, 297, 3456, 4572 and 8271 all have digital roots of nine. In regex terms, they all match to:
(?'nine'9|0|1(?&nine)*8|2(?&nine)*7|3(?&nine)*6|4(?&nine)*5|5(?&nine)*4|6(?&nine)*3|7(?&nine)*2|8(?&nine)*1)+
Note that word wrapping has to be removed when used in SPL (and regex101 https://regex101.com/r/zJyXu7/2)
Breaking this expression down, it means the following:
(?'nine'
Start a subroutine definition
9|
Alternative 9
0|
Alternative 0
1(?&nine)*8|
Alternative 1, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by 8
2(?&nine)*7|
Alternative 2, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by 7
3(?&nine)*6|
Alternative 3, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by 6
4(?&nine)*5|
Alternative 4, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by 5
5(?&nine)*4|
Alternative 5, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by 4
6(?&nine)*3|
Alternative 6, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by 3
7(?&nine)*2|
Alternative 7, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by 2
8(?&nine)*1
Alternative 8, followed by zero or more digital root 9 numbers (recursive call to ‘nine’), followed by1
)+
End of subroutine definition, with subroutine executed at least once
Cracked it? Well, not quite. Remember that the digits could be inserted anywhere, not necessarily together, so, for example, 12348765 is a multiple of 9 which does not match the current regular expression. https://regex101.com/r/zJyXu7/3
Generalised solution
In general terms, the 2-digit alternatives should be a literal digit followed by digits with the correct digital root. (Note that a literal digit is required first, so that at least one character is consumed by the subroutine, which avoids problems such as infinite recursion.) For example:
(?'nine'
9|
8(?&one)|
7(?&two)|
6(?&three)|
5(?&four)|
4(?&five)|
3(?&six)|
2(?&seven)|
1(?&eight)|
0
)+
Now, these other digital root subroutines need to be defined. In regex, subroutines can be called before they are defined, so long as they are defined somewhere in the regular expression. So, expanding just the first couple would look like this:
(?'nine'
9|
8(?'one'(?&nine)*(
8(?&two)|
7(?&three)|
6(?&four)|
5(?&five)|
4(?&six)|
3(?&seven)|
2(?&eight)|
1))|
7(?'two'(?&nine)*(
8(?&three)|
7(?&four)|
6(?&five)|
5(?&six)|
4(?&seven)|
3(?&eight)|
2|
1(?&one)))|
Note that each of the inner routines start with a recursive call to the outer ‘nine’ routine. I will leave the others for you to complete!
You should now have a regular expression which will match to (positive) integers which are multiples of 9. Try it out in Splunk as shown in the original problem definition.
Further challenge / stretch goals
How would you generalise the expression to handle negative integers, lists of integers, thousands separators, but ignore date and time formats?
... View more