The largest number representable in 64 bits

This post is a rewrite of my earlier blog post from Nov 2023 with many new insights and updates.

┬─┬ ┬─┬──────────                                  ┬─┬─┬ ┬─┬────────────
└─┤ │ │ ──┬──────                                  └─┤ │ │ │ ──┬────────
  │ │ │ ┬─┼──────                                    └─┤ │ │ ──┼─┬──────
  │ │ │ └─┤ ┬─┬──                                      │ │ │ ┬─┼─┼──────
  │ │ │   │ ┼─┼─┬                                      │ │ │ └─┤ │ ┬─┬──
  │ │ │   │ │ ├─┘                                      │ │ │   └─┤ ┼─┼─┬
  │ │ │   │ ├─┘                                        │ │ │     │ │ ├─┘
  │ │ │   ├─┘                                          │ │ │     │ ├─┘  
  │ │ ├───┘                                            │ │ │     ├─┘    
  │ ├─┘                                                │ │ ├─────┘      
  └─┘                                                  │ ├─┘            
                                                       └─┘              

Most people believe 2^64 - 1 = 18446744073709551615, or 0xFFFFFFFFFFFFFFFF in hexadecimal, to be the largest number representable in 64 bits. In English, it’s quite the mouthful: eighteen quintillion four hundred forty-six quadrillion seven hundred forty-four trillion seventy-three billion seven hundred nine million five hundred fifty-one thousand six hundred fifteen.

That is indeed the maximum possible value of 64-bit unsigned integers, available as data type uint64_t in C or u64 in Rust. Floating point data types can represent much larger values, courtesy of their base 2 exponent. The 64-bit double floating point format has a largest (finite) representable value of 2^1024 (1 - 2^-53) ≈ 1.8·10^308.
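Both bounds are easy to check exactly. Here's a minimal Haskell sketch, using maxBound from Data.Word and exact Rational arithmetic, so no floating point rounding enters until the final conversion:

  import Data.Word (Word64)

  main :: IO ()
  main = do
    print (maxBound :: Word64)        -- 18446744073709551615
    print (2 ^ 64 - 1 :: Integer)     -- the same value, computed exactly
    -- the largest finite double, built exactly as a Rational first:
    print (fromRational (2 ^ 1024 * (1 - 1 / 2 ^ 53)) :: Double)
    -- prints 1.7976931348623157e308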

What if we allow representations beyond plain data types? Since we want representations to remain computable, the most general kind of representation would be a program in some programming language. But the program must be small enough to fit in 64 bits.

The largest number programmable in 64 bits

The smallest possible valid C program is “main(){}”, consisting of 8 ASCII characters. ASCII is a 7-bit character encoding standard representing 128 unique characters, but all modern computers use 8-bit bytes to store either plain ASCII or UTF-8, a Unicode character encoding that’s backward compatible with ASCII. So we’ll consider the above all-scaffold do-nothing program to be the only valid 64-bit C program.

Plenty of other languages require no such scaffolding. For instance, Linux features the arbitrary precision calculator bc. It happily computes the 954242-digit number 9^999999 = 35908462…48888889, making it programmable in 8 bytes. So is the much larger 9^9^9^99 = 9^(9^(9^99)) with over 10^10^953 digits, which bc is less happy to compute. If bc supported the symbol ! for computing factorials, then 9!!!!!!! would represent a much larger number still.
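For those without bc at hand, Haskell's arbitrary precision Integer can replay the 8-byte computation; a quick sketch (9^9^9^99 remains hopeless, of course):

  main :: IO ()
  main = do
    let n = 9 ^ 999999 :: Integer
    print (length (show n))      -- 954242 digits
    putStrLn (take 8 (show n))   -- "35908462", the leading digits above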

Allowing such primitives feels a bit like cheating though. Would we allow a language that has the Ackermann function predefined, letting the 8-byte expression ack(9,9) represent a truly huge number?

Ackermann considered unhelpful

As it turns out, the question is moot. One can blow way past ack(9,9) in under 64 bits in a language with no built-in primitives whatsoever. A language with no basic arithmetic; not even numbers themselves. A language in which all those must be defined from scratch.

But let’s first look at another language lacking primitives, one that has been particularly well studied for producing largest possible outputs. That is the language of Turing machines.

Busy Beaver

The famous Busy Beaver function, introduced by Tibor Radó in 1962, which we’ll denote BB(n), is defined as the maximal number of steps taken by an n-state Turing Machine (TM) with a binary tape alphabet, starting from an all-0 tape, before halting.

Here we have a discrepancy between how the size of a TM is measured, in states, versus how program size is measured, in bits. Fortunately there is a straightforward binary encoding of an n-state TM, which is entirely determined by its transition function. For each of the n states that the machine’s finite control can be in, and each of the 2 tape symbols that could be scanned by its tape head, the transition function specifies what new symbol to write in the scanned tape cell (1 bit), whether to move the tape head left or right (1 bit), and what new state (or special halt state) to transition to (⌈log2(n+1)⌉ bits). This encoding takes 6*2*(2+3) = 60 bits for a 6-state TM, and 7*2*(2+3) = 70 bits for a 7-state TM.
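That arithmetic is easy to mechanize; a small Haskell sketch of the encoding size as a function of the state count (tmBits is just a name chosen for illustration):

  tmBits :: Int -> Int
  tmBits n = n * 2 * (1 + 1 + stateBits)  -- write bit + move bit + next state
    where stateBits = head [b | b <- [0 ..], 2 ^ b >= n + 1]  -- ⌈log2(n+1)⌉

  main :: IO ()
  main = mapM_ (print . tmBits) [6, 7]    -- 60 and 70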

We’re also stretching the meaning of “representable” a bit, since BB considers the runtime of the machine rather than its output. Besides the above BB (which he called S), Radó also defined a function Σ that considers the output of the machine as a number in unary, namely the number of 1s in the final tape contents. But BB has received more attention, as it allows one to determine, from BB(n), all halting n-state machines. From 6 states up though, there is no discernible difference in magnitude between the two functions, so we could just as easily have used Σ.

So the largest number TM-programmable in 64 bits is BB(6).

How large is BB(6)?

Unfortunately, we may never know. While all BB(n) have been determined (and even formally proven) for n≤5, there are some 6-state TMs whose halting behaviour is closely related to very hard mathematical problems. Most of these so-called cryptids are likely not to halt, with some, like Lucy’s Moonlight, likely to halt but unlikely to beat the current champion. The current 6-state champion shows that BB(6) > 2↑↑2↑↑2↑↑10. Here, m↑↑n is Knuth’s up-arrow notation for an exponential tower of n m’s, so that for example 2↑↑3 = 2^2^2 = 16. Large as this number is, it’s still very small compared to ack(9,9) = 2↑^7 12 - 3 = 2↑↑↑↑↑↑↑12 - 3.
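Knuth's up-arrow has a direct recursive definition; here's a Haskell sketch (fine for toy values like 2↑↑3, hopeless for anything else in this section):

  up :: Int -> Integer -> Integer -> Integer
  up 1 a b = a ^ b                         -- a single ↑ is exponentiation
  up _ a 1 = a                             -- a ↑^k 1 = a
  up k a b = up (k - 1) a (up k a (b - 1)) -- a ↑^k b = a ↑^(k-1) (a ↑^k (b-1))

  main :: IO ()
  main = do
    print (up 2 2 3)   -- 2↑↑3 = 2^2^2 = 16
    print (up 2 3 3)   -- 3↑↑3 = 3^27 = 7625597484987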

It is known however that BB(7) > 2↑^11 (2↑^11 3) > ack(9,9). Several leading BB researchers believe that BB(7) is even larger than the famous Graham’s Number, which iterates the function mapping n to 3↑^n 3 64 times, starting from n=4. This appears to me a rather bold belief, considering that the smallest known Graham-exceeding TM has 14 states, twice as many. So I offered a $1000 bet that a proof of BB(7) > Graham’s Number won’t be found within 10 years, which BB researcher Shawn Ligocki was happy to accept.

Meanwhile, Graham’s Number is easily surpassed within 64 bits by moving beyond Turing machines into the language of

Lambda Calculus

Alonzo Church conceived the λ-calculus in about 1928 as a formal logic system for expressing computation based on function abstraction and application using variable binding and substitution.

The Graham-beating lambda term originates in a Code Golf challenge asking for the “Shortest terminating program whose output size exceeds Graham’s number”, answered by user Patcail and further optimized by user 2014MELO03. The following 49-bit program

01 00 01 10 10 00 01 10 01 10 00 00 01 01 10 110 00 00 01 110 01 110 10

is the Binary Lambda Calculus encoding of the term

(λ 1 1) (λ 1 (1 (λ λ 1 2 (λ λ 2 (2 1)))))

where λ (lambda) denotes an anonymous function, and number i is the variable bound by the i-th nested λ. This is known as de Bruijn notation, a way to avoid naming variables. A more conventional notation using variable names would be

(λJ.J J) (λy.y (y (λg λm. m g (λf.λx.f (f x)))))

The top left of this post shows a graphical representation of the term. The last 16 bits of the program, making up almost a third of its size, encode the term λf λx. f (f x), which takes arguments f and x in turn, and iterates f twice on x. In its generalized form, the function λf λx. f^n x, called Church numeral n, is the most common way of representing numbers in the λ-calculus. The encoding of Church numeral n is 0000 (01110)^n 10, of size 5n+6 bits.
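The whole encoding scheme fits in a few lines: an abstraction encodes as 00 followed by the body, an application as 01 followed by both subterms, and variable i as i ones followed by a zero. Here's a Haskell sketch (the names Term, encode, church and melo are mine) that reproduces both the 49-bit string and the 5n+6 formula:

  data Term = Lam Term | App Term Term | Var Int

  encode :: Term -> String
  encode (Lam t)   = "00" ++ encode t
  encode (App s t) = "01" ++ encode s ++ encode t
  encode (Var i)   = replicate i '1' ++ "0"

  church :: Int -> Term   -- λf λx. f^n x, i.e. λ λ 2 (2 … (2 1))
  church n = Lam (Lam (iterate (App (Var 2)) (Var 1) !! n))

  melo :: Term            -- (λ 1 1) (λ 1 (1 H)) with H = λ λ 1 2 (λ λ 2 (2 1))
  melo = App (Lam (App (Var 1) (Var 1)))
             (Lam (App (Var 1) (App (Var 1) h)))
    where h = Lam (Lam (App (App (Var 1) (Var 2)) (church 2)))

  main :: IO ()
  main = do
    putStrLn (encode melo)              -- the 49 bits listed above
    print (length (encode melo))        -- 49
    print (length (encode (church 9)))  -- 5*9+6 = 51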

The program, which we’ll name after its discoverer, can be expressed more legibly as

Melo = let { 2 = λf λx. f (f x); H = λg λm. m g 2; J = λy. y (y H) } in J J

Melo evaluates to a Church numeral, “Melo’s Number”, that comfortably exceeds Graham’s Number.
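To see how the self-application J J even makes sense, here's a sketch of the term embedded in Haskell via an untyped-λ wrapper; the newtype U exists only to let self-application typecheck, and actually normalizing melo would take on the order of Melo's Number of steps:

  newtype U = U (U -> U)

  infixl 9 #
  (#) :: U -> U -> U                      -- application inside the embedding
  (U f) # x = f x

  two, h, j, melo :: U
  two  = U $ \f -> U $ \x -> f # (f # x)  -- 2 = λf λx. f (f x)
  h    = U $ \g -> U $ \m -> m # g # two  -- H = λg λm. m g 2
  j    = U $ \y -> y # (y # h)            -- J = λy. y (y H)
  melo = j # j

  main :: IO ()
  main = putStrLn "melo typechecks; don't try to evaluate it"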

Proof of exceeding Graham’s Number

Lemma 1. J J = 2↑↑6 HH 2, where HH denotes H H

Proof:

J J = J (J H) = J (H HH) = H HH (H HH H) = H HH H HH 2 = H HH 2 HH 2 = 2 HH 2 HH 2 = HH (HH 2) HH 2 = HH 2 H 2 HH 2 = 2 H 2 H 2 HH 2 = H (H 2) H 2 HH 2 = H (H 2) 2 2 HH 2 = 2 (H 2) 2 2 HH 2 = H 2 (H 2 2) 2 HH 2 = H 2 2 2 2 2 HH 2 = 2 2 2 2 2 2 HH 2 = 2↑↑6 HH 2

Lemma 2. For k, n ≥ 2: k H 2 n > 3↑^k (1+n)

Proof:

By induction on k. First note that H₂ n = H 2 n = n 2 2 = 2^2^n, where H₂ abbreviates H 2.

Base: 2 H 2 n = H H₂ n = n H₂ 2 = 2↑↑(1+2n) > 3↑^2 (1+n) already at n=2, since 2↑↑5 = 2^2^16 > 3^27 = 3↑↑3

Step: k+1 H 2 n = H (k H 2) n = n (k H 2) 2 > 3↑^k(1+ 3↑^k(1+ … 3↑^k(1+2)…)) > 3↑^(k+1) (1+n)

Lemma 3. For n ≥ 2, HH (HH n) > 3↑^n 3

Proof:

By induction on n.

Base: Lemma 1’s proof shows HH (HH 2) = 2↑↑6 > 3↑^2 3

Step: HH (HH (1+n)) = HH ((1+n) H 2) = (1+n) H 2 H 2 = H (n H 2) H 2 = H (n H 2) 2 2 = 2 (n H 2) 2 2 = n H 2 (n H 2 2) 2 > (by Lemma 2) 3↑^n(1 + 3↑^n(1+2)) 2 > 3↑^(n+1) 3.

Theorem: J J > Graham’s Number G(64), where G(n) = n (\n -> 3↑^n 3) 4

Proof:

J J = (by Lemma 1) 2↑↑6 HH 2 > (by Lemma 3) (2↑↑6 / 2 - 1) (\n -> 3↑^n 3) 3↑^2 3

Since 3↑^2 3 > 4 and the iterated map only grows with its starting point, this exceeds (2↑↑6 / 2 - 1) (\n -> 3↑^n 3) 4 = G(2↑↑6 / 2 - 1) > G(64)

Leaving Melo’s Number in the dust

With 15 bits to spare, opportunities for vastly boosting Melo abound. Discord users 50_ft_lock and Sam found the following term that extends Melo’s H with an extra argument:

w218 = let { 2 = λf λx. f (f x); A = λa λb λc. c a b 2; T = λy. y (y A) } in T T T

which desugars to lambda term

(λT.T T T) (λy.y (y (λa λb λc. c a b (λf.λx.f (f x)))))

in conventional notation, or

(λ 1 1 1) (λ 1 (1 (λ λ λ 1 3 2 (λ λ 2 (2 1)))))

in de Bruijn notation, with 61-bit encoding

01 00 01 01 10 10 10 00 01 10 01 10 00 00 00 01 01 01 10 1110 110 00 00 01 110 01 110 10

Lemma 4. T T T = 2↑↑18 A 2 2 2 2 2 2 2 2 2 2

Proof: Let AA denote A A

  T T                                                            T
= T (T A)                                                        T
= T (A AA)                                                       T
= A AA (A AA A) T
= T AA                                                  (A AA A) 2
= AA (AA A) (A AA A)                                             2
= A AA A A                                              (AA A) 2 2
= A AA A 2                                              (AA A) 2 2
= 2 AA A                                              2 (AA A) 2 2
= AA (AA A) 2                                           (AA A) 2 2
= 2 A (AA A)                                          2 (AA A) 2 2
= A (A (AA A)) 2 (AA A)                                        2 2
= AA A (A (AA A))                                          2 2 2 2
= A (AA A) A A                                           2 2 2 2 2
= A (AA A) A 2                                           2 2 2 2 2
= 2 (AA A) A                                           2 2 2 2 2 2
= AA A (AA A A)                                        2 2 2 2 2 2
= AA A A                                         A A 2 2 2 2 2 2 2
= A A A 2                                        A A 2 2 2 2 2 2 2
= 2 A A                                        2 A A 2 2 2 2 2 2 2
= A AA 2 A                                         A 2 2 2 2 2 2 2
= A AA 2 2                                         A 2 2 2 2 2 2 2
= 2 AA 2                                         2 A 2 2 2 2 2 2 2
= AA (AA 2) 2                                      A 2 2 2 2 2 2 2
= 2 A (AA 2)                                     2 A 2 2 2 2 2 2 2
= A (A (AA 2)) 2 A                                   2 2 2 2 2 2 2
= A (A (AA 2)) 2 2                                   2 2 2 2 2 2 2
= 2 (A (AA 2)) 2                                   2 2 2 2 2 2 2 2
= A (AA 2) (A (AA 2) 2) 2                            2 2 2 2 2 2 2
= 2 (AA 2) (A (AA 2) 2) 2                            2 2 2 2 2 2 2
= AA 2 (AA 2 (A (AA 2) 2))                         2 2 2 2 2 2 2 2
= AA 2 (A (AA 2) 2)                          A 2 2 2 2 2 2 2 2 2 2
= A (AA 2) 2 A                           2 2 A 2 2 2 2 2 2 2 2 2 2
= A (AA 2) 2 2                           2 2 A 2 2 2 2 2 2 2 2 2 2
= 2 (AA 2) 2                           2 2 2 A 2 2 2 2 2 2 2 2 2 2
= AA 2 (AA 2 2)                        2 2 2 A 2 2 2 2 2 2 2 2 2 2
= AA 2 2                         A 2 2 2 2 2 A 2 2 2 2 2 2 2 2 2 2
= 2 A 2                        2 A 2 2 2 2 2 A 2 2 2 2 2 2 2 2 2 2
= A (A 2) 2 A                      2 2 2 2 2 A 2 2 2 2 2 2 2 2 2 2
= A (A 2) 2 2                      2 2 2 2 2 A 2 2 2 2 2 2 2 2 2 2
= 2 (A 2) 2 2                      2 2 2 2 2 A 2 2 2 2 2 2 2 2 2 2
= A 2 (A 2 2) 2                    2 2 2 2 2 A 2 2 2 2 2 2 2 2 2 2
= 2 2 (A 2 2) 2                    2 2 2 2 2 A 2 2 2 2 2 2 2 2 2 2
= 4 (A 2 2) 2                      2 2 2 2 2 A 2 2 2 2 2 2 2 2 2 2
= A 2 2 (A 2 2 (A 2 2 (A 2 2 2)))  2 2 2 2 2 A 2 2 2 2 2 2 2 2 2 2
= A 2 2 (A 2 2 (A 2 2 2))    2 2 2 2 2 2 2 2 A 2 2 2 2 2 2 2 2 2 2
= A 2 2 (A 2 2 2)      2 2 2 2 2 2 2 2 2 2 2 A 2 2 2 2 2 2 2 2 2 2
= A 2 2 2        2 2 2 2 2 2 2 2 2 2 2 2 2 2 A 2 2 2 2 2 2 2 2 2 2
= 2 2 2 2        2 2 2 2 2 2 2 2 2 2 2 2 2 2 A 2 2 2 2 2 2 2 2 2 2
= 2↑↑18 A 2 2 2 2 2 2 2 2 2 2

These 2↑↑18 iterations of A also let us relate w218’s magnitude to the so-called Fast-growing hierarchy, a family that assigns, to each ordinal α, a function [α] (departing from the usual fα notation for improved legibility) from natural numbers to natural numbers. We’ll treat all numbers as Church numerals, so we can write n f instead of the usual f^n, and write f n instead of f(n) as normally done in λ-calculus.

Readers unfamiliar with ordinal arithmetic may want to skip the next section.

The following FGH definition differs slightly from the standard one, which has the slower growing [0] n = n+1 and [α+1] n = n [α] n. This allows Lemma 5 to be exact rather than a mere lower bound.

Definition of Fast Growing Hierarchy

  1. [0] n = 2 n = n^2
  2. [α+1] n = n 2 [α] 2 = A 2 [α] n
  3. [ω^(i+1) (α+1)] n = [ω^(i+1) α + ω^i n] 2
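Restricted to finite indices α < ω, clauses 1 and 2 transcribe directly into Haskell (a sketch; clause 3 would need an ordinal datatype). Note that n 2 = 2^n as Church numerals, so clause 2 applies [α] to 2 a total of 2^n times:

  fgh :: Int -> Integer -> Integer
  fgh 0 n = n ^ 2                      -- clause 1: [0] n = n^2
  fgh a n = iterate (fgh (a - 1)) 2 !! fromIntegral (2 ^ n)
                                       -- clause 2: 2^n iterations of [a-1] at 2

  main :: IO ()
  main = print (fgh 1 2)               -- [1] 2 = [0]^4 2 = 65536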

Lemma 5. For k ≥ 0, n ≥ 2: k+1 A 2 [ω^k α] n = [ω^k (α+1)] n

Proof:

Base k=0: 0+1 A 2 [ω^0 α] n = A 2 [α] n = n 2 [α] 2 = [α+1] n

Step k>0: k+1 A 2 [ω^k α] n = A (k A 2) [ω^k α] n = n (k A 2) [ω^k α] 2 = [ω^k α + ω^(k-1) n] 2 = [ω^k (α+1)] n

Lemma 5 gives w218 = (2↑↑18 A 2 [0] 2) 2 2 2 2 2 2 2 = 2^2^2^2^2^2^2^([ω^(2↑↑18-1)] 2). In comparison, Graham’s and Melo’s Numbers are known to be much smaller at around [ω+1] 64 and [ω+1] (2↑↑6) respectively.

A Functional Busy Beaver

The λ-calculus analogue to BB is:

BBλ(n) = the maximum beta normal form size of any closed lambda term of size n

which appears in the Online Encyclopedia of Integer Sequences (OEIS) as the functional Busy Beaver function. Besides being simpler than BB, it has the advantage of using the standard unit of information theory, bits, rather than states.
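Concretely, the “size” of a term here is the bit length of its encoding, which the scheme above computes compositionally; a Haskell sketch:

  data Term = Lam Term | App Term Term | Var Int

  size :: Term -> Int
  size (Lam t)   = 2 + size t            -- "00" prefix
  size (App s t) = 2 + size s + size t   -- "01" prefix
  size (Var i)   = i + 1                 -- i ones and a zero

  main :: IO ()
  main = print (size (Lam (Var 1)))      -- λ1, the identity, has size 4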

The much more fine-grained measure in bits is what allows the first 36 values of BBλ to be known at present, versus only 5 values of BB.

Since both evaluate to Church numerals, term Melo implies that BBλ(49) ≥ 5(Melo’s Number)+6, while w218 implies that BBλ(61) ≥ 5(2^2^2^2^2^2^2^([ω^(2↑↑18-1)] 2))+6.

BB compared bit-by-bit to BBλ

The growth rates of the two BB functions may be compared by how quickly they are known to exceed certain large number milestones that correspond to well-known ordinals in the Fast Growing Hierarchy.

For Graham’s Number, at ordinal ω+1, we saw earlier that Melo’s 49 bits compares with 14 states, which take 14*2*(2+4) = 168 bits to encode. If I lose my bet, then the comparison becomes rather closer at 49 vs 70 bits.

For Goodstein’s function at ordinal ε0, 111 bits compares with 51 states taking 51*2*(2+6) = 816 bits.

For the limit of Bashicu Matrix System (BMS), at (presumed) ordinal PTO(Z2), 331 bits compares with 150 states taking 150*2*(2+8) = 3000 bits.

Finally, for Loader’s Number, at (presumed) ordinal PTO(Zω), 1850 bits compares with 1015 states taking 1015*2*(2+10) = 24360 bits.

One reason for TMs taking many more bits to achieve comparable growth, especially at the larger milestones, is the extremely poor programmability of TMs. The λ-calculus, despite its similarly bare-bones nature, doesn’t share this drawback. Modern high-level pure functional languages like Haskell are essentially just syntactically sugared λ-calculus, with programmer-friendly features like Algebraic Data Types translating directly through Scott encodings. The bruijn programming language is an even thinner layer of syntactic sugar over the pure untyped lambda calculus, whose extensive standard library contains many datatypes and functions. It is this excellent programmability of the λ-calculus that facilitated the construction of highly optimized programs for BMS and Loader’s Number.

Programming a Turing machine is so impossibly tedious that people have resorted to implementing higher-level languages like Not-Quite-Laconic (NQL) for writing nontrivial programs, such as the TM that halts only upon finding an inconsistency in ZFC. The above 1015-state TM for exceeding Loader’s Number even includes a λ-calculus interpreter written in NQL!

In his paper The Busy Beaver Frontier, Scott Aaronson tries to answer the question

But why Turing machines?

“For all their historic importance, haven’t Turing machines been completely superseded by better alternatives—whether stylized assembly languages or various codegolf languages or Lisp? As we’ll see, there is a reason why Turing machines were a slightly unfortunate choice for the Busy Beaver game: namely, the loss incurred when we encode a state transition table by a string of bits or vice versa. But Turing machines also turn out to have a massive advantage that compensates for this.”

Interesting behaviour at small sizes

“Namely, because Turing machines have no “syntax” to speak of, but only graph structure, we immediately start seeing interesting behavior even with machines of only 3, 4, or 5 states, which are feasible to enumerate.”

The number of uniquely behaving TMs with “only” 5 states is 4^10 * 632700 = 663434035200, which is more than the number of closed lambda terms of size at most 52 bits (513217604750). The latter certainly exhibit no less interesting behaviour, so TMs hold no advantage here.
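That count of closed terms satisfies a simple recurrence over the encoding: a term of size n whose free de Bruijn indices are at most m is either a variable (n-1 ones and a zero), an abstraction 00 followed by a body with one more allowed index, or an application 01 followed by two subterms. A memoized Haskell sketch (terms is my name; the expected total is the figure quoted above):

  terms :: Int -> Int -> Integer
  terms = \m n -> table !! m !! n    -- memoized via an infinite lazy table
    where
      table = [[body m n | n <- [0 ..]] | m <- [0 ..]]
      body m n
        | n < 2     = 0
        | otherwise = (if n - 1 <= m then 1 else 0)         -- variable n-1
                    + terms (m + 1) (n - 2)                 -- abstraction
                    + sum [terms m k * terms m (n - 2 - k)  -- application
                          | k <- [2 .. n - 4]]

  main :: IO ()
  main = print (sum [terms 0 n | n <- [2 .. 52]])  -- 513217604750 per the text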

Ancient and fixed computational model

“And there’s a second advantage. Precisely because the Turing machine model is so ancient and fixed, whatever emergent behavior we find in the Busy Beaver game, there can be no suspicion that we “cheated” by changing the model until we got the results we wanted.”

The λ-calculus is just slightly more ancient and is arguably more fixed. There is no choice of tape alphabet size, no choice of whether the tape head needs to move in every transition, no choice of halting and output convention, and no choice in number of tapes or tape heads.

Neither can the λ-calculus be suspected of being designed toward fast growth, so again TMs hold no advantage here.

The only remaining advantage of BB over BBλ is the many decades of research behind and publications about it.

A Universal Busy Beaver

Is BBλ then an ideal Busy Beaver function? Not quite. It’s still lacking one desirable property, namely universality.

This property mirrors a notion of optimality for shortest description lengths, where it’s known as the Invariance theorem:

Given any description language L, an optimal language Lopt is at least as efficient as L, with at most constant additive overhead.

In the realm of beavers, this means we require of an optimal Busy Beaver BBopt that it surpass any Busy Beaver function bb (based on self-delimiting programs) with at most constant lag:

for some constant c depending on bb, and for all n: BBopt(n+c) ≥ bb(n)

While BBλ is not universal, the closely related

BBλ2(n) = the maximum output size of self-delimiting BLC programs of size n

achieves universality by giving λ-calculus terms access to pure binary data. BLC programs consist of an encoded lambda term followed by arbitrary binary data that the term is applied to.

Since (λ_. t), whose encoding is just 2 bits longer than that of t, applied to any (standard lambda representation of) binary data equals t, BBλ champions provide lower bounds for BBλ2: for all n, BBλ2(2+n) ≥ BBλ(n).

In conclusion

The largest number (currently known to be) representable in 64 bits is w218, which lower bounds both BBλ(61) and BBλ2(63).