# The Future Gadget Lab vs. Kolmogorov complexity

「Steins;Gate」/「いさりび」

In Steins;Gate 12, the Future Gadget Lab is faced with a problem. They need to send a copy of a person’s memory, or about 3.24 TB, to the past, but they can only send 36 bytes. What is a MADDO SCIENTISTO to do? What they end up doing is sending that data to the Large Hadron Collider to compress with a black hole. Problem solved! Or is it?

Obviously, using a black hole to compress data is silly, since a black hole would probably just destroy the data. In fact, it’s probably unnecessary, since if it were physically possible to achieve that kind of compression, you’d be able to do it on any computer because of the Church-Turing thesis. The only thing that would change is how quickly the algorithm would run. Luckily, in theoretical computer science, when dealing with the question of whether something is possible or not, we don’t care how long it’ll take. So, is there a compression algorithm that can give us the desired result?

Basically what we’re doing when were compressing stuff is we’re trying to rewrite a bunch if zeroes and ones, or strings, into a smaller bunch of zeroes and ones that mean the same thing. In general, this is impossible to do, since there are less strings of zeroes and ones when you have less digits, so you can’t arbitrarily stuff, say eight bits of information into seven bits. What you can do is create a way to describe the zeroes and ones in a way that is smaller than what you start out with.

For instance, we can describe strings of zeroes and ones in plain English. It’s shorter to write “one hundred zeroes followed by one hundred ones” than writing

00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111

The obvious problem with our compression scheme so far is that there are some strings that are not very easy to describe and it might be longer to describe in English than it would be to just write the bits themselves.

Again, it appears to be difficult to create a compression scheme that works for any old string. It’s much easier to create such an algorithm when you know something about the kind of string you want to compress, like if it’s a certain size or if it has any patterns or something. In the case of the FG lab, we know they won’t need to worry about compressing strings of length 288 or less. Unfortunately, there isn’t much more we know about the strings they want to compress other than the fact that they’re huge.

Actually, there’s another problem, which is that they need a way to decompress the string they’ve sent. If I give you a description of the string in English, it’s not going to do you any good unless you already know English. So in addition to sending a compressed string, they need to send information about the compression scheme they used, all in 36 bytes.

In information theory, the length of the shortest description of a string $x$ is its Kolmogorov complexity, which we’ll denote $C(x)$. Formally, $C(x)$ is defined in terms of encodings of Turing machines that generate $x$, but it’s pretty easy to argue that the choice of how we describe $x$ doesn’t matter up to a constant term.

We know that $C(x) \leq |x| + O(1)$ (where $|x|$ is the length of $x$), since we can always just write out $x$. We also know that for every length $n$, we can always find a string $x$ such that $C(x) \geq |x|$ since there are $2^n$ strings of length $n$ but only $2^{n-1}$ shorter descriptions of those strings. These strings that can’t be compressed are defined to be random.

We can quantify this idea of compressibility by saying that a string $x$ is $c$-compressible if $C(x) \leq |x| – c$. And so, we can argue about how likely it is that we have a string that can be compressed to the degree that we want. For strings of length $n$, we have that at least $2^n – 2^{n-c+1} + 1$ of them are incompressible by $c$.

We have some numbers, so let’s play around with them. Remember from earlier that the approximate size of Okarin’s memories is 3.24 TB, or $3.24 \times 2^{40} \times 8$ bits, while the limit we can send is 36 $\times$ 8 = 288 bits. If we want to squeeze Okarin’s memories into a text message, we’ll have
$$c = 3.24 \times 2^{40} \times 8 – 288 = 2.849934139166592 \times 10^{13}.$$
WolframAlpha helpfully tells me that this is slightly more than 1.4 times as many red blood cells in the human body.

We can calculate the probability that we have a compressible string by figuring out the probability that incompressible, which we can do with our lower bound on the number of incompressible strings we have. Taking $n = 3.24 \times 2^{40} \times 8$, the probability is given by
$$1 – \frac{2^n – 2^{n-c+1} + 1}{2^n} = \frac{1}{2^{c-1}} – \frac{1}{2^n}.$$
Plugging our numbers in, we get
$$\frac{1}{2^{3.24 \times 2^{40} \times 8 – 288 – 1}} – \frac{1}{2^{3.24 \times 8 \times 2^{40}}}.$$

And unfortunately, after this step, WolframAlpha kind of refused to try to do the computation. I gave it a shot on my laptop in Python but all that happened was CPU usage went up to 100% for a few minutes and it started eating all of my memory. I kind of wanted to keep using my computer instead of waiting it out for a MemoryError exception, so I interrupted it. Doing some further estimation, I threw $2^{2^{40}}$ at WolframAlpha, since it’s kind of about the same neighbourhood in magnitude as the number we’re trying to get. Apparently, it’s a number that’s about 300 billion digits long, so one in that are the kind of odds you’re looking at.

I guess this is another way of saying there’s no chance in hell that what the FG Lab tried to do would ever work. And remember that we’re not talking about any particular compression algorithms or technologies here. We can’t come up with a better algorithm or scheme. We can’t use faster computers. We can’t use quantum computers (Holevo’s theorem says that we can’t, in general, use less qubits than classical bits for the same string). From an information theoretic standpoint, it is impossible.

But I’m pretty sure everyone already could’ve guessed that anyway.