## One Interesting Thing: Homomorphic Encryption

I plan to start writing about one interesting thing I learn each week. This will serve two purposes: forcing me to think about interesting things, and also forcing me to write. Here is the first installment in the series.

Suppose I have a lot of computation I need to do, based on some initial data. (I sometimes find myself in this situation, having to calculate tables of number fields or something else of this flavor.) I could buy a supercomputer to do it for me, but then maybe I don’t need to these big computations very frequently, so it would be an expensive investment to buy a supercomputer for only occasional use. So instead, I decide to rent computing power from someone else’s supercomputer, so that I can upload my program and data to that other supercomputer.

Oh, but there’s a problem: my data are rather sensitive (so I’m probably not just computing tables of number fields), and I don’t know if the owner of the supercomputer is trustworthy or not. So I have to encrypt my data.

But now I have another problem: I have to do computations with my data, but they are encrypted. So, how do I do computations on encrypted data?

Let’s illustrate the problem with a simple example. Suppose my data consist of a string of letters. I encrypt them with a substitution cipher, replacing each letter with some other letter. So maybe I replace all instance of the letter “A” with “Q,” all instances of “B” with “D,” and so forth, making sure not to have any duplicates. Suppose I wish to increment the third letter of my string by 2, so if it was originally an “A,” then it should become a “C,” and if it was originally an “H,” then it should become a “J,” and so forth. How do I do it?

Well, if I have access to the original string, then it’s easy: just change that character. But what if I only have the encrypted version? I look at the string, and I see that the third letter is a “U.” But it wasn’t originally a “U”; that’s just the encrypted version of some letter. So, I don’t know how to increment it.

So, in order to do this increment procedure, I first have to decrypt the message. If I want this to be done on the supercomputer, then I need to provide it with the substitution key. But if I’m going to do that, then why did I bother encrypting it in the first place? My data is no longer securely encrypted. Alternatively, I could send the string back to my own computer, where I could do the process safely, and then encrypt it again and send it back to the supercomputer. But then what is the point of renting supercomputer time if I’m going to have to send the data back to my own computer whenever I want to do a computation?

In order to solve these problems, I need to encrypt my data in such a way that it is still possible to perform computations on it even when it is encrypted. To see how I might do this, let us return to the letter substitution example. If instead of having some weird rule about how to substitute letters I use a Caesar cipher, where my encryption is simply to push every letter forward by five positions (so that “A” becomes “F,” “B” becomes “G,” and so forth), then I can ask the supercomputer to increment the third letter by two positions even in its encrypted form, since incrementing the encrypted version by two also increments the unencrypted version by two.

This is a simple example of homomorphic encryption. Of course, the Caesar cipher is a terrible way of encrypting data, as it is trivial to decrypt. But some trickier encryption schemes like the Vigenère cipher also have the same property. They still aren’t secure, but we’re making small amounts of progress.

Unfortunately, my scope is still very limited here: I can increment some characters by fixed amounts, but I can’t perform other string operations. For example, I can’t replace the third character with a “T.” And if I’m using a Vigenère cipher, then I can’t reverse the string either. I want to be able to do more interesting operations with the ciphertext. Can I find an encryption method that will let me do everything I want just by acting on the ciphertext?

This is the question that homomorphic encryption answers. I won’t explain how to do the full version of it, but here’s a limited version, that can only deal with plaintext strings that are either 0 or 1 (so, two possible choices). Clearly, that isn’t sufficient for all our cryptographic needs, but it will serve to illustrate how some of this stuff works anyway.

Let $m$ be a binary digit, so either 0 or 1. To encrypt it, we choose integers $N,P,Q$. I’ll explain how they are to be related (i.e. how big they are) later. To encrypt $m$, I first pick a key $p$, which will be an odd, $P$-digit binary number (so the last digit is a 1). Now, let $m'$ be a random $N$-digit number whose last digit is $m$ (i.e. $m\equiv m'\pmod{2}$). Also, pick a random $Q$-digit binary number $q$. Encrypt $m$ as $c=m'+pq$.

Decrypting $c$ is easy if we know $p$: $m=(c\pmod{p})\pmod{2}$, where $c\pmod{p}$ means the unique integer from 0 to $p-1$ congruent to $c$ modulo $p$.

Craig Gentry, in his paper “Computing Arbitrary Functions of Encrypted Data” from which I learned most of this material, suggests choosing a parameter $\lambda$ and then setting $N=\lambda$, $P=\lambda^2$, and $Q=\lambda^5$. These choices don’t matter too much, and indeed I might suggest making $P$ a little larger relative to $N$, but they are suggestive of how large they need to be relative to each other.

Now, why is this encryption scheme homomorphic? If I have two plaintext messages $m_1$ and $m_2$, with ciphertexts $c_1$ and $c_2$, respectively, then $c_1+c_2$ and $c_1c_2$ are encrypted versions of $m_1+m_2$ and $m_1m_2$, respectively. (Note that I say “encrypted versions” rather than “the encrypted versions,” because there are many ways to encrypt the same message due to the random choices of $m'$ and $q$.)

However, we have to be a bit careful here, because the underlying point in this encryption scheme is that $c$ is close to being a multiple of $p$ and the relevant part of it (namely $m'$) is small relative to $p$. But if we do lots of operations, then the $m'$ bit can get ever larger, so that it is no longer small relative to $p$.

The conclusion is that this setup allows us to perform operations on a ciphertext a few times, but after that, we cannot reliably decrypt it into what we were supposed to get.

The Gentry paper explains how we can fix this problem into a fully homomorphic encyrption scheme, so that we can perform as many operations as we want. His paper is refreshingly readable, and he prefaces each of his technical remarks by an allegory of a jewelry store in which the owner (named Alice, appropriately enough for a paper on cryptography) does not trust her workers not to steal expensive jewels if they can get their hands on them.

Also helpful is this blog post by Craig Stuntz, called “What is Homomorphic Encryption, and Why Should I Care?” I recommend reading that before the Gentry paper for an explanation of the background of homomorphic encryption.