Vitalik Buterin

Cofounder, Ethereum

Vitalik Buterin - Exploring fully homomorphic encryption (FHE)

🎥 May 08, 2020 📺 Ethereum Foundation ⏱ 102m 👁 3955 views

Slides: https://drive.google.com/file/d/1XfvTReuBgQZE95_RSefO-YhhhGh5Emhj.

About Vitalik Buterin

Vitalik Buterin, co-founder of Ethereum, has been speaking at multiple events in 2026 about the future of the Ethereum protocol and its intersection with artificial intelligence. In a panel on the Ethereum Economic Zone (EEZ), Buterin described the concept as an effort to rethink layer-2 solutions in a way that integrates them more deeply with Ethereum, rather than treating them as separate chains. He argued that without strong pressures toward interoperability and censorship resistance at the user layer, the result can be "walled garden monopolies" that use the base layer primarily for virtue signaling. He also identified oracles as a "skeleton in the closet" of Ethereum, noting that their security has not received the same level of rigor as layer-2 scaling solutions. In separate talks, Buterin has emphasized Ethereum's role as a "public billboard" and a "shared computation layer" for high-value guaranteed execution, rather than a platform meant to compete with high-frequency trading or chase maximum speed. He stated that Ethereum needs to pass a "walk away test," meaning it should remain reliable even if no core developers remain. On AI, Buterin argued that while local and open-weight AI models have improved significantly, the mainstream open-source ecosystem does not by default prioritize privacy, security, or censorship resistance. He expressed hope that the Ethereum community can create tools that optimize for these properties, including ZK-based payment channels that make API requests private and unlinkable. Buterin also contrasted crypto's approach to safety with centralized visions that he described as "trust the uncle in the sky," saying crypto aims to create systems that preserve user agency and privacy.

Source: AI-verified profile updated from Vitalik Buterin's recent appearances. Browse all interviews →

Transcript (62 segments)

✨ AI-enhanced transcript with speaker attribution

Vitalik Buterin0:00

Today, yeah, what I wanted to do is basically go through the basics of how fully homomorphic encryption works. Fully homomorphic encryption, as I'll get into a bit later, is this kind of encryption that allows you to run computations on encrypted values. There are a lot of things that are really interesting about this. You can do privacy-preserving computation really easily. It's also an important building block in some multi-party computation protocols, and in a lot of the proposed obfuscation protocols that people are starting to come up with. The interesting thing with fully homomorphic encryption is that it sounds scary. There's a lot of impressions that it's this very complex form of cryptography, just figured out only a couple of years ago, so it must be really complicated to understand. It turns out it's completely not. It turns out that fully homomorphic encryption protocols are literally simpler to fully understand than even a lot of elliptic curve based constructions. The reason we're not using fully homomorphic encryption for everything is basically because it's not very efficient. It has high overhead, it has ciphertexts that are very large, and I'll get into why this is the case as well. So to start off, let's talk about what is fully homomorphic encryption. The goal of fully homomorphic encryption is basically to have an encryption scheme where you can compute on encrypted data. If you have an encryption of X, you want to be able to turn that into an encryption of f of X for some function f, without being able to decrypt, so without being able to determine X or determine f of X. You can think of partially homomorphic protocols that we know about already. If you think about elliptic curve points, this doesn't technically fully satisfy the definition of fully homomorphic encryption because it's not an encryption scheme, it's just a group that has a homomorphism. But if you look at elliptic curve points, if you take some number X and you multiply it by some elliptic curve point that's the generator, and then you add the generator multiplied by Y, you get the generator multiplied by X plus Y. This is a really nice property of elliptic curve points, that they follow this distributive property. There are a lot of protocols that make use of this. In the case of blockchains, for example, there are deterministic wallets that have master public keys that rely on these elliptic curve operations. Even zero-knowledge proof protocols based on elliptic curves, things like Bulletproofs, even just signatures, rely on this kind of additive homomorphism. So it already provides a lot of value. If instead of adding you want to be able to multiply, then you can just use RSA. With simple RSA encryption, X^e, then (X*Y)^e equals X^e times Y^e. So if you have the ciphertext for X and the ciphertext for Y, you can create the ciphertext for X*Y. So we have this family of schemes that gives us additive homomorphism, and we have this family of schemes that gives us multiplicative homomorphism. The question is, can we come up with a scheme that lets us do both? Can we come up with some way of making a ciphertext so that you can do both adding and multiplying on ciphertexts? If you can add and multiply, then you basically have general-purpose computation. If you restrict your ciphertexts to just being zeros and ones, then you can do pretty much every logic gate. If you wanted to AND, then AND of X and Y is just X times Y. If you want to do OR, then OR is basically you can invert X, invert Y, then do an AND, then invert again, or you have this simpler formula where you just say X plus Y minus X times Y. If both inputs are 0, then this is obviously 0. If X is 1, then you have a 1 here, you have a 0 here, you subtract a 0, so you get 1. If Y is 1 and X is 0, it's symmetric, so you also get 1. If X and Y are both 1, then you get 1 plus 1 minus 1, so we also get 1. For XOR, you can do just X plus Y, and that's if you don't care about the numbers staying within the range 0 and 1, if you just care about them being odd or even. If you care about the numbers being within the range 0 and 1, then you can just do X plus Y minus 2 times X times Y. If X and Y are both 1, then it's 1 plus 1 minus 2, and you get 0. So if you can add and multiply, you can do pretty much whatever logic gates you want, and you can do arbitrary computation. It's not going to be efficient arbitrary computation, because if you imagine wanting to do some math based on numbers, you'll have to decompose those numbers into bits, then you'll have to turn your addition and other operations into circuits, and you have to evaluate the circuits. You can see how the complexity multiplies, but you can do it. To give the intuition around how these FHE protocols work, we'll start with a kind of toy protocol. This is, I believe, technically secure under some very unfavorable key sizes and complexity, but this is the way that people started figuring out how to be able to do addition and multiplication. Imagine your private key is a big prime P, a really big prime number. To encrypt a message, where a message is restricted to either zero or one, what you do is you take a big random number, then you multiply it by P, then you add together a small random number multiplied by 2, and you add your message. So what we have here is a multiple of P, offset by some noise and a message that's hidden in the least significant bit of the noise. If you want to decrypt, you decrypt the ciphertext with the key by basically saying it's the ciphertext mod P, and that kicks out this term, and then you just do mod 2, and that kicks out this term. By the way, if people have questions at any point, please feel free to ask. I'm happy to answer. So why is this scheme additively homomorphic? Why is this an approach where you encrypt a message by taking a big multiple of your key, adding some noise, and then hiding your message in a bit of the noise? Why is this homomorphic? Basically, if you imagine you have two ciphertexts, each one of these ciphertexts is going to be a multiple of P plus some noise plus a message. These things just naturally distribute. CT1 plus CT2 is actually a ciphertext in the correct format, because you just have a multiple of P that is k1 plus k2, then you have your noise, and the even part of the noise also adds up, so you have 2r1 plus 2r2, and then your messages add up, m1 plus m2. Note that this is addition mod 2. If your message here is 1 and your message here is 1, then this is 2, but the 2 over here is indistinguishable from just being part of the noise, so it just becomes the same as 0. If we just want to do binary circuits, this is totally fine. You can add together two ciphertexts and you get a ciphertext that is of the XOR of the message bits. Now why is this multiplicatively homomorphic? Let's say you multiply together your two ciphertexts. Then you have an expression of the form: your first ciphertext, k1 P plus 2r1 plus m1, multiplied by your second, k2 P plus 2r2 plus m2. You basically just fully expand this. When we expand it, let's figure out which terms are multiples of P, because the decryption process starts off by just modding P, which kills all your multiples of P. So we don't have to care about them. You have k1 P multiplied by all three of these, then you have k2 P multiplied by all three of these, so these five terms go away. Now we have to figure out which of the remaining terms that are not multiples of P are even. It's basically 2r1 multiplied by this, and you have 2r2 multiplied by this, so we have a bunch of terms that are multiplied by 2. Then you have this thing over here which is not a multiple of P and not multiplied by 2, which is just the product of the messages. If you take this expression and then you do a mod P and then you do mod 2, the mod P kills these 5 terms, the mod 2 kills these 3 terms, and so you just have this one term left. So we see that this is additively homomorphic, and we see this is multiplicatively homomorphic. Now why does it have any level of security? It's this approximate GCD problem. If you did not have noise, if we did not have this 2 times small random, then you have just a bunch of multiples of P plus 0 or 1. A bunch of them are going to be just multiples of P plus 0, then you can just use any GCD algorithm, like the Euclidean algorithm, and just GCD all of these multiples together, and you get P, and then you can break the scheme. But as it turns out, if you just have approximate multiples, then figuring out what P is becomes much harder. The security of this cryptographic scheme is based on the hardness of computing approximate GCD, as opposed to computing exact GCD, which is easy. There's a more general principle here: solving systems of equations becomes much more difficult when these equations have errors or noise. If you add some noise into your equations, then suddenly figuring out a hundred parameters that are part of those equations, which you can normally figure out easily, just suddenly becomes much harder. This is pretty much what the security of all of these constructions ends up being based on. As a side note, public key encryption: I showed this scheme as being a secret key scheme, you need P to encrypt. Turning these schemes into public key schemes is really easy. You basically just provide a bunch of encryptions of 0. Then if you can publicly encrypt 0 by just computing a random linear combination of these, you can add some more error. If you want to encrypt 1, then you could just take these encryptions of 0 and add 1. If you want something more generic, then into your public key you can add some encryptions of 1. Then to encrypt 1, you add a bunch of encryptions of 0 and then an odd number of encryptions of 1, and you have a new encryption of 1. So certainly any fully homomorphic scheme that is private key can be turned into a public key one pretty easily. That's not a problem. Now why doesn't this work already? There are two problems. One of them is that multiplication just doubles the length of the ciphertext. Here you had k1 P and k2 P, and here you have k1 k2 P times P, so it's k1 k2 P squared. The bit length of this multiplication is just going to be the sum of the bit lengths of the ciphertexts that are coming in. If you try to do more than a couple of multiplications, it just blows up horribly and becomes impractical. You can't do more than a logarithmic number of multiplications. But there is another problem, which is overflow of the errors. If you remember, the sum of the ciphertexts turns into a new ciphertext of this form, and the product of the ciphertexts turns into a new ciphertext of this format. We can look at what happens to the even error terms here. When you add, the error roughly doubles, you take the sum of the errors. When you multiply, the error squares. When you had r1 and r2 here, you have 2 r1 r2, so the bit length of the error doubles. Once again, the maximum multiplicative depth is going to be the logarithm of the number of bits in P, or the log of log of P. You can think of the maximum multiplicative depth as being really low, like something like ten or even five. It's tiny. So how do we overcome this overflow problem? There are two general categories of solutions. The first category of solutions is that we do clever tricks to make multiplying only increase the error by a constant factor. Instead of having log log P multiplicative depth, you actually do have a multiplicative depth proportional to the number of bits in P. If we can make a protocol where every multiplication increases the error by some fixed factor, say a factor of a thousand, then that's only ten bits. If your modulus has a bit length of say 10,000, then your multiplicative depth goes up to 1,000, which is already huge. So this is one solution. The second kind of solution is bootstrapping. Bootstrapping is this kind of switching mechanism where we imagine that you have some ciphertext that has some noise, and you want to convert it into a fresh ciphertext that has less noise. What we'll do is we will take the circuit that represents decrypting X under some secret key, and we're going to evaluate it in a homomorphic way. We're going to take X as a ciphertext, then we're going to represent X as a bunch of bits. We're going to provide a bootstrapping key, think of this as being part of the public key of the system. First, you can encrypt the bits of X under some new key s2. You can encrypt the bits of X here under some new key s2, but also we provide the bits of s under s2. So what this basically means is that you have the regular public key for s2, and you also have an encryption of s under s2. Given the value of X, you convert that into its bits, then you convert that into encryptions of the bits of X. Then you have encryptions of the bits of X, you have encryptions of the bits of s, and you just compute the decryption process homomorphically using these inputs. What this gives you is that the decryption process returns one bit, so you get one bit which is the decryption of X. It is the same bit that X represents, except it gets encrypted under the key s2. Before we had X, which is an encryption of some bit under s, and now we have basically an encryption of that bit under the key s2. The important thing is that X might have had a lot of error, but this decryption is only going to have a fixed amount of error, regardless of how much error you have over here. The reason is that from the perspective of working on your s2, all of these bits start off as being ciphertexts of the minimum error level, they start being depth one. This decryption has some constant depth, so the amount of error that's going to be in the result is actually going to be constant, even if there is a lot of error in the ciphertext here. Before I continue, maybe I should offer an opportunity for some questions, because bootstrapping is a little bit complicated. Let's think for a bit and make sure you understand what's going on here.

Audience Member18:37

One question I have regarding bootstrapping: is there randomness involved, and where does it come from?

Vitalik Buterin18:45

Bootstrapping just basically means that you're evaluating the decryption circuit. The place where there is error involved is basically that the process of homomorphically encrypting these bits is going to involve taking values from the public key, and those values already have some noise inside of them. The process of executing the circuit after you have those values doesn't require you to add any extra randomness, but the bootstrapping key already has randomness that got used in the process of constructing it. Does that make sense?

Audience Member19:34

Could you describe what exactly the output of the decryption function looks like when it's evaluated in the circuit?

Vitalik Buterin19:47

The output of the decryption function is either the bit 0 or the bit 1, depending on what the ciphertext X represents.

Audience Member19:59

The thing I'm having trouble with is that I don't want the one evaluating the circuit to actually see those bits.

Vitalik Buterin20:10

Correct. The reason why the person evaluating the circuit cannot see those bits is because that person does not have X. Computing the decryption requires you to have X. In your bootstrapping key, you're not going to give them X; you're going to give them an encryption of the bits of s under s2. The only thing they can do is evaluate the decryption circuit in a way that gives them outputs that are encrypted under the key s2. They have no way of extracting the output in the clear.

Audience Member20:47

I see. So they don't actually get the decryption, they get the decryption circuit. They have some representation of it, they get the bits of s encrypted using s2, and they run this circuit.

Vitalik Buterin21:03

Exactly. They have s encrypted under s2, they have X in the clear, and so they just encrypt X under s2. Dec is just a circuit, it's part of the algorithm, so everyone has it. You just evaluate Dec homomorphically using bits encrypted under s2, and you get 0 or 1 encrypted under s2.

Audience Member21:28

So what do you write below this? Dec is not actually something I have to do, I only have to run the decryption circuit, and that automatically gives the encrypted output.

Vitalik Buterin21:38

Yes, that's correct. It's just whatever the value is that is encrypting. Thank you.

Audience Member21:58

Once you've re-encrypted this, why can't you back-propagate the new lower error format with the lower error and gain a log factor or constant of your multiplication trick through every previous multiplication again?

Vitalik Buterin22:17

How would you back-propagate the lower error? I haven't played with this yet, but there might be some method. I think the problem is that lower error plus higher error equals higher error. Generally, any function, if you throw in a low error thing and a higher error thing, you get a higher error thing out. Because the errors are just randomly generated, you're not going to be able to make them cancel in some way. Okay. One subtle technical point: this X is encrypted under s, and here we're saying we encrypt under s2. Technically, you could just set s equal to s2. You can technically just have the bits of s encrypted under s, and then the bootstrapping key be the bits of s encrypted under s. This makes things a lot easier because instead of having lots of keys or a chain of bootstrapping keys, you can just have one key. In standard descriptions of bootstrapping, people don't do that because for weird technical reasons, you can't prove a security reduction straight to the lattice assumptions. There's this special term called circular security. Depending on whether you're one of these cryptographers who is like 'no, you have to have proofs' versus one who is like 'yeah, whatever, it's fine', it's your choice whether or not to trust the security. A lot of people think it's fine.

Audience Member24:10

I have a question. This s2 is also a public key, so the private key don't know. Just have to use several public keys of the recipients, right? What is the circuit depth of the decryption?

Vitalik Buterin24:31

Good question, and I'm about to get to that in the next slide. The main problem is that the multiplicative circuit depth of this decryption involves a modular reduction, and X mod P has a circuit depth of log log X, which is bigger than log P, and this is higher than what FHE can support. One of the reasons why nobody cares about the scheme I just spent 10 minutes describing is because bootstrapping is not possible. In general, bootstrapping has a circuit depth which is too high. The next parts of all of this are about how to adjust the scheme so that you actually can bootstrap. We're going to skip over a few things. Craig Gentry came up with some really clever tricks in 2009 involving subset sums and other things I don't understand to try to get around this problem, to reduce the circuit depth of bootstrapping to the point where we can actually bootstrap in fairly simple FHE schemes. Craig Gentry is great, and we love Craig Gentry, and he's going to come back in the story. He is the one that created the matrix scheme protocol. But we're going to go straight to Brakerski and Vaikuntanathan's work in 2011, which was the first scheme that really managed to get around this problem. The schemes that I'm going to show from here on don't depend on approximate GCD; they depend on a slightly different but kind of similar assumption, which is the Learning With Errors assumption. The Learning With Errors assumption basically says that if you have a system of equations, you have a bunch of variables, you have a bunch of equations, normally you can solve them using Gaussian elimination and it's very easy. But if you just have approximate equations, so you have a whole bunch of equations and even if the system is really over-specified, they have a small additive error, then finding any of the variables becomes very hard. Solving systems of linear equations where the outputs have errors in them is something which is hard, and as far as we know, is quantum hard. These mechanisms also reduce to hardness of lattices and the shortest vector problem and all of these other problems that are fairly well studied in mathematics. The schemes that I'm going to get into from here on are not going to depend on approximate GCD; they are going to depend on hardness of solving approximate systems of linear equations. So let's get to BV 2011. The key is going to be a vector, a vector of numbers s1, s2, ..., sn. The ciphertext is going to be a vector a that satisfies the property that a dot product s is a message plus an even error. This is all going to be done modulo some Q, it doesn't have to be prime but it has to be odd. You generate an a such that a1 s1 plus a2 s2 plus ... plus an sn equals m plus 2 times your error. The connection between this and the Learning With Errors problem is that you can think of each of these ciphertexts as being an equation, and your secret key is the coefficients. The coefficients in the equation are going to be the a, and the output is going to be some small number, 0 or 1 depending on what your message is, plus some error. That error perturbs the values and makes it hard to extract s. Your ciphertext is just a. If you're comfortable thinking in terms of vectors and dot products, it's just a such that a dot product s equals some message plus 2 times an error. You can think of it as this linear equation. One optimization is that you can set the first number in your secret key to 1, and that just makes it very easy to construct these a's. You basically generate all your other a's randomly, and then you set a1 to compensate for whatever the rest of the dot product gives. You construct a1 to be the minus of that, and then it adds up together with this, and you get your message. If you set a1 to be kind of subtract another m plus 2e, then this dot product just becomes m plus 2e. Constructing vectors that satisfy this equation is very easy. As I mentioned, an alternative interpretation of this vector interpretation is that a ciphertext is a noisy equation where the variables are your secret key. Addition is easy. If you have aL such that aL dot s equals m1 plus 2e, and aR such that aR dot s equals another m2 plus 2e, then dot products are additive, so aL plus aR dot s is just going to be the sum of your messages plus the sum of your errors. The way you decrypt is you compute a dot s and you take the last bit. You can decrypt the addition, and if you decrypt the addition, you get twice the even error and then you have the XOR of your two messages again. Addition here is simple. Multiplication is harder. The problem here is that in the scheme I showed you before, you're just operating with single numbers, and single numbers can be added and multiplied. Here you're operating over vectors, and vectors can't just be multiplied with each other. Well, they kind of can. There is this notion of an exterior product. A vector multiplied by another vector is just a big vector that contains all of the products of the elements. If you imagine aL is a vector with 10 elements, aR is a vector of 10 elements, the exterior product is going to be a vector of 100 elements: the first times the first, plus the first times the second, all the way up to the first times ten, then the second times the first, etc. There is an algebraic identity that says aL exterior product aR dot product with s exterior product s equals this dot product multiplied by this dot product. Remember that aL was one ciphertext, aR is one ciphertext, and the product of the decryptions, m1 times m2, is just going to be this dot product times this dot product. Because of this algebraic identity, that equals the same thing as aL exterior product aR dot product with s exterior product s. So if you take two ciphertexts and you just exterior product them, you turn them into this big vector that contains all possible products of their elements, then that becomes a valid ciphertext of m1 times m2 under the key s times s. You can multiply, but the length of your secret key grows quadratically. The challenge is you need to have a way of going back to linear. What we're going to do is have a procedure that we call relinearization. The procedure for relinearization feels a bit like bootstrapping, because what you do is you have encryptions of s under your new key, and then you just evaluate things on your new key. It's kind of similar, but it's doing something slightly different. Relinearization does not solve the error problem; it just solves the problem of taking a ciphertext under the key s times s or s exterior product s and turning it into an encryption under s. The relinearization key is going to contain encryptions of s_i s_j times 2 to the d. It's going to contain s_i s_j for all i and j, all elements of s exterior product s, and also those elements times 2, times 4, times 8, and so on, up to every power of 2 that's smaller than your modulus. This lets us compute an encryption of the dot product of aL exterior product aR with s exterior product s as a linear combination of s_i s_j to the d. Just as we did in bootstrapping, we're not going to evaluate it in the clear because you don't want the other person executing the circuit to be able to figure out the output in the clear. Instead, we're going to evaluate it homomorphically under either some new key, or if you're willing to assume circular security, you can provide it under s. We're going to evaluate this whole thing homomorphically as a linear combination of all these powers. For example, if you have aL as [1, 2] and aR as [3, 4], then aL exterior aR is [3, 4, 6, 8]. To evaluate aL exterior aR dot product with s exterior s, you're going to have 3 multiplied by s1 s1, 4 multiplied by s1 s2, 6 multiplied by s2 s1, and 8 multiplied by s2 s2. To avoid having to multiply ciphertexts, which makes the error blow up, you have these powers of two, and the multiplication you can just express as a linear combination of these powers. To express 3 times s1 s1, you're going to take the encryption of s1 s1 times 2^0 plus the encryption of s1 s1 times 2^1, and this gives us s1 s1 times 3 encrypted under your new key. Then for 4, you have s1 s2, so you'll use the encryption of s1 s2 times 2^2. For 6, 6 is 2^1 plus 2^2, so you're going to add the encryption of s2 s1 times 2^1 and the encryption of s2 s1 times 2^2. For 8, you're going to add the encryption of s2 s2 times 2^3. If you add all these encryptions, you have a vector which, if you dot product it with your new key, would give you s1 s1 times 2^0 plus s1 s1 times 2^1 plus ... plus some error, because each of these terms have some error added to them. You add all of this error, and you have the set of terms. This gives you 3 times s1 s1, 4 times s1 s2, 6 times s2 s1, 8 times s2 s2, which is the same thing as this term here. Basically, we've created a sum of a bunch of ciphertexts such that if you were to decrypt it, if you were to dot product it with the new key, you would actually get a value which equals the evaluation of this plus some more error. If you were to evaluate it, then you would cancel the error out, because this gives you m1 m2 plus some error. So this preserves the message, and because it's an addition of a bunch of linear size ciphertexts, the ciphertext size and the key size also go back to linear. I'll stop again and wait for questions, because relinearization is also not very intuitive.

Audience Member38:41

This is super cool. One question I have is you mentioned that this helps reduce the size of the key but not helping with the error, correct?

Vitalik Buterin39:03

It does not help with the error. Basically, because this expression itself is going to evaluate to this multiplied by this. aL is equal to m1 plus 2e, aR is equal to m2 plus 2e, so the product of those things is going to contain a term that has 4 e squared. This thing evaluates to 4 e squared plus some more error that comes from a logarithmic number of these. The error still blows up from being a multiple of e to being a multiple of e squared. There is another magic trick to make the error blow up less quickly, and I will get to this after. But first, I'll wait for questions about this.

Audience Member40:01

Encrypt s_i s_j to the d? This is yes, very good question. I was about to answer this myself. In the encryption scheme as I provided, it can only encrypt 0 and 1 because it has some even error. You might ask, what does it mean to encrypt this thing which is obviously going to be much bigger than 0 and 1? The answer is that the encryption of s_i s_j to the d is going to be a vector a such that if you were to dot product it with the key, you would get s_i s_j times 2 to the d plus some even error. You cannot actually extract s_i s_j to the d from the ciphertext because there is the error, and the error is just going to wash away everything except for the really high bits. It's going to wash away some bits in the middle. It is basically a vector where if you dot product that vector with the key, you get s_i s_j to the d plus even error. The reason why this is still useful is because if you add up these ciphertexts, then you still get aL times aR or aL exterior aR dot product with s exterior s plus a whole bunch of even errors. These even errors just kind of add together with the even errors that are already inside of here. Whoever ends up ultimately decrypting this ciphertext would still have the errors all even, so they will be able to cancel them out.

Okay, I got this. It's just kind of nothing groups properly, like parcel the final ciphertext. You never let up, right?

Vitalik Buterin42:06

Okay, so here is the magic trick to make your error blow up less quickly. First, a fun fact: we can switch ciphertexts between what you can think of as being two perspectives. One perspective is your ciphertext is a, and a dot product s equals m plus 2e, where m is either 0 or 1. If you just mod or divide a by 2, this is all mod Q, then a divided by 2 dot product with s gives you m becomes m over 2, and 2e becomes e. Your small even error becomes a small error that doesn't have to be even, and m over 2: 0 divided by 2 is 0, and 1/2 is ceiling of 1/2 of your modulus. If your modulus is say 999, then 1/2 is going to be 500, because 500 times 2 is 1000, which mod 999 reduces to 1. You can basically take a ciphertext that is of this form, and then you divide your a by 2, and you convert it into a ciphertext that has this format where instead of your message being in the lowest bit, your message becomes in the high order bits. This perspective is better for multiplication because if your message is in low order bits, then when you multiply your two messages together, your message is still preserved in the field of two elements. But in this perspective, the message is at the top, and everything interferes. So in this perspective, you can't multiply. But this perspective has a key advantage: it makes the noise less structured. Here the noise has to be even, here the message just becomes small. By cleverly switching between these perspectives, you can, without having the secret key change, change the modulus of your ciphertext. Here's what we do. We start off with a such that a dot product s equals m, zero or one, plus 2 times the error. Then you do a perspective switch, and you have an e prime which is a modular divided by 2, which is equal to small error plus either zero or half your modulus. Now what we're going to do is switch it from mod Q to mod P. We're going to take this ciphertext, take the a prime, and we're going to multiply it by P and then integer divide by Q. If you do this, then what you're going to get is a value which, if you then multiply it by s, you do rescale a but you do not rescale s. This is a big importance. If you dot product the rescaled a prime with the same s, then the output is going to be rescaled. Zero becomes zero, ceiling of Q over 2 becomes roughly ceiling of P over 2, small error becomes small error except the error itself also gets downscaled. It gets downscaled from being something times Q to being the same small fraction times P. The actual magnitude of the error goes down. Then when you do a reverse perspective switch, you go from this perspective where it's either zero or ceiling of P over 2 to this perspective, you double this, and then it becomes either 0 or 1 plus 2 times the error.

Audience Member46:13

I didn't understand the previous step. Why is there this time? Why does this work? Why does it give you the one on the right? Why is the message preserved?

Vitalik Buterin46:34

If you just think of this as being just a multiplication, then if it was an encryption of zero, it's just going to give you a multiple of 2. If it was an encryption of one, then this gives you a multiple of Q plus half of Q. If you multiply this by P over Q, then multiples of Q would turn into multiples of P, and multiples of half of Q turn into half of P. One example: if Q is 10,000 and P is 1,000, then if a prime dot s is say 50,000 plus a bit of error, then if you go down from Q to P, multiplied by P over Q, it's basically P is 1,000, Q is 10,000, so divided by 10. Your 50,000 becomes 5,000, and 5,000 mod 1,000 is still 0. But if you instead had say 55,000, then 55,000 times P over Q is 5,500, and mod 1,000 is 500. The error introduced by the division not being perfect is still a small error, so it just mixes together with the error that already exists. This should really say e prime is supposed to be e, it's a good correction. Yes, sorry, that's correct, this should be prime. Even the contribution from e also changes. Going back to this example, if a prime dot s was 55,032, so your error is 32, then if you multiply by P over Q, 55,032 becomes 5,503.2, so mod 1,000 it's 503. 500 is P over 2, so your new error is 3. The ratio between the error and the modulus size remains the same. It seems like you're not accomplishing anything, because the error relative to the modulus is the same. But the key is that you can do this repeatedly. You can keep doing this rescaling operation, and each time you reduce the absolute size of the error. If you do it enough times, you can bring the error down to a small constant. This is the key insight that allows you to do many multiplications without the error blowing up. So that's the magic trick.

Ratio is the same but you're actually accomplishing something really important, which I'll get into in the next slide. So I guess, so right now, right? Yes. Are people satisfied that if you have a ciphertext whose error gets bigger, you can turn it into a ciphertext whose error as a number is bounded by some constant, but it's just a modulus as to keep, like, is being reduced, right? So this is kind of the lesson here.

Yeah, okay. So here is why we switch, right? So there's two reasons to switch, actually. So the first reason to switch is this makes bootstrapping actually practical, right? So bootstrapping involves basically computing this dot product mod Q, and computing the star product mod Q has a depth circuit depth of log Q. And by reducing the modulus, what we can do is we can basically turn something which to its decryption requires a circuit depth of log Q to something that has a circuit depth of log log P, and P here, like I said, can be very small, right? Once you do the modulus switch, you don't have to actually do any more computation, so P can be a pretty small constant. And so this basically means that the circuit depth of bootstrapping becomes a constant, and so you can just increase Q to as much as you need in order to make the bootstrapping procedure actually possible.

Yeah, so in the bootstrapping key, basically the way that we do it, bootstrapping is just kind of computing this dot product mod P, or kind of as a circuit. And so to make this sum computable, you basically just provide the s_i x 2^d values in binary representation, right? So for every s_i times 2^d, you get, well, you basically, and this is modulo P, then you provide a binary representation of the s's, so we have n log n log P numbers in binary representation. So for every one of these terms, you have your s_n, and basically to compute an s_n, you take, for every one bit that's in a_n, because you have a_n in the clear, you multiply that by the corresponding bit representation of the power of s_n. And so you just have a bunch of these numbers, and these are numbers in binary representation, right? So they're not single ciphertexts, they're log P ciphertexts. And then you just add a whole bunch of these numbers together, and then you take modulo. If you like, the best choice for P is going to be 2^n - 1, and that makes modulo really easy because you just make your addition circuits wraparound.

So how do we implement these addition circuits, right? So basically, the addition circuit is just going to be, you have these n log P numbers, and you just have to add together these numbers, right? And then the modulo, basically your addition circuits, instead of carrying out onto even higher bits, every time you carry past the size of P, you just kind of wrap around, right? And then that just gives you modulo 2^n - 1 for free, right? So if your modulus was 1023, then your bits are in 128, 256, 512, 1024 is equal to 1, so you just kind of wrap that around to 1, so modulo-ing is really easy. So your setting is you just have to add together a whole bunch of numbers, right? And the first step is that you can reduce three numbers to two numbers, so you can have a kind of three-to-two adder that has multiplicative depth one, right? And the way that this works is basically that every bit of P is going to be just the XOR of the corresponding bits of A, B, and C, and every bit of Q is going to be the two-of-three function of A, B, and C, right? So this is 0 if A+B+C is 0 or 1, and 1 if A+B+C is 2 or 3. And you can think of the S as being the carry digit, right? And the Q you also kind of multiply by 2, which basically means you just kind of left shift Q by 1, right? So the reason why this works is basically because every bit in A, B, and C gets represented inside of P, but then in the specific case where you have kind of two bits that turned onto one, then inside of here they get turned into 0, but you get, you flip the corresponding bit from a 0 to a 1, and then the Q gets kind of moved to the left by 1, and so that multiplies it by 2. And so basically, every individual bit in A, B, or C gets some kind of correctly represented in P plus Q, right? So this is a kind of three-to-two adder, and this has multiplicative depth of one. Adding two numbers X plus Y does require logarithmic multiplicative depth, right? So when you go from N numbers to two numbers, that requires a multiplicative depth which is basically logarithmic in N, so order of making the number of numbers, but then adding the last two does require multiplicative depth log log (X+Y), and here X and Y are mod P, so basically saying log log P, or log of the number of digits of P, right? So if P is 1023, then P has 10 digits, the log of 10 is going to be about 4, and so your multiplicative depth is just going to be 4. Now the reason why adding circuits have this non-trivial logarithmic multiplicative depth is basically because you have to worry about threading the carry bits through the number. And in the code that I'll link to later, I have algorithms for doing this. So in general, computing a linear combination of size n log P is going to take size, so first of all you have these N numbers, and second, you have this kind of, the sum is going to be a size n log P, and so it's kind of log of n log P, and so this becomes log n plus log log P, and so the total O is going to be log n plus log log P. And you can pretty easily just set Q to be high enough that this is less than log log Q, and so you can basically implement the decryption circuit, and so we can bootstrap.

So here's optimization number two, right? So I mentioned back when we were over here, right, that there are two tricks to overcome overflow: one is bootstrapping, and the other is clever tricks to make your multiplication only increase your error by a constant factor. So this is the clever tricks that I'm going to talk about, right? So basically, imagine if instead of just doing a modulus switch before bootstrapping, we do a modulus switch after every multiplication, right? So if you don't do this, right, so if you imagine a circuit that has some multiplicative depth, then your error is going to square every time you do a multiplication, right? So if you imagine your first ciphertext, let's say the modulus is 10^16 - 1 and your ciphertext has error 10^2, if you square it, the error is going to be 10^4, actually it'll be 10^4 times some constant, but let's say it's 10^4. Then the third time you multiply, the error is 10^8, and the fourth time you multiply, the error is 10^16, and then the error kind of overwhelms the ciphertext, and then decryption starts failing. So if you just do this kind of fully homomorphic encryption naively, then after a logarithmic number of multiplications, you're screwed. Now what happens if you do modulus reduction, right? So you have X which has an error of 10^2 modulo 10^16 - 1, then you multiply it, and so you have X^2 with an error of 10^4. Now if you perform a reduction, and so here you have X with error 10^4 modulo 10^16 - 1, we're going to divide the modulus by a factor of 100, and so your error gets cut from 10^4 to 10^2, and your modulus gets cut from 10^16 to 10^14. And so now you have this term over here, and now if you square this again, right, then your X^2 goes up to X^4, your error of 10^2 goes back up to being an error of 10^4, your modulus is the same, then you reduce it again, and now your error goes down to 10^2, your modulus kind of hops down again, then you square it again, the error goes up from 10^2 to 10^4, and then you just reduce the modulus again. And so notice that instead of the error squaring every time, basically we have the modulus just kind of very nicely and orderly stepping down by a constant factor every time, right? So the clever trick here is basically that when you multiply the ciphertexts, the errors get multiplied. And so if you just make sure that the errors as numbers stay small, then squaring the error is always just going to be a constant factor multiplication. And so instead of having a logarithmic number of steps, you can have a linear number of steps. And so your multiplicative depth goes up from log log Q to log Q.

Pause for questions again.

Audience Member1:00:03

I didn't completely follow everything, so it's okay. The key step of what we want to decrease the error by multiple, if the upside means the maximum multiplicative depth that you can support, right? So what if you notice, like the example at the top and the example at the bottom here, your error blows up to the point where it overwhelms the modulus after three multiplications, right? Because each time you're squaring. Here we're keeping the error under control, right? The error is just always 10^4 down to 10^2, up to 10^4 down to 10^2, up to 10^4 down to 10^2, except every time the modulus just steps down by a constant factor of 10^2. And so here, instead of doing three steps, you can do eight steps, right? So here you can do more than two steps here because you watch these steps. And both optimizations are kind of unlocked by this context switch, right? Right, right. So both the bootstrapping and this technique that lets you basically not need bootstrapping half the time are unlocked by the modulus switch.

Vitalik Buterin1:01:14

Okay, so now we move on to what this presentation was supposed to be about, which is matrix FHE. So the problem that matrix FHE solves is basically, can we try to find something that's kind of more natural and more efficient than relinearization, right? So relinearization is expensive basically because relinearization requires you to just add this really huge number of ciphertexts. And if you imagine a large modulus, you're talking about hundreds of ciphertexts for each element. So can we try to make something that's kind of somewhat nicer, and can we also try to create a protocol that's somewhat simpler, where adding and multiplying ciphertexts actually is done by adding and multiplying elements in some kind of naturally reasonable array? So matrices, yay. Um, so here is kind of the context, right? So to encrypt 0 with our tool, we use LWE with a secret key. What we're going to do is we're going to come up with a matrix A such that the secret key multiplied by the matrix is an error. And to encrypt 1, we're going to have a matrix A such that the secret key multiplied by A gave us the secret key plus an error, right? So basically what we're doing is we're kind of hiding an approximate eigenvector, where the eigenvalue, if it's 0, the eigenvalue is 0, and if it's 1, the eigenvalue is 1. And if you really want, you can potentially encrypt other values as well, I can see that, and the eigenvalue kind of becomes the place where the plaintext gets hidden. And it's only an approximate eigenvalue, and so you can't extract it using one of the standard techniques because learning with errors makes it hard. So addition is easy, right? Basically, addition, if you just add together two matrices, then if you add together a matrix such that sA equals something with error, and another such that sA equals something else, then s(A_L + A_R) is just going to be basically whatever the values were for the first one plus whatever the values are for the second, right? So s times the decrypt of A_L plus A_R becomes decrypt of A_L plus decrypt of A_R, which is going to be your left message times s plus your left error plus your right message times s plus your right error, and so you have the sum of your messages times s plus the sum of the errors, right? This is basically the same thing that we had before, except here, because instead of dealing with vectors, we're dealing with matrices. And matrices can actually be multiplied. Then the decryption of the multiplication of the matrices is going to be s multiplied by A_L times A_R. And remember, you can think of matrices as being linear maps, and so you're going to kind of pass s through A_L, then you're going to pass s through A_R, and so the eigenvalues get kind of producted as well. And so you can kind of show that this expands out: so you have first s passing through A_L, then you multiply that by A_R. And s times A_L is going to be a kind of m_L s + e_L as before, right? So here s times A is going to be whatever your message is times s, either 1 or 0, plus your error. Except here we still have times A_R, so we have m_L s times A_R, then you have e_L times A_R. And so here, multiplying by A_R basically means your m_L s is going to turn into m_L m_R s, because A_R times s by itself is going to give you kind of m_R s, right? And so if s times A_R gives you m_R s, then m_L times s times A_R is going to give you m_L m_R s. So here, the decryption of the product is going to give you the product of the messages. And then over here, you have the two error terms, right? So one of the error terms is going to be the left error times A_R, and then here you have the right error times the left message. So you can add, you can multiply, but there is a problem, right? And the problem is that the error is not just being multiplied by the other error; instead, your error gets multiplied by the magnitude of the matrix elements. So if you start with matrices that have a fairly small magnitude, then every time you do a multiplication, the magnitude of your matrix elements just keeps on doubling. And so once again, you're limited to your depth being kind of log Q. By the way, just kind of one other reason why this matrix approach is nice: basically, from here on, we don't actually care about the modulus being odd, we don't care about them being prime, we don't care about modulus having any kind of special properties. And so to make the math easier, we're just going to set the modulus as equal to some power of 2, and that makes the math really easier, and it makes bootstrapping really easy as well because you can just kind of forget the carries.

So the problem is, right, that you have this kind of e_L times A_R, and these A_R values themselves are kind of getting bigger every time you multiply. And so how do you stop the error from just blowing up? So the fix to this problem is you have this kind of really strange and clever bit-splitting trick. And the bit-splitting trick basically says that if we treat X as being a vector, then we're going to define these two functions: one is called powers of two, the other is called bitify. Where powers of two basically says, for every element in X, you just kind of concatenate the powers of two of X. And then bitify of X just turns it into a bit representation. Then you have this identity that the dot product of powers of two of X together with the bitify of Y equals the dot product of X and Y. And so kind of visually, I have an example, right? So if you have X as 1, 2, 3 and then Y is 6, 5, 4, then powers of two of X: 1 turns into 1, 2, 4; 2 turns into 2, 4, 8; and 3 turns into 3, 6, 12. And then Y turns into its bit representation: so 6 is 1, 1, 0; 5 is 1, 0, 1; 4 is 1, 0, 0. And the dot product here, well here it's 1 times 6, and then here we're basically doing powers times bit representations, and so you have 2 times 1 and then 4 times 1, so this adds up to being 6. Then over here, 2 times 5, and basically the 5 turns into 1, 0, 1, and so you add 2 times 1, and then here you add 2 times 4 times 1 to kind of compensate for this having a higher place value. And so 2 times 1 plus 8 times 1 is 10, which is 2 times 5. And then 3 times 4 is 12, and so 3 times, well here for 4 you just have like a 1 here, and then the 4 gets multiplied by 3, so you have 12, right? So I guess also pausing and just kind of stare at this and verify for yourselves that this works. Right, so basically, powers of two converting X into powers of two and converting Y into a bit representation is a dot product preserving operation.

Audience Member1:09:16

It kind of feels a little bit like the relinearization trick, exactly.

Vitalik Buterin1:09:21

Yes, it's very similar, except this is kind of just explicitly representing linear relinearization using vector methods. It's exactly the same sort of stuff. So the bit-splitting technique also applies to matrices because matrix math is basically just batched vector math, right? And so powers of two of s times bitify of A equals s times A. Actually, this should be just actual matrix multiplication. And so what we're going to do is we're going to basically do, instead of doing A_L times A_R, we're going to do A_L times bitify of A_R, right? And we're going to set our ciphertext, instead of being n x n, we're going to set them as being kind of n times n log P. Basically, the idea is that whatever is on the left side, we're going to kind of set it as being permanently in this powers of two representation. And so bitify is going to give us an n log P times n log P matrix, and so the dimensions match up. But instead of multiplying A_L times A_R, we're going to multiply A_L times bitify of A_R. And the reason why we do this is basically because if you notice here, you have these two error terms: one has e_R times A_R, and one has e_L times A_R. And so if A_R elements are always going to be 0 and 1, then here you just have e_L, and then here you just have A_R multiplied by the message, which is just going to be 0 or 1. And so the error blow-up is small, right? The error blow-up is just going to be adding a whole bunch of these terms that were proportional to the original error.

Audience Member1:11:20

I'll admit it's the thing that's kind of difficult to see: is A_L times bitify of A_R still eigenvector preserving the same way that A_L times A_R was eigenvector preserving?

Vitalik Buterin1:11:42

How do I describe this? The kind of intuitive idea, right, is that the A_L's are kind of permanently in this powers of two form, and then these A_R's are permanently going to be bitified. And so their dot product is going to be the same as it was if you just had a kind of regular A_L together with just a regular A_R. And so the eigenvector here, instead of being s, instead of turning s into s, it's actually going to turn s into powers of two of s. But if this works, you can also just experimentally verify that it works, right? And the benefit is that, as I mentioned, instead of the error blow-up being large values, you just guarantee that these A_R values are always 0 and 1, and so the error blow-up is smaller.

Right. By the way, if anyone wants, I have code here. The code is actually not long. The homomorphic encryption here is about 300 lines of Python; this is under 200 lines of Python. So if you want to kind of see for yourself, if you want to kind of play around with matrices for yourself and verify things for yourself, I highly recommend looking through the code as well.

Optimizations, right? So there's a bunch of ways to optimize this. First of all, you might notice that one of the really interesting properties of matrix FHE is that the error growth is asymmetric. So here you're multiplying e_L by A_R, and here you have e_R being multiplied by a message. And you have this interesting property that if you have a ciphertext on the right that has a higher error, then the error actually barely increases at all, right? And so what this basically means is that if you're using matrix FHE, you might look like you don't really have this property; the error is kind of just multiplied. Here, if you want to perform some kind of operation, it actually makes sense to perform that operation kind of asymmetrically. So say if you want to do a bunch of additions, then you just kind of fold them all into the same sum one after the other. The error balance criterion becomes kind of more complicated; it's not just max multiplicative depth, it's not just max polynomial degree, it becomes a complicated thing. Other optimizations: you can decompose so you don't need to do things in base 2; you can also do this in base 2^B or base 4, base 16, and this potentially lets the error kind of increase less slowly. You can pack multiple bits into a ciphertext, and you can do kind of addition and multiplication on a lot of bits at the same time. There's also this thing called ring LWE, where basically instead of just thinking about a whole bunch of independent equations, you can represent those equations as being an equation in a polynomial ring. And I don't really have time to get into this, but this also lets you kind of decrease key sizes a lot. So this is still kind of basically how modern fully homomorphic encryption schemes work, except there's this whole suite of optimizations that people have been slowly coming up with. So the main reason why there is a big overhead is basically because the ciphertexts are matrices, right? And in order to satisfy these LWE assumptions, the ciphertexts have to have a pretty substantial length. And so we can think of these matrices as being something like a hundred matrices of size a hundred by a hundred, or whatever. And then here we're going to store them in powers of two and we're going to bitify them, and so the 100 potentially blows up into being 10,000 at least temporarily. And so matrix multiplication becomes a really big kind of multiplication procedure. And if you want to be able to process circuits of substantial size, then these numbers have to have a fairly substantial bit length, so potentially we're talking about over a hundred. So there's a bunch of factors that do end up conspiring to make it fairly inherent, this large blow-up in circuit sizes. And this is the reason why this is all kind of less elegant and less clean than things like elliptic curve cryptography, for example. But it is increasingly getting to the point where for small computations, it is completely viable, and you can do individual computational steps on the order of milliseconds. So there you go.

Audience Member1:17:00

Thank you, it was really cool. And thanks for the presentation. Hey, I have a question: what is considered the state of the art today? Like, there's all these different libraries out there, like FEG and all these things. So what is the best method right now?

Vitalik Buterin1:17:24

There is definitely a bunch of libraries. I get HElib as one of them, and there's some others with different names. So there is a survey or a book by Zvika Brakerski, the inventor of the 2011 protocol. So you could see, if I can work this up right now, here you have "Fundamentals of Fully Homomorphic Encryption." And so it's basically just search "Fundamentals of Fully Homomorphic Encryption" with Zvika Brakerski, and then what you get is from 2019. So the state of the art is definitely the matrix approach; they have different pros and cons, and then you just apply a whole bunch of optimizations. So LWE is one of them. Another one is various schemes for being able to kind of stuff a whole bunch of ciphertexts into a list, or a whole bunch of messages into the same ciphertext. So like, one of the ways that you can do this is you can imagine a ciphertext where A just satisfies a dot product with one message, you might say A dot product with message one, and A dot product with another message, and then you can do kind of SIMD operations. So you can, if you add and multiply, that kind of adds and multiplies all the plaintexts simultaneously. And you can do kind of rotations, you can do permutations. And so there's this kind of growing body of clever tricks, the same way that the zero-knowledge proof space has a growing body of clever tricks to try to compensate for the inherent inefficiency. So that's where we are right now: we're at kind of just taking this base and incrementally building upon it.

Audience Member1:19:26

Awesome. I sent a link to the report that you refer to in the Telegram channel. I just have a quick question.

I was just going to mention that you can also look up BGV 2011. There's also another protocol, BGV 2012, which avoids the need to do modulus switching entirely, basically by kind of pretending that the ciphertexts are fractional but still representing them as integers. So you're just going to feel free to look up the papers for all of this as well, and hopefully they'll be more understandable after this.

Yeah, this is like just coming back to your last thing about the switching into A_L and A_R matrices. So we can do this multiplication. How in the next step, right, you do it using the bitify. Isn't that actually exactly the same operation, just computing it in a different way? Right, so basically, the idea here is that bitify is guaranteed to always contain zeros and ones, right? I think I might have a good intuitive answer. So the intuitive answer is, first of all, like powers of two and bitify are kind of opposites in some sense, right? So if you bitify something, then you can multiply by a matrix which kind of converts it back into the pre-bitified form. But if you take a matrix that is not bitified, then the multiplication itself, when you do kind of modulo, and then you kind of un-bitify, you're basically kind of what this is, you can think of it as being kind of an un-bitified version of a bitified thing. And what happens is that the inverse of bitifying is kind of inherently baked into here. It has this property that if you apply it to things that are not bits, right, so a bitified matrix multiplied by another bitified matrix is not necessarily going to stay in bits because matrix multiplication adds a whole bunch of bits together. But you're still going to get a matrix, and because you have modular reduction, it's still going to reduce to modulo the modulus. And so basically, the modular reduction just kind of magically makes the higher-order terms go away. And because you're only doing the matrix multiplication while the matrices are bitified, the process of matrix multiplication never multiplies any particular value by more than the width of the matrix, and so the error only gets multiplied by a small amount, right? So it's actually not the same operation; it's a different operation, but which is still guaranteed to have the same consequence with respect to multiplying by things of the format. In fact, actually, just notice that I don't even know what this multiplication here means because okay, so we have bitify has blown up into n times log P, right? So what are these A_R? Are they tensors? Are they still two-dimensional matrices? How many dimensions?

Vitalik Buterin1:23:53

They are all matrices, right? So it's a way to intuitively think of this. Okay, so here's one way to intuitively think of this, right? So think of A_L as being a bitified matrix multiplied by something. So A_R has some size n log P times n log P, right? Bitify of A_R is a big square. A_L has side n times big, it's n times log P. And think of A_L as being bitify of A_L multiplied by the matrix that you're thinking of as a bitification matrix. So bitify of A_L times bitify of A_R, and then you're going to take bitify of A_L times bitify of A_R, then you multiply them together. And then to actually see what this means, you have your kind of universal bitification matrix multiplied by the product of the bitified matrices. So right, so basically you have these kind of matrices that have the property that if you multiply them by powers of two of s, then you get powers of two of s. And so you have these matrices, and so if you multiply by s, you get powers of two of s. And then if you take A_L and kind of expand it out and use it as a bitified thing, then the bitified things are kind of both, well, powers of two of s is an eigenvector basically. And so if you product together bitify of A_L with bitify of A_R, you get something where powers of two of s is also an eigenvector. And then if you kind of squash it back, then you get a ciphertext of the same form.

Audience Member1:25:46

I think yeah, I realized the parts that I don't understand, right? Yeah, I recommend not even just looking at the source, but even just kind of opening up a Python console and playing around with bitify and multiplying by that, and just kind of see what properties the ciphertext and the bitified ciphertexts have with respect to multiplying by powers of two of s.

Okay, all right. So now, here's the part I didn't understand: so there's no extra operation each time? The whole time we are computing with n times n log P matrices? Yes, aha, I see. Okay, so we changed our protocol; this is not just like a modification of the multiplication itself, right?

Vitalik Buterin1:26:44

Right, so another way to think about this, I think, is that this approach is just an optimization. Another way of thinking about this is you can make a less efficient but easier to understand protocol if your ciphertexts are n times n log P matrices. So imagine your ciphertexts are kind of bitified; they are big matrices that contain only zeros and ones, and they have the property that powers of two of s is an eigenvector, right? So we agree that if you just multiply A_L times bitify of A_R, then powers of two of s is going to be an eigenvector of the product, right? But then what this operation does is basically we want to preserve the invariant that your ciphertexts are just zeros and ones. Right? After you multiply two matrices together, they're not necessarily zeros and ones. And so what we're going to do is we're going to basically kind of do an inverse bitify operation to squash it. And because this squashing preserves the eigenvector, basically before it turns powers of two of s into powers of two of s, after it just turns powers of two of s into powers of two of s. So you just squash it and then you bitify it again. And the squashing and the bitifying basically preserves the eigenvector, but it kind of forces the values of the ciphertext to just continue being zeros and ones.

Audience Member1:28:30

And you mentioned like you can do operations in order of milliseconds. That's like sort of one bit essentially, like one bit operations, something like that?

I have a question. I'm not sure. Oh yeah, go ahead.

Yeah, of course. Thanks for a fantastic presentation. And the question is: you have talked so far only about multiplications of plaintext and ciphertext, but do you think that any of this technique translates to some more complex functions like polynomials of ciphertext, something more natural than just computing one operation?

Vitalik Buterin1:29:25

Yes, that's a good question. I think so. One of the challenges is, right, first of all, this technique is theoretically compatible with ciphertexts being not just zeros and ones but being arbitrary field elements. The problem is that because you have this m_L e_R term, basically if your messages are not zeros and ones, the error blow-up is large, right? So you could potentially, instead of working in the ring modulo 2, you can work in the ring modulo some small number, like maybe up to 100 or something. But working over things that are not bits just kind of seems intrinsically hard because multiplying by things that are not small numbers causes error to blow up very quickly.

Audience Member1:30:22

I'm sorry, I meant differently. No, not field elements, but instead of just multiplication, like multiplication of three different elements, or I don't know, so yes, polynomial of elements of small degree or something, right?

Vitalik Buterin1:30:41

So I mean, the trivial way to do any of this, right, is that if you want to multiply three things, there are mechanisms for that. I think multiplying a... So in FHE, if the individual m_L are bits, right, it's just 0 or 1, then multiplying by 3 doesn't have any effect, right? So if your messages are 0 or 1, you pretty much have to do everything using binary circuits. And the nice thing with binary circuits is that multiplication is just an addition of a kind of left-shifted values. And so you can do it. And I mentioned before, in the optimization session, right, that you can potentially have a ciphertext which contains multiple plaintexts, and you can do shifts, and so you could do a kind of large parts of things like addition and multiplication circuits that operate over many bits simultaneously. But I guess in general, operating over binary circuits is the kind of most fundamental thing. This ends up blowing up, but there are definitely optimizations. Still, the main challenge here is that the error messages always have to be 0 and 1. But there are optimizations that have to do with kind of computing more complicated gates in a way that ensures that the output is still 0 and 1 without having to do as much work as if you just did it naively with simple gates. So there's nothing super mathematically elegant, but there are these bags of tricks that are the way you optimize a lot sometimes.

Audience Member1:32:39

Okay, thank you. One question I have is that people often say that functional encryption is related to FHE. Do you know how to go from FHE to functional encryption, or how they are related?

Vitalik Buterin1:32:59

Right, so functional encryption, in terms of the definition, basically it says instead of being able to go from Enc(x) to Enc(f(x)) for arbitrary f, you can go from Enc(x) to just having f(x), but only for one specific f. So it's a kind of difference, and potentially more powerful primitive. In terms of how these functional encryption protocols work, I'll admit I haven't fully figured this out yet. They're definitely considerably more complicated than FHE.

Audience Member1:33:37

Okay, I mean, I guess one easy way, quite easy conceptually, is to go through obfuscation: bootstrap with a cheat and then go back down to functional encryption, right?

Vitalik Buterin1:33:51

Exactly. But even well, obfuscation protocols often in practice end up being built on top of functional encryption. There's like a lot of different paths to try to get to obfuscation, and none of them fully satisfy people yet, and they do end up having overhead. One really nice thing is that if you have an obfuscator that works for circuits of constant size, then you can turn that into an obfuscator that works for arbitrary size. And the way that you do that is to obfuscate a big program: first, you homomorphically encrypt the input, then you have the circuit itself provided in a homomorphically encrypted form. And so basically your function is going to be circuit evaluation, and so you're going to evaluate the encryption of X with the encryption of the circuit, and then you get an encrypted form of the output. And you generate a proof, so think of it as being a SNARK or something, but you can make proofs that kind of more nicely work together with the homomorphic encryption scheme so you have less blow-up. So basically, then what you do is you take your encryption of f(X) and you take your proof that you actually computed f and not some different function, and then you have a fixed-size program that checks the proof and verifies its correctness. And you have your f code, right? So that way you can kind of obfuscate programs of arbitrary size using a fixed-size obfuscation. So I think pretty much the exact same technique works for something called correlation intractability hashing. And I think the idea here is that you can have a provably secure Fiat-Shamir, and then if you have this hash function which has this very special property, you can boost it to arbitrary circuits. I think some of the lattice people managed to prove...

Audience Member1:36:36

I was just going to say that sounds right and interesting. And the other question I have is wonderful. So like, this one guy said a decade ago, I think FHE was optimized by an order of magnitude. I was wondering, like, is this going to continue, right?

Vitalik Buterin1:37:14

Yeah, so I've kind of taken you through the journey of the years of where homomorphic encryption was just getting massively better with new discoveries every two years, right? So like, over here you have only partial bounded degree, then you have this really complicated and now the complexity of decryption. So here we're seeing speed-ups. I think after these protocols, the speed-ups have definitely started slowing down, and you know, you have to kind of continue slowing down over time. I don't know enough to be able to say what a reasonable lower bound looks like. I think at this point, my instinct would say that we'll probably expect to see more speed-ups from optimizations than we will from kind of very fundamental changes to how ciphertexts work. So we're kind of coming up with gadgets that let us operate on many bits at the same time more effectively. I don't know.

Audience Member1:39:06

Okay, do we have more questions? One thing I'm kind of curious about is kind of the ultimate cryptographic primitive for the long-term future, let's say once we have quantum computers. And it's kind of unfortunate that from what I understand, right now we can't really do SNARKs with lattices. And I was wondering if you think this is like a fundamental thing, or like we'll have both FHE and lattices working in parallel?

Vitalik Buterin1:39:44

So you can do bulletproofs with some lattices, right? Because you can just use SIS commitments the same way you use Pedersen commitments. So it's very possible that someone will come up with some lattice-based protocol for kind of universal proofs. I don't know. I think the reason why it's hard, it seems fundamentally harder in the lattice world, is because you have these errors, and so you can't really do things like equality tests as easily, right? So for example, even one of the big challenges in a lattice-style cryptography is: can you even come up with a zero-testing key? And because of linearity, if you have a zero-testing key, you can turn that into an equality testing key, and that would just immediately give you a multilinear map. And it turns out that all of the approaches that people have come up with for trying to test if something equals zero using some key that allows you to do that without allowing you to decrypt everything else, it's inevitably weak in a way that breaks the security proof that shows that you can't extract everything else about the message. So, okay, yeah. I don't know. It definitely is possible that we'll end up needing both FHE and lattices in the long term, and that the two things just naturally specialize in different areas.

Audience Member1:41:30

The zero testing is very interesting. Yeah, and it's like one of those wonderful things where if we had it, then we would just solve everything. So sad.

Thank you so much, this was really interesting.