Buterin: Insights into cross-L2 reads for wallets and other use cases

Author: Vitalik Buterin | Compiled by: Vernacular Blockchain

In my previous article on the three transitions, I outlined some of the key reasons why it is valuable to start explicitly thinking about L1 + cross-L2 support, wallet security, and privacy as necessary basic features of the ecosystem stack, rather than as add-ons that can be separately designed by individual wallets.

This post will focus more directly on the technical aspects of one specific sub-problem: how to make it easier to read L1 from L2, L2 from L1, or one L2 from another L2. Solving this problem is crucial for implementing an asset/keystore separation architecture, but it also has valuable use cases in other areas, most notably optimizing trustless cross-L2 calls, including use cases like moving assets between L1 and L2.

Contents:
  • What is the goal?
  • What does a cross-chain proof look like?
  • What proof schemes can we use? (Merkle proofs; ZK-SNARKs; special-purpose KZG proofs; Verkle tree proofs)
  • Aggregation
  • Direct state reading
  • How does L2 learn the recent Ethereum state root?
  • Wallets on chains that are not L2s
  • Preserving privacy
  • Summary

1. What is the goal?

Once L2s become mainstream, users will have assets across many L2s, and possibly L1 as well. Once smart contract wallets (multisig, social recovery, or otherwise) become mainstream, the keys needed to access an account will change over time, and old keys will need to no longer be valid. Once both of those things happen, a user will need a way to change the keys that have authority over many accounts living in many different places, without making an extremely large number of transactions.

In particular, we need a way to handle counterfactual addresses: addresses that have not yet been "registered" on-chain in any way, but which nevertheless need to receive and securely hold funds. We all depend on counterfactual addresses: when you use Ethereum for the first time, you can generate an ETH address that someone can use to pay you, without "registering" the address on-chain (which would require paying transaction fees, and hence already holding some ETH). With EOAs, all addresses start off as counterfactual addresses. With smart contract wallets, counterfactual addresses are still possible, largely thanks to CREATE2, which allows you to have an ETH address that can only be filled by a smart contract whose code matches a particular hash.


EIP-1014 (CREATE2) Address Calculation Algorithm
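As a concrete illustration, here is a short Python sketch of the EIP-1014 address formula, assuming pycryptodome for keccak-256; the deployer, salt, and initcode values are hypothetical placeholders:

```python
# Sketch of the EIP-1014 (CREATE2) address computation.
from Crypto.Hash import keccak

def keccak256(data: bytes) -> bytes:
    return keccak.new(digest_bits=256, data=data).digest()

def create2_address(deployer: bytes, salt: bytes, initcode: bytes) -> str:
    # address = last 20 bytes of keccak256(0xff ++ deployer ++ salt ++ keccak256(initcode))
    assert len(deployer) == 20 and len(salt) == 32
    digest = keccak256(b"\xff" + deployer + salt + keccak256(initcode))
    return "0x" + digest[12:].hex()

# Hypothetical inputs: a factory address, a salt, and wallet initcode.
deployer = bytes.fromhex("00" * 20)
salt = bytes.fromhex("00" * 31 + "01")
initcode = bytes.fromhex("6000")  # placeholder initcode
print(create2_address(deployer, salt, initcode))
```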

However, smart contract wallets introduce a new challenge: the possibility of access keys changing. The address, which is a hash of the initcode, can only contain the wallet's initial verification key. The current verification key is stored in the wallet's storage, but that storage record does not magically propagate to other L2s. If a user has many addresses on many L2s, including addresses that the L2 they are on does not know about (because they are counterfactual), then there seems to be only one way to let users change their keys: an asset/keystore separation architecture. Each user has (i) a "keystore contract" (on L1 or on one particular L2), which stores the verification key for all of their wallets along with the rules for changing the key, and (ii) "wallet contracts" on many chains, which read cross-chain to get the verification key.

[Figure: a keystore contract on one chain, with wallet contracts on many chains reading its verification key]

There are two ways to achieve this:

Light version (check only to update keys): each wallet stores the verification key locally, and contains a function which can be called to check a cross-chain proof of the keystore's current state, and update its locally stored verification key to match. When a wallet is used for the first time on a particular L2, calling that function to obtain the current verification key from the keystore is mandatory.
  • Pros: uses cross-chain proofs sparingly, so it's okay if cross-chain proofs are expensive. All funds are only spendable with the current keys, so it's still secure.
  • Cons: to change the verification key, you have to make an on-chain key change in both the keystore and in every wallet that is already initialized (though not the counterfactual ones). This could cost a lot of gas. (A rough Python sketch of this light-version flow appears after this list.)

Heavy version (check for every tx): a cross-chain proof showing the key currently in the keystore is required for every transaction.
  • Pros: less systemic complexity; keystore updates are cheap.
  • Cons: expensive per-transaction, so it requires much more engineering to make cross-chain proofs acceptably cheap. Also not easily compatible with ERC-4337, which currently does not support cross-contract reading of mutable objects during validation.
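To make the distinction concrete, here is a rough Python stand-in for the light version's control flow; it is a sketch, not real contract code, and all names and stubs are hypothetical:

```python
# Python stand-in for the "light version" wallet logic. On-chain this
# would be a smart contract; proof verification would use one of the
# schemes discussed in later sections.
from dataclasses import dataclass

@dataclass
class CrossChainProof:
    claimed_key: bytes   # the keystore's current verification key
    proof_data: bytes    # e.g. Merkle branches, a SNARK, or a KZG proof

def verify_cross_chain_proof(keystore_id: bytes, proof: CrossChainProof,
                             l1_state_root: bytes) -> bool:
    ...  # scheme-specific verification against the L1 state root (stub)

def check_signature(key: bytes, tx: bytes, signature: bytes) -> bool:
    ...  # ordinary signature check against the cached key (stub)

class Wallet:
    def __init__(self, keystore_id: bytes, initial_key: bytes):
        self.keystore_id = keystore_id
        self.cached_key = initial_key   # locally stored verification key

    def update_key(self, proof: CrossChainProof, l1_state_root: bytes) -> None:
        # Called rarely: on first use on this chain, and after key rotations.
        assert verify_cross_chain_proof(self.keystore_id, proof, l1_state_root)
        self.cached_key = proof.claimed_key

    def validate_tx(self, tx: bytes, signature: bytes) -> bool:
        # Every transaction checks only the cheap local cache; no per-tx
        # cross-chain proof is needed (that would be the "heavy version").
        return check_signature(self.cached_key, tx, signature)
```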

2. What does a cross-chain proof look like?

To show the full complexity, we will explore the hardest case: the keystore is on one L2, and the wallet is on a different L2. If either the keystore or the wallet is on L1, only half of this design is needed.

[Figure: structure of a cross-chain proof of the wallet's current verification key]

Assume the keystore is on Linea and the wallet is on Kakarot. A full proof of the wallet key includes:

  • A proof proving the current Linea state root, given the current Ethereum state root that Kakarot knows about
  • A proof proving the current keys in the keystore, given the current Linea state root

There are two main tricky implementation questions here:
  • What kind of proof do we use? (Is it a Merkle proof? Something else?)
  • How does the L2 learn the recent L1 (Ethereum) state root (or, as we shall see, potentially the full L1 state) in the first place? And alternatively, how does L1 learn the L2 state root? In both cases, how long are the delays between something happening on one side, and that thing being provable to the other side?

3. What types of proof schemes can we use?

There are five main options:
  • Merkle proofs
  • General-purpose ZK-SNARKs
  • Special-purpose proofs (e.g. using KZG)
  • Verkle proofs, which sit between KZG and ZK-SNARKs in terms of infrastructure workload and cost
  • No proofs at all, relying on direct state reading

In terms of infrastructure work required and cost to users, I would rank them roughly as follows:

[Figure: rough ranking of the five proof schemes by infrastructure work required and user cost]

"Aggregation" refers to the aggregation of all proofs provided by users within each block into one large meta-proof that combines all proofs. This is possible with SNARK and KZG, but not with Merkle forks (you can combine Merkle forks a bit, but it only saves you log(txs per block)/log(total number of keystores), in practice It's probably 15-30% in the middle, so it's probably not worth the price). Aggregation only becomes worthwhile when the scenario has a large number of users, so in practice a version 1 implementation could omit aggregation and implement it in version 2.

4. How will Merkle proofs work?

This one is simple: follow the diagram in the previous section. More precisely, each "proof" (assuming the hardest case of proving one L2 into another L2) would contain:

  • A Merkle branch proving the state root of the keystore-holding L2, given the most recent Ethereum state root that the target L2 knows about. The keystore-holding L2's state root is stored at a known storage slot of a known address (the contract on L1 representing the L2), so the path through the tree can be hardcoded.
  • A Merkle branch proving the current verification key, given the state root of the keystore-holding L2. Here again, the verification key is stored at a known storage slot of a known address, so the path can be hardcoded.

Unfortunately, Ethereum state proofs are complicated, but there are libraries for verifying them, and if you use those libraries, this mechanism is not too complicated to implement.

The bigger problem is cost. Merkle proofs are long, and Patricia trees are unfortunately ~3.9x longer than necessary (to be precise: an ideal Merkle proof into a tree holding N objects is 32 * log2(N) bytes long, and because Ethereum's Patricia trees have 16 leaves per child, proofs for those trees are 32 * 15 * log16(N) ~= 125 * log2(N) bytes long). In a state with roughly 250 million (~2²⁸) accounts, this makes each proof 125 * 28 = 3500 bytes, or about 56,000 gas of calldata, plus extra costs for decoding and verifying hashes. Two proofs together would end up costing around 100,000 to 150,000 gas (not including signature verification, if these are used per-transaction) - significantly more than the current base cost of 21,000 gas per transaction. But the disparity gets worse if the proof is being verified on L2. Computation inside an L2 is cheap, because computation is done off-chain and in an ecosystem with far fewer nodes than L1; data, on the other hand, has to be posted to L1. So the comparison is not 21,000 gas vs 150,000 gas; it's 21,000 L2 gas vs 150,000 L1 gas.

We can work out what this means by looking at comparisons between L1 gas costs and L2 gas costs:

[Figure: comparison of gas costs for common transaction types on L1 vs various L2s]

Currently, L1 is about 15-25x more expensive than L2s for simple sends, and 20-50x more expensive for token swaps. Simple sends are relatively data-heavy, whereas swaps are much more computation-heavy; hence, swaps are a better benchmark for approximating the cost of L1 computation vs L2 computation. Taking all this into account, if we assume a 30x cost ratio between L1 computation cost and L2 computation cost, this seems to imply that putting a Merkle proof on L2 would cost the equivalent of perhaps fifty regular transactions. Of course, using a binary Merkle tree cuts costs by a ~4x factor, but even then, the cost is in most cases going to be too high - and if we're willing to make the sacrifice of no longer being compatible with Ethereum's current hexary state tree, we might as well look for even better options.
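A quick Python sanity-check of the proof-size arithmetic above, assuming 16 gas per (nonzero) calldata byte and the ~2²⁸-account state cited earlier:

```python
from math import log2

N = 250_000_000                    # ~2^28 accounts in the Ethereum state
ideal = 32 * log2(N)               # ideal binary Merkle proof, in bytes
patricia = 125 * log2(N)           # hexary Patricia proof, in bytes
print(f"ideal: {ideal:.0f} B, patricia: {patricia:.0f} B, "
      f"overhead: {patricia / ideal:.1f}x")        # ~893 B vs ~3487 B, ~3.9x
print(f"calldata cost: {16 * patricia:,.0f} gas")  # ~55,800 gas per proof
```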

5. How will ZK-SNARK proofs work?

The use of ZK-SNARKs is also conceptually easy to understand: you simply replace the Merkle proofs in the diagram above with a ZK-SNARK proving that those Merkle proofs exist. A ZK-SNARK costs ~400,000 gas to verify, and takes roughly 400 bytes (compare: 21,000 gas and 100 bytes for a basic transaction, the latter reducible to ~25 bytes with compression in the future). Hence, from a computation perspective, a ZK-SNARK costs 19x the cost of a basic transaction today, and from a data perspective, a ZK-SNARK costs 4x as much as a basic transaction today, and 16x a future basic transaction. Those numbers are a big improvement over Merkle proofs, but they are still quite expensive. There are two ways to improve on this: (i) special-purpose KZG proofs, or (ii) aggregation, similar to ERC-4337 aggregation but using fancier math. We can look into both.
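Spelling out the ratios above:

```python
# Quick check of the cost ratios quoted above.
snark_gas, snark_bytes = 400_000, 400
tx_gas, tx_bytes_now, tx_bytes_future = 21_000, 100, 25

print(f"computation: {snark_gas / tx_gas:.0f}x a basic tx")   # ~19x
print(f"data today:  {snark_bytes / tx_bytes_now:.0f}x")      # 4x
print(f"data future: {snark_bytes / tx_bytes_future:.0f}x")   # 16x
```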

6. How will special-purpose KZG proofs work?

Warning: this section is more mathematical than the others. This is because we're going beyond general-purpose tools and building something special-purpose that is cheaper, so we have to go "under the hood" a lot more. If you don't like deep math, skip straight to the next section.

First, a recap of how KZG commitments work. We can represent a set of data [D_1 ... D_n] with a KZG commitment to a polynomial derived from the data: specifically, the polynomial P where P(w) = D_1, P(w²) = D_2 ... P(wⁿ) = D_n. Here, w is a "root of unity": a value for which wᴺ = 1, where N is the size of the evaluation domain (this is all done in a finite field).

To "commit" to P, we create an elliptic curve point com(P) = P₀ * G + P₁ * S₁ + ... + Pₖ * Sₖ. Here:
  • G is the generator point of the curve
  • Pᵢ is the i'th coefficient of the polynomial P
  • Sᵢ is the i'th point in the trusted setup

To prove that P(z) = a, we create a quotient polynomial Q = (P - a) / (X - z), and create a commitment com(Q) to it. It is only possible to create such a polynomial if P(z) is actually equal to a. To verify a proof, we check the equation Q * (X - z) = P - a by doing an elliptic curve check using the proof com(Q) and the polynomial commitment com(P): we check whether e(com(Q), com(X - z)) = e(com(P) - com(a), com(1)). Some key properties to understand are:

  • The proof is just the single com(Q) value, which is 48 bytes
  • com(P₁) + com(P₂) = com(P₁ + P₂)

This also means that you can "edit" a value into an existing commitment. Suppose that we know D_i is currently a, we want to set it to b, and the existing commitment to D is com(P). To make a commitment to "P, but with P(wⁱ) = b, and no other evaluations changed", we set com(new_P) = com(P) + (b - a) * com(Lᵢ), where Lᵢ is the "Lagrange polynomial" that equals 1 at wⁱ and 0 at the other wʲ points.

To perform these updates efficiently, each client can pre-compute and store all N commitments to Lagrange polynomials (the com(Lᵢ)). Inside an on-chain contract, it may be too much to store all N commitments, so instead you could make a KZG commitment to the set of com(Lᵢ) (or hash(com(Lᵢ))) values, and whenever someone needs to update the on-chain tree, they simply provide the appropriate com(Lᵢ) together with a proof of its correctness.

Hence, we have a structure where we can just keep adding values to the end of an ever-growing list, though with a certain size limit (realistically, hundreds of millions could be viable). We then use this as our data structure to manage (i) a commitment to the list of keys on each L2, stored on that L2 and mirrored to L1, and (ii) a commitment to the list of L2 key-commitments, stored on Ethereum L1 and mirrored to every L2. Keeping the commitments updated could either be folded into the core L2 logic, or implemented through deposit-and-withdraw bridges without changing the L2 core protocol.
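Here is a toy Python demonstration of these two properties, using py_ecc's bn128 curve. It is deliberately insecure: the trusted-setup secret is known (so committing reduces to computing P(s) * G), and the points 1..n stand in for roots of unity; it is a sketch of the algebra only:

```python
# Toy KZG demo using py_ecc's bn128 curve. DELIBERATELY INSECURE: a real
# trusted setup discards the secret S. Lagrange interpolation works for
# any distinct evaluation points, so we use 1..n for simplicity.
from py_ecc.bn128 import G1, add, multiply, curve_order

S = 123456789  # toy trusted-setup secret (destroyed in a real ceremony)

def lagrange_eval(xs, ys, z):
    """Evaluate at z the polynomial interpolating points (xs[i], ys[i]),
    with all arithmetic modulo the curve order."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        li = 1
        for j, xj in enumerate(xs):
            if j != i:  # L_i(z) = prod_{j != i} (z - x_j) / (x_i - x_j)
                li = li * (z - xj) % curve_order
                li = li * pow((xi - xj) % curve_order, -1, curve_order) % curve_order
        total = (total + yi * li) % curve_order
    return total

def commit(xs, ys):
    """com(P) = P(S) * G1 for the polynomial P interpolating (xs, ys)."""
    return multiply(G1, lagrange_eval(xs, ys, S))

xs = [1, 2, 3, 4]        # evaluation domain
data = [10, 20, 30, 40]  # D_1 ... D_4

# Property 1: com(P1) + com(P2) == com(P1 + P2)
other = [5, 6, 7, 8]
lhs = add(commit(xs, data), commit(xs, other))
rhs = commit(xs, [a + b for a, b in zip(data, other)])
assert lhs == rhs

# Property 2: editing one evaluation via a Lagrange-polynomial commitment:
# com(new_P) = com(P) + (b - a) * com(L_i)
i, a_old, b_new = 2, 30, 99
L_i = [1 if j == i else 0 for j in range(len(xs))]  # L_i(x_j) = 1 iff j == i
delta = (b_new - a_old) % curve_order
updated = add(commit(xs, data), multiply(commit(xs, L_i), delta))
assert updated == commit(xs, [99 if j == i else v for j, v in enumerate(data)])
print("homomorphism and single-evaluation update both verified")
```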

[Figure: key-list commitments on each L2, mirrored into a commitment list on L1 and back out to every L2]

Therefore, a full proof requires:

  • The latest com(key list) on the keystore-holding L2 (48 bytes)
  • A KZG proof of com(key list) being a value inside com(mirror_list), the commitment to the list of all key-list commitments (48 bytes)
  • A KZG proof of your key inside com(key list) (48 bytes, plus 4 bytes for the index)

It is actually possible to merge the two KZG proofs into one, so we get a total size of only 100 bytes (see the byte-count tally below). Note one subtlety: because the key list is a list, rather than a key/value map the way the state is, positions in the key list have to be assigned sequentially. The key-commitment contract would contain its own internal registry mapping each keystore to an ID, and for each key it would store hash(key, address of the keystore) instead of just the key, to unambiguously communicate to other L2s which keystore a particular entry is talking about.

The advantage of this technique is that it performs very well on L2: the data is 100 bytes, ~4x shorter than a ZK-SNARK and far shorter than a Merkle proof. The computation cost is mainly one pairing check, roughly 119,000 gas. On L1, data matters less than computation, so unfortunately KZG is somewhat more expensive than Merkle proofs.
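Tallying the sizes above, with the two KZG openings merged into one proof:

```python
# Byte count for a full KZG-based cross-chain key proof, per the list above.
com_key_list = 48        # latest com(key list) on the keystore-holding L2
merged_kzg_proof = 48    # the two KZG openings merged into a single proof
key_index = 4            # position of the key inside the key list
print(com_key_list + merged_kzg_proof + key_index)  # 100 bytes
```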

7. How will Verkle trees work?

Verkle trees essentially involve stacking KZG commitments (or IPA commitments, which can be more efficient and use simpler cryptography) on top of each other: to store 2⁴⁸ values, you can make a KZG commitment to a list of 2²⁴ values, each of which is itself a KZG commitment to 2²⁴ values. Verkle trees are being strongly considered for the Ethereum state tree, because Verkle trees can be used to hold key-value maps and not just lists (basically, you can make a 2²⁵⁶-sized tree, but start it empty, and only fill in specific parts of the tree once you actually need to fill them).


What a Verkle tree looks like. In practice, you might give each node a width of 256 == 2⁸ for IPA-based trees, or 2²⁴ for KZG-based trees.

Proofs in Verkle trees are somewhat longer than in KZG; they can be a few hundred bytes long. They are also difficult to verify, especially if you try to aggregate many proofs into one. Realistically, Verkle trees should be thought of as being like Merkle trees, but more viable without SNARKing (because of the lower data costs), and cheaper with SNARKing (because of the lower prover costs).

The biggest advantage of Verkle trees is the possibility of harmonizing data structures: Verkle proofs could be used directly over the L1 or L2 state, without overlay structures, using the exact same mechanism for L1 and L2. Once quantum computers become an issue, or once proving Merkle branches becomes efficient enough, Verkle trees could be replaced in-place with a binary hash tree using a suitable SNARK-friendly hash function.

8. Aggregation

If N users make N transactions (or, more realistically, N ERC-4337 UserOperations) that need to prove N cross-chain claims, we can save a lot of gas by aggregating those proofs: the builder that combines those transactions into a block, or into a bundle that goes into a block, can create a single proof proving all of those claims simultaneously. This could mean:

  • ZK-SNARK proofs for N Merkle branches
  • A KZG multi-proof
  • A Verkle multi-proof (or a ZK-SNARK of a multi-proof)

In all three cases, the proofs would only cost a few hundred thousand gas each. The builder would need to make one proof on each L2, for that L2's users; hence, for this to be useful to build, the scheme as a whole needs to have enough usage that there are very often at least a few transactions within the same block on multiple major L2s.

If ZK-SNARKs are used, the main marginal cost is simply the "business logic" of passing numbers around between contracts, so perhaps a few thousand L2 gas per user. If KZG multi-proofs are used, the prover would need to add 48 gas for each keystore-holding L2 used within that block, so the marginal cost of the scheme would add another ~800 L1 gas per L2 (not per user). But these costs are much lower than without aggregation, which inevitably involves over 10,000 L1 gas and hundreds of thousands of L2 gas per user. For Verkle trees, you can either use Verkle multi-proofs directly, adding around 100-200 bytes per user, or you can make a ZK-SNARK of a Verkle multi-proof, which costs similarly to a ZK-SNARK of Merkle branches but is significantly cheaper to prove.

From an implementation perspective, it is probably best to have bundlers aggregate cross-chain proofs through the ERC-4337 account abstraction standard. ERC-4337 already has a mechanism for builders to aggregate parts of UserOperations in custom ways. There is even an implementation of this for BLS signature aggregation, which could reduce gas costs on L2 by 1.5-3x, depending on what other forms of compression are included. (A structural sketch of the bundler-side flow appears after the diagram below.)

oxeYy0EtfGLg1F2UMJ3TgOJhNwBa3VbcyNHTzD1z.png

Diagram from the BLS wallet implementation post showing the workflow for BLS aggregate signatures in an early version of ERC-4337. The workflow for aggregating cross-chain proofs may look very similar.
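Below is a structural sketch, in Python terms, of how a bundler-side aggregation flow might be organized. The names and stubs are hypothetical: `aggregate` stands in for a SNARK prover or KZG multi-prover, and this is not the actual ERC-4337 interface:

```python
# Structural sketch of bundler-side cross-chain proof aggregation.
from dataclasses import dataclass

@dataclass
class UserOp:
    sender: bytes
    keystore_l2: str         # which L2 holds this user's keystore
    key_claim: bytes         # claimed current verification key
    individual_proof: bytes  # per-user cross-chain proof

def aggregate(ops: list[UserOp]) -> bytes:
    ...  # e.g. a SNARK over N Merkle branches, or a KZG multi-proof (stub)

def build_bundle(userops: list[UserOp]) -> dict:
    # Group ops by the L2 that holds their keystore: the builder makes one
    # aggregate proof per keystore-holding L2 represented in the bundle.
    by_l2: dict[str, list[UserOp]] = {}
    for op in userops:
        by_l2.setdefault(op.keystore_l2, []).append(op)
    return {
        "ops": userops,
        # one aggregate proof per keystore L2, replacing N individual proofs
        "aggregate_proofs": {l2: aggregate(ops) for l2, ops in by_l2.items()},
    }
```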

9. Direct state reading

A final possibility, usable only for L2 reading L1 (and not L1 reading L2), is to modify L2s so that they can make static calls directly to contracts on L1. This could be done with an opcode or a precompile which allows calls into L1, where you provide the destination address, gas and calldata, and it returns the output; because these calls are static, they cannot actually change any L1 state. L2s already have to be aware of L1 to process deposits, so there is nothing fundamental preventing such a thing from being implemented; it is mainly a technical implementation challenge (see: this RFP from Optimism to support static calls into L1).

Note that if the keystore is on L1, and L2s integrate L1 static-call functionality, then no proofs are required at all! However, if L2s don't integrate L1 static calls, or if the keystore is on an L2 (which it may eventually have to be, once L1 becomes too expensive for users to use even a little bit), then proofs will be needed.

10. How does L2 learn the most recent Ethereum state root?

All of the schemes above require the L2 to access either the recent L1 state root, or the entire recent L1 state. Fortunately, all L2s already have some functionality to access the recent L1 state: they need it to process messages coming from L1 into the L2, most notably deposits. Indeed, if an L2 has a deposit feature, then you can use that L2 as-is to move L1 state roots into a contract on the L2: simply have a contract on L1 call the BLOCKHASH opcode, and pass the result to the L2 as a deposit message. The full block header can then be received on the L2 side and its state root extracted (a sketch of the extraction step follows). That said, it would be much better for every L2 to have an explicit way to access either the full recent L1 state, or recent L1 state roots, directly.
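For the BLOCKHASH-deposit trick described above, the L2-side extraction step is simple; a sketch assuming the `rlp` Python library:

```python
# Extract the state root from an RLP-encoded Ethereum block header, e.g.
# one passed into the L2 via a deposit message.
import rlp

def state_root_from_header(header_rlp: bytes) -> bytes:
    fields = rlp.decode(header_rlp)
    # Header layout: [parent_hash, ommers_hash, coinbase, state_root, ...]
    return fields[3]

# The header itself can be authenticated by checking that
# keccak256(header_rlp) equals the block hash delivered from L1.
```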

The main challenge with optimizing how L2s receive recent L1 state roots is simultaneously achieving safety and low latency:
  • If an L2 implements "directly read L1" functionality in a lazy way, only reading finalized L1 state roots, then the delay will normally be 15 minutes, but in the extreme case of inactivity leaks (which you have to tolerate), the delay could be several weeks.
  • L2s can definitely be designed to read much more recent L1 state roots, but because L1 can revert (reverts can happen during inactivity leaks, even with single-slot finality), L2s would need to be able to revert as well. This is technically challenging from a software-engineering perspective, but at least Optimism already has this capability.
  • If you use a deposit bridge to bring L1 state roots into an L2, then simple economic viability may require a long time between deposit updates: if the full cost of a deposit is 100,000 gas, and we assume ETH at $1800 and fees at 200 gwei, with L1 roots brought into the L2 once per day, that would cost $36 per L2 per day, or $13,148 per L2 per year to maintain the system. With a delay of one hour, that goes up to a whopping $315,569 per L2 per year. In the best case, a constant flow of impatient wealthy users covers the update fees and keeps the system up to date for everyone else. In the worst case, some altruistic actor would have to pay for it themselves. (A quick sanity-check of this arithmetic follows this list.)
  • "Oracles" (at least the technology some DeFi people call "oracles") are not an acceptable solution here: wallet key management is a very security-critical low-level function, so it should depend at most on several A very simple, low-level infrastructure that requires no cryptographic trust. Also, in the opposite direction (L1s reads as L2):
Additionally, in the opposite direction (L1 reading L2):
  • On optimistic rollups, state roots take a week to reach L1 because of the fraud-proof delay. On ZK rollups, it currently takes a few hours because of a combination of proving times and economic limits, though future technology will reduce this.
  • Pre-confirmations (from sequencers, attesters, etc.) are not an acceptable solution for L1 reading L2. Wallet management is a very security-critical low-level functionality, so the level of security of the L2 → L1 communication must be absolute: it should not even be possible to push a false L1 state root by taking over the L2 validator set. The only state roots that L1 should trust are state roots that have been accepted as final by the L2's state-root-holding contract on L1.

Some of these speeds are unacceptably slow for many DeFi use cases of trustless cross-chain operations; for those cases, you do need faster bridges with more imperfect security models. For the use case of updating wallet keys, however, longer delays are more acceptable: you're not delaying transactions by hours, you're delaying key changes. You'll just have to keep the old keys around longer. If you're changing keys because the keys were stolen, then you do have a significant period of vulnerability, but this can be mitigated, e.g. by wallets having a freeze function.

Ultimately, the best latency-minimizing solution is for L2s to implement direct reading of L1 state roots in an optimal way, where each L2 block (or the state root computation log) contains a pointer to the most recent L1 block, so that if L1 reverts, the L2 can revert as well. Keystore contracts should be placed either on mainnet, or on L2s that are ZK rollups and so can quickly commit to L1.


Blocks of an L2 chain can depend not only on previous L2 blocks, but also on an L1 block. If L1 reverts past such a link, the L2 reverts as well. It's worth noting that this is also how an earlier (pre-Dank) version of sharding was envisioned to work; see here for the code.

11. How much connection to Ethereum does another chain need, to hold wallets whose keystores are rooted on Ethereum or an L2?

Surprisingly, not that much. It actually doesn't even need to be a rollup: if it's an L3 or a validium, it's fine to hold wallets there, as long as you hold the keystore either on L1 or on a ZK rollup. The thing you really need is for the chain to have direct access to Ethereum state roots, and a technical and social commitment to be willing to reorg if Ethereum reorgs, and to hard fork if Ethereum hard forks.

One interesting research question is identifying to what extent it is possible for a chain to have this form of connection to multiple other chains (e.g. Ethereum and Zcash). Doing it naively is possible: your chain could agree to reorg if either Ethereum or Zcash reorgs (and to hard fork if either hard forks), but then your node operators and your community more broadly have double the technical and political dependencies. Hence, this technique could be used to connect to a few other chains, but at increasing cost. Schemes based on ZK bridges have attractive technical properties, but they have the key weakness that they are not robust to 51% attacks or hard forks. There may be more clever solutions as well.

12. Privacy protection

Ideally, we also want to preserve privacy. If you have many wallets managed by the same keystore, then we want to ensure that:

  • It is not publicly known that these wallets are all connected to each other.
  • Social recovery guardians don't learn which addresses they are guarding.

This creates a few problems:
  • We cannot use Merkle proofs directly because they do not protect privacy.
  • If we use KZG or SNARK, then the proof needs to provide a blinded version of the verification key without revealing the location of the verification key.
  • If we use aggregation, then the aggregator should not learn the positions in plaintext; instead, the aggregator should receive blinded proofs, and have a way to aggregate those blinded proofs.
  • We cannot use a "light version" (renewing keys using only cross-chain proofs), because it creates a privacy leak: if many wallets are updated at the same time due to the update process, time leaks information that may be relevant to these wallets. Therefore, we have to use the "heavy version" (cross-chain proof of each transaction). With SNARKs, the solution is conceptually simple: by default, proofs are information-hidden, and aggregators need to generate recursive SNARKs to prove SNARKs.

[Figure: users submit blinded SNARK proofs; the aggregator combines them with a recursive SNARK]

The main challenge with this approach today is that aggregation requires the aggregator to create a recursive SNARK, which is currently quite slow. With KZG, we can use this work on non-index-revealing KZG proofs (see also a more formalized version of that work in the Caulk paper) as a starting point. However, the aggregation of blinded proofs is an open problem that needs more attention. Unfortunately, reading L1 directly from inside L2 does not preserve privacy, though implementing direct-reading functionality is still very useful, both to minimize latency and because of its utility for other applications.

13. Summary

To have cross-chain social recovery wallets, the most realistic workflow is a wallet that maintains a keystore in one location and wallets in many locations, where the wallets read the keystore either (i) to update their local view of the verification key, or (ii) as part of verifying each transaction.

A key ingredient that makes this possible is cross-chain proofs. We need to optimize these proofs hard. ZK-SNARKs, custom KZG solutions, or waiting for Verkle proofs seem like the best options.

In the longer term, aggregation protocols, where bundlers generate aggregate proofs as part of creating a bundle of all the UserOperations submitted by users, will be necessary to minimize costs. This should probably be integrated into the ERC-4337 ecosystem, though changes to ERC-4337 will likely be required.

L2s should be optimized to minimize the latency of reading L1 state (or at least state roots) from inside the L2. L2s reading L1 state directly is ideal, and would also save on proof space.

Wallets do not have to live only on L2s: you could also put wallets on systems with lower levels of connection to Ethereum (L3s, or even separate chains that simply agree to include Ethereum state roots and to reorg or hard fork when Ethereum does).

Keystores, however, should be either on L1 or on a high-security ZK-rollup L2. Keeping them on L1 saves a lot of complexity, though in the long run even that may be too expensive, hence the need for keystores on L2s.

Preserving privacy will require additional work and will make some options harder. However, we should probably move toward privacy-preserving solutions anyway, and at the least make sure that whatever we propose is forward-compatible with preserving privacy.
