Kernel Ventures: DA and Historical Data Layer Design

By Jerry Luo, Kernel Ventures

TL;DR

  1. In the early days of public chains, every node in the network maintained a full, consistent copy of the ledger to guarantee security and decentralization. As blockchain ecosystems have grown, however, mounting storage pressure has pushed node operation toward centralization. At this stage, Layer 1 urgently needs to address the storage costs that come with rising TPS.

  2. Facing this problem, developers need to propose a new historical-data storage scheme that balances security, storage cost, data read speed, and the universality of the DA layer.

  3. Many new technologies and ideas have emerged in the process, including Sharding, DAS (Data Availability Sampling), Verkle Trees, and DA middleware. They attempt to optimize DA-layer storage by reducing data redundancy and improving the efficiency of data verification.

  4. Current DA schemes fall roughly into two categories by where the data is stored: main-chain DA and third-party DA. Main-chain DA reduces the storage pressure on nodes by periodically pruning data and by sharding storage. Third-party DA designs are built to serve storage and offer reasonable solutions for large data volumes; they mainly trade off single-chain compatibility against multi-chain compatibility, and three variants have been proposed: main-chain-dedicated DA, modular DA, and storage public chain DA.

  5. Payment-focused public chains have extremely high security requirements for historical data and are well suited to using the main chain as the DA layer. For public chains that have been running for a long time with a large miner base, a third-party DA that does not touch the consensus layer yet preserves security is more appropriate. General-purpose public chains are better served by main-chain-dedicated DA storage, with its larger capacity, lower cost, and solid security; where cross-chain needs dominate, modular DA is also a good option.

  6. Overall, blockchains are developing toward reduced data redundancy and a multi-chain division of labor.

1. Background

As a distributed ledger, a blockchain must store its history on all nodes to keep data storage secure and decentralized. Since the correctness of each state change depends on the previous state (the origin of the transaction), a blockchain should in principle store the entire history from the first transaction to the present one. Taking Ethereum as an example, even estimating an average block size of 20 KB, the chain's total block data has already reached roughly 370 GB, and a full node must also record state and transaction receipts on top of the blocks themselves. Counting these, a single node's total storage exceeds 1 TB, which concentrates node operation in the hands of a few.

Ethereum's latest block height, image source: Etherscan

2. DA performance metrics

2.1 Security

Compared with a database or a linked-list storage structure, the immutability of a blockchain comes from the fact that newly generated data can be verified against historical data, so ensuring the security of historical data is the first consideration in DA-layer storage. To evaluate the data security of a blockchain system, we usually look at the amount of data redundancy and the method used to verify data availability.

Amount of redundancy: Redundancy plays several roles in a blockchain system. First, the more copies exist in the network, the more samples a validator can consult when it needs to check an account's state in a historical block to verify a current transaction, letting it take the data recorded by the majority of nodes. In a traditional database, where data lives as key-value pairs on a single node, the cost of tampering with historical data is extremely low; in principle, the more redundant the data, the more credible it is. Likewise, the more nodes store the data, the less likely it is to be lost: once the centralized servers behind a Web2 game are all shut down, the game is gone for good. But more is not always better, since every extra copy consumes additional storage, and excessive redundancy puts too much storage pressure on the system. A good DA layer chooses an appropriate redundancy scheme to strike a balance between security and storage efficiency.

Data availability check: Redundancy ensures that enough copies of the data exist in the network, but the data to be used must also be verified for accuracy and completeness. The method commonly used in today's blockchains is the cryptographic commitment: a small commitment, derived from the mixed transaction data, is kept on record by the whole network. To test the authenticity of a piece of historical data, one recomputes the commitment from that data and checks whether it matches the network's record; if it matches, verification passes. Commonly used commitment schemes are the Merkle root and the Verkle root. A high-security data availability verification scheme needs very little verification data and can verify historical data quickly.
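As a concrete illustration, the sketch below recomputes a Merkle-root commitment from claimed historical data and compares it with the network's record. This is a minimal Python sketch of the principle, not any particular chain's implementation; the hash function and the pairing rule for odd node counts are our assumptions.

```python
import hashlib

def h(b: bytes) -> bytes:
    """SHA-256, standing in for the chain's hash function."""
    return hashlib.sha256(b).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a Merkle root by repeatedly hashing pairs of nodes.
    An odd node count is handled by carrying the last node up unchanged."""
    assert leaves, "need at least one leaf"
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]

# Verification: recompute the commitment from the claimed historical data
# and compare it with the root the whole network keeps on record.
txs = [b"tx1", b"tx2", b"tx3", b"tx4"]
recorded_root = merkle_root(txs)          # what the network recorded
assert merkle_root(txs) == recorded_root  # data passes the availability check
```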

2.2 Storage Costs

On the premise of basic security, the next core goal of the DA layer is to cut costs and raise efficiency. The first step is reducing storage cost, i.e. the memory footprint per unit of stored data, hardware differences aside. At this stage, blockchains mainly lower storage cost by adopting sharding and by using reward-driven storage to keep data reliably stored while cutting the number of backups. However, it is clear from these methods that storage cost and data security are in tension: reducing storage occupancy usually means reduced security, so a good DA layer must balance the two. In addition, if the DA layer is a separate public chain, cost can also be lowered by minimizing the intermediate steps of data exchange, since each relay step leaves behind index data for later queries; the longer the call path, the more index data accumulates and the higher the storage cost. Finally, storage cost is directly tied to data durability: in general, the higher the cost of storing data, the harder it is for a public chain to store it persistently.

2.3 Data read speed

Once costs are under control, the next step is efficiency: being able to pull data out of the DA layer quickly when it is needed. The process involves two steps. The first is locating the nodes that store the data; this matters mainly for public chains that have not achieved network-wide data consistency, since if every node holds a synchronized copy, this step's time cost is negligible. The second is the actual read. In today's mainstream blockchain systems, including Bitcoin, Ethereum, and Filecoin, nodes store data in the LevelDB database. In LevelDB, data is held in three forms. Freshly written data goes into a memtable; when the memtable is full, it is converted into an immutable memtable, which can no longer be modified and only serves reads. Both file types live in memory: the hot storage used in the IPFS network keeps data in this tier, where it can be read quickly on demand, but an ordinary node's memory is only gigabytes in size, fills up quickly, and loses its contents permanently if the node crashes or otherwise fails. Data that must persist is written out as SST files to a solid-state drive (SSD), but reading it requires loading it back into memory first, which greatly slows down data indexing. Finally, in systems with sharded storage, reconstructing data requires requesting blocks from multiple nodes and reassembling them, which further slows down reads.
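The toy model below illustrates this memtable, immutable memtable, then SST read path. It is a simplified Python sketch of LevelDB's tiering, with the flush threshold and data structures chosen for clarity rather than fidelity to the real implementation.

```python
class LevelDBLikeStore:
    """Toy model of LevelDB's read path: a writable in-memory memtable,
    frozen immutable memtables, then SST files on disk (simulated here
    as a dict standing in for SSD-resident files)."""

    MEMTABLE_LIMIT = 4  # flush threshold; real LevelDB uses a byte budget

    def __init__(self):
        self.memtable = {}       # hot, writable, in memory
        self.immutables = []     # full memtables, read-only, in memory
        self.sst_on_disk = {}    # persisted data; slowest tier to reach

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.MEMTABLE_LIMIT:
            # Freeze the memtable; a background job would later write
            # it out as an SST file on the SSD.
            self.immutables.append(self.memtable)
            self.memtable = {}

    def flush_to_disk(self):
        for table in self.immutables:
            self.sst_on_disk.update(table)   # models compaction to SST
        self.immutables = []

    def get(self, key):
        # Lookup order mirrors the paragraph above: memory first, disk last.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.immutables):
            if key in table:
                return table[key]
        return self.sst_on_disk.get(key)     # requires a (slow) disk read
```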

LevelDB data storage method, image source: Leveldb-handbook

2.4 DA Layer Universality

With the growth of DeFi and the repeated failures of CEXs, demand for decentralized cross-chain asset trading keeps rising. Whether the cross-chain mechanism is hash time-locking, notary schemes, or a relay chain, it inevitably requires confirming historical data on both chains at once. The crux of the problem is that data on the two chains is separate, and different decentralized systems cannot communicate directly. One solution proposed at this stage changes where the DA layer stores data: the historical data of multiple public chains is kept on a single trusted public chain, so verification only needs to call data on that one chain. This requires the DA layer to establish secure communication with many different types of public chains, that is, to have good universality.

3. DA-related technology exploration

3.1 Sharding

  • In a traditional distributed system, a file is not stored whole on one node; instead, the original data is split into multiple blocks, with one block stored per node. Blocks are also not stored on just one node: each has backups on other nodes, usually set to 2 in today's mainstream distributed systems. This sharding mechanism relieves the storage pressure on individual nodes, expands total system capacity to the sum of all nodes' capacity, and preserves storage security through appropriate redundancy. The approach taken in a blockchain is broadly similar, but differs in the details. First, because every blockchain node is untrusted by default, enough backups are needed so that data authenticity can be checked later, so the number of backups per block must be far more than 2. Ideally, in a blockchain with this storage scheme, if the total number of validators is T and the number of shards is N, the number of backups per shard is T/N. Second, the assignment of blocks to nodes differs. A traditional distributed system has few nodes, so each node holds many blocks: data is mapped onto a hash ring via consistent hashing, each node stores the blocks in a certain range, and it is acceptable for a node to receive no storage task in some round. On a blockchain, whether a node is assigned a block is no longer random but certain: every node selects a block to store, determined by hashing the block's data together with the node's own information modulo the shard count (see the sketch below). Assuming each dataset is divided into N blocks, the storage burden of each node is only 1/N of the original size. By setting N appropriately, a balance can be struck between growing TPS and node storage pressure.
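A minimal Python sketch of that deterministic assignment, assuming SHA-256 as the mixing hash and each node's identity encoded as bytes; with T validators, each shard receives roughly T/N copies:

```python
import hashlib

NUM_SHARDS = 16  # N: chosen to balance TPS growth against per-node storage

def shard_of(block_data: bytes, node_id: bytes) -> int:
    """Derive which shard a node stores: hash the block data together
    with the node's own identity and reduce modulo the shard count."""
    digest = hashlib.sha256(block_data + node_id).digest()
    return int.from_bytes(digest, "big") % NUM_SHARDS

# With T validators, each shard ends up with roughly T / NUM_SHARDS copies.
T = 1000
nodes = [f"node-{i}".encode() for i in range(T)]
block = b"block-payload"
counts = [0] * NUM_SHARDS
for node in nodes:
    counts[shard_of(block, node)] += 1
print(counts)  # each entry hovers around T / NUM_SHARDS = 62.5
```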

How data is stored after Sharding, image source: Kernel Ventures

3.2 DAS (Data Availability Sampling)

DAS further optimizes the storage approach of Sharding. During sharding, since nodes store blocks through simple random selection, some block may end up lost. Second, for sharded data, it is essential to confirm the authenticity and integrity of the data during reconstruction. DAS addresses both problems with erasure codes and KZG polynomial commitments.

Erasure code: Given the huge number of validators on Ethereum, the probability that a block is stored by no node at all is nearly zero, but such an extreme case remains theoretically possible. To mitigate this risk of storage loss, instead of dividing the original data directly into blocks for storage, the original data is mapped to the coefficients of a degree-n polynomial, 2n points are then evaluated on that polynomial, and each node randomly selects one of them to store. A degree-n polynomial is fully determined by n+1 points, so only half of the blocks need to be collected from nodes to restore the original data. Erasure coding thus improves both the security of data storage and the network's ability to recover it.
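The sketch below shows the idea over a small prime field: data chunks become polynomial coefficients, twice as many evaluations are published, and any k of them rebuild the data via Lagrange interpolation. It is a naive illustration under toy parameters; real systems use much larger fields and optimized Reed-Solomon libraries.

```python
P = 2**31 - 1  # toy field modulus; real systems use far larger fields

def encode(chunks, n_points):
    """Treat the k data chunks as coefficients of a degree-(k-1)
    polynomial and publish its evaluations at n_points distinct x's;
    any k of those points suffice to rebuild the chunks."""
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(chunks)) % P
    return [(x, poly(x)) for x in range(1, n_points + 1)]

def mul_linear(coeffs, xj):
    """Multiply a coefficient-form polynomial by (x - xj) mod P."""
    out = [0] * (len(coeffs) + 1)
    for t, c in enumerate(coeffs):
        out[t + 1] = (out[t + 1] + c) % P   # the c*x term
        out[t] = (out[t] - c * xj) % P      # the -c*xj term
    return out

def decode(points, k):
    """Lagrange-interpolate the k coefficients from any k surviving points."""
    pts = points[:k]
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(pts):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(pts):
            if j != i:
                basis = mul_linear(basis, xj)
                denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P
        for t, b in enumerate(basis):
            coeffs[t] = (coeffs[t] + b * scale) % P
    return coeffs

data = [11, 22, 33, 44]               # k = 4 original chunks
shares = encode(data, 2 * len(data))  # 8 points: half may be lost
assert decode(shares[2:6], len(data)) == data  # any 4 points recover all
```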

KZG polynomial commitment: A critical part of data storage is verifying data authenticity. In networks that do not use erasure coding, various verification methods are available, but if erasure coding is introduced to improve data security as above, the KZG polynomial commitment is the better fit. A KZG commitment verifies the contents of a single block directly in polynomial form, removing the need to decode polynomials back into binary data. Verification is broadly similar to a Merkle tree's, but no path of sibling node data is required: the KZG commitment and the block's data (with its opening proof) suffice to verify authenticity.

3.3 DA layer data verification methods

Data verification ensures that data fetched from a node has been neither tampered with nor lost. To minimize both the amount of data and the computation needed for verification, the DA layer currently adopts tree structures as its mainstream verification method. The simplest form is the Merkle Tree, recorded as a complete binary tree: verification needs to keep only the Merkle root plus the hash of the subtree on the other side of the path at each level, giving O(log N) verification complexity (log base 2). Although the verification process is greatly simplified, the amount of proof data still grows with the volume of data. To address this, the Verkle Tree was proposed as an alternative. Each node in a Verkle Tree carries, in addition to its value, a Vector Commitment; the original node's value and its commitment proof are enough to quickly verify data authenticity, without fetching the values of any sibling nodes. This makes the computation for each verification depend only on the depth of the Verkle Tree, a fixed constant, greatly accelerating verification. However, computing the Vector Commitment requires the participation of all siblings in the same layer, which greatly increases the cost of writing and changing data. For historical data, which is stored permanently and never modified, the Verkle Tree is therefore extremely suitable. In addition, both Merkle Trees and Verkle Trees have K-ary variants with similar mechanisms but a different number of subtrees per node; a comparison of their performance can be seen in the table below.
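To make the O(log N) property concrete, here is a small Python sketch of Merkle proof verification: proving one leaf among four requires only two sibling hashes, not the full dataset. The hash choice and proof encoding are our assumptions.

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def verify_merkle_proof(leaf: bytes, proof: list[tuple[bytes, str]],
                        root: bytes) -> bool:
    """Walk from the leaf to the root using only the sibling hash on the
    other side of the path at each level -- O(log N) hashes in total.
    Each proof step is (sibling_hash, 'L' or 'R'), telling which side
    the sibling sits on."""
    node = h(leaf)
    for sibling, side in proof:
        node = h(sibling + node) if side == "L" else h(node + sibling)
    return node == root

# Four leaves: proving tx2 needs only 2 sibling hashes, not all 4 leaves.
leaves = [b"tx1", b"tx2", b"tx3", b"tx4"]
hashes = [h(x) for x in leaves]
l01, l23 = h(hashes[0] + hashes[1]), h(hashes[2] + hashes[3])
root = h(l01 + l23)
proof_for_tx2 = [(hashes[0], "L"), (l23, "R")]
assert verify_merkle_proof(b"tx2", proof_for_tx2, root)
```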

Comparison of data verification methods and time performance, image source: Verkle Trees

3.4 Generic DA middleware

The continuous expansion of the blockchain ecosystem has multiplied the number of public chains. Because each chain has its own strengths and irreplaceability in its own domain, it is almost impossible for Layer 1 chains to consolidate in the short term. Meanwhile, with the growth of DeFi and the problems of CEXs, demand for decentralized cross-chain asset trading keeps rising. DA-layer multi-chain data storage, which can eliminate the security issues of cross-chain data exchange, has therefore drawn increasing attention. To accept historical data from different public chains, however, the DA layer needs a decentralized protocol for standardized storage and verification of data flows. For example, kvye, a storage middleware built on Arweave, actively fetches data from each chain and stores all on-chain data to Arweave in a standardized form, minimizing differences in the data transmission process. By comparison, a Layer 2 that provides DA storage exclusively for one public chain exchanges data through shared internal nodes, which lowers interaction cost and improves security, but is relatively limited and can only serve that specific chain.

4. DA-layer storage schemes

4.1 Mainchain DA

4.1.1 Danksharding-like schemes

There is no settled name for this class of storage schemes; its most prominent representative is Danksharding on Ethereum, so this article calls them Danksharding-like. These schemes mainly use the two DA storage technologies described above, Sharding and DAS. First, the data is divided into an appropriate number of shares, and each node extracts one data block via DAS for storage. With enough nodes in the network, a larger shard count N can be chosen, reducing each node's storage burden to 1/N of the original and expanding overall storage capacity N-fold. To guard against the extreme case where some block is stored by no node at all, Danksharding encodes the data with erasure coding, so the full data can be restored from only half of the blocks. Finally, the data verification process uses polynomial (KZG) commitments to achieve fast validation.

4.1.2 Short-term storage

One of the simplest ways to handle main-chain DA data is to store historical data only for a short period. In essence, the blockchain acts as a public ledger whose content changes are witnessed by the whole network; permanent storage is not required. Taking Solana as an example, although its historical data is synced to Arweave, mainnet nodes retain only the last two days of transaction data. On account-based public chains, the state at each moment already captures the final account balances, which suffices as the verification basis for the next change. Projects with data needs beyond this window can store the data on other decentralized public chains or with a trusted third party. In other words, those with additional data needs pay for historical data storage themselves.

4.2 Third-Party DAs

4.2.1 Mainchain dedicated DA: EthStorage

The most important property of the DA layer is the security of data transmission, and on that count the main chain's own DA is the most secure. But main-chain storage is constrained by storage space and resource competition, so when the network's data volume grows rapidly, a third-party DA becomes the better choice for long-term storage. The more compatible a third-party DA is with the mainnet, the more it can share nodes with it and the more secure the data exchange becomes. Under the premise of security, therefore, a main-chain-dedicated DA has huge advantages. Taking Ethereum as an example, a basic requirement for a main-chain-dedicated DA is EVM compatibility, ensuring interoperability with Ethereum data and contracts; representative projects include Topia and EthStorage. Among them, EthStorage is currently the furthest along on compatibility: beyond EVM-level compatibility, it also provides interfaces for Ethereum development tools such as Remix and Hardhat, achieving compatibility at the tooling level as well.

EthStorage: EthStorage is a public chain independent of Ethereum, but its nodes are a superset of Ethereum nodes: a node running EthStorage also runs Ethereum, and EthStorage can be operated directly through opcodes on Ethereum. EthStorage's storage model keeps only a small amount of metadata on the Ethereum mainnet for indexing, essentially creating a decentralized database for Ethereum. In the current design, the interaction between the Ethereum mainnet and EthStorage is implemented by deploying an EthStorage contract on mainnet. To store data, Ethereum calls the contract's put() function with two bytes-typed parameters, key and data, where data is the payload to store and key is its identifier in the Ethereum network, comparable to a CID in IPFS. Once the (key, data) pair is stored in the EthStorage network, EthStorage generates a kvIdx and returns it to the Ethereum mainnet; this kvIdx corresponds to the key on Ethereum and to the storage address of the data on EthStorage. The problem of storing a large volume of data thus shrinks to storing a single (key, kvIdx) pair, greatly reducing the storage cost of the Ethereum mainnet. To retrieve previously stored data, the get() function in EthStorage is called with the key parameter, and the kvIdx stored on Ethereum allows a fast lookup of the data on EthStorage.
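The following Python sketch models that put()/get() flow under the description above: the mainnet side keeps only the small key-to-kvIdx mapping while the data itself lives off-chain. The class and method bodies are illustrative stand-ins, not EthStorage's actual (Solidity) interfaces.

```python
class EthStorageNetwork:
    """Stands in for the off-chain EthStorage node network."""
    def __init__(self):
        self.slots = {}                     # kv_idx -> (key, data)

    def store(self, key: bytes, data: bytes) -> int:
        kv_idx = len(self.slots)            # storage address on EthStorage
        self.slots[kv_idx] = (key, data)
        return kv_idx

class EthStorageContract:
    """Stands in for the mainnet contract: Ethereum keeps only the tiny
    (key -> kv_idx) mapping, never the data itself."""
    def __init__(self, network: EthStorageNetwork):
        self.network = network
        self.index = {}                     # key -> kv_idx

    def put(self, key: bytes, data: bytes) -> None:
        self.index[key] = self.network.store(key, data)

    def get(self, key: bytes) -> bytes:
        kv_idx = self.index[key]            # cheap mainnet lookup
        _, data = self.network.slots[kv_idx]
        return data

net = EthStorageNetwork()
contract = EthStorageContract(net)
contract.put(b"app-state-001", b"a large blob that never touches mainnet")
assert contract.get(b"app-state-001") == b"a large blob that never touches mainnet"
```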

EthStorage contract, image source: Kernel Ventures

In how nodes store data, EthStorage borrows Arweave's model. A large number of (k, v) pairs from Ethereum are sharded, each shard containing a fixed number of (k, v) pairs, with a limit on the size of each individual pair to keep miners' workloads fair when storage rewards are paid out. To issue rewards, the network must verify that nodes actually store the data. In this process, EthStorage divides each shard (terabyte-sized) into a large number of chunks and keeps a Merkle root on the Ethereum mainnet for verification. A miner then provides a nonce which, combined through a random algorithm with the hash of the previous block on EthStorage, generates the addresses of several chunks; the miner must supply the data of those chunks to prove it really stores the whole shard. The nonce cannot be chosen arbitrarily, or a node could pick one that only points at the chunks it happens to store; the chunks it generates must, after mixing and hashing, satisfy the network's difficulty requirement, and only the first node to submit a valid nonce and proof of random access receives the reward.
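A toy Python version of that sampling-plus-difficulty check; the chunk count, sample size, target, and derivation hashes are arbitrary choices for illustration, not EthStorage's real parameters.

```python
import hashlib

CHUNKS_PER_SHARD = 1024     # toy value; real shards are terabyte-sized
SAMPLES = 4                 # chunks a nonce obliges the miner to reveal
TARGET = 2**240             # toy difficulty target for a 256-bit hash

def sampled_chunks(prev_hash: bytes, nonce: int) -> list[int]:
    """Derive which chunk indices this nonce obliges the miner to reveal."""
    idxs = []
    for i in range(SAMPLES):
        d = hashlib.sha256(prev_hash + nonce.to_bytes(8, "big") + bytes([i])).digest()
        idxs.append(int.from_bytes(d, "big") % CHUNKS_PER_SHARD)
    return idxs

def try_nonce(prev_hash: bytes, nonce: int, shard: dict[int, bytes]) -> bool:
    """A nonce wins only if the miner holds every sampled chunk AND the
    mix of nonce and chunk data meets the difficulty target, so picking
    a nonce that happens to point at stored chunks is not enough."""
    mix = prev_hash + nonce.to_bytes(8, "big")
    for idx in sampled_chunks(prev_hash, nonce):
        if idx not in shard:
            return False                # missing data: cannot compete
        mix += shard[idx]
    return int.from_bytes(hashlib.sha256(mix).digest(), "big") < TARGET

shard = {i: b"chunk-%d" % i for i in range(CHUNKS_PER_SHARD)}  # full copy
prev = hashlib.sha256(b"previous-ethstorage-block").digest()
winning = next(n for n in range(1_000_000) if try_nonce(prev, n, shard))
```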

4.2.2 Modular DA: Celestia

Blockchain modules: The work a Layer 1 public chain performs currently divides into four main parts: (1) designing the network's underlying logic, selecting validators in some way, writing blocks, and distributing rewards to network maintainers; (2) packaging and processing transactions and publishing them; (3) verifying transactions to be put on chain and determining the final state; and (4) storing and maintaining the blockchain's historical data. According to these functions, the blockchain can be divided into four modules: the consensus layer, the execution layer, the settlement layer, and the data availability (DA) layer.

Modular blockchain design: For a long time these four modules were integrated into a single public chain, a design known as a monolithic blockchain. This form is stable and easy to maintain, but it puts enormous pressure on a single chain: in practice the four modules constrain one another, competing for the chain's limited computing and storage resources. For example, raising the execution layer's processing speed puts more storage pressure on the DA layer, while securing the execution layer requires more complex verification mechanisms that slow transaction processing. The development of a public chain therefore constantly faces trade-offs among the four modules. To break through this bottleneck, developers proposed the modular blockchain: the core idea is to carve out one or more of the four modules and hand them to a separate public chain. That chain can then focus solely on transaction speed or storage capacity, breaking the limits that the weakest module used to impose on the blockchain's overall performance.

Modular DA: Detaching the DA layer from the rest of the blockchain's business and handing it to a dedicated public chain is considered a viable answer to Layer 1's growing historical data. Exploration here is still early, with Celestia the most representative project at present. For its storage method, Celestia borrows from Danksharding: data is split into multiple blocks, each node extracts a portion for storage, and data integrity is verified with KZG polynomial commitments. Celestia further uses 2D Reed-Solomon erasure coding, rewriting the original data as a k×k matrix so that the full data can be recovered from only 25% of the encoded matrix (see the sketch below). However, sharded storage only scales each node's burden down by a constant factor of the total data volume; node storage still grows linearly with data volume, and as Layer 1 keeps raising transaction speeds, that pressure may one day hit an unacceptable threshold. To handle this, Celestia introduces the IPLD component: the data in the k×k matrix is not stored on Celestia itself but in the LL-IPFS network, with the node keeping only the data's IPFS CID. When a user requests a piece of historical data, the node hands the corresponding CID to the IPLD component, which uses it to fetch the original data from IPFS. If the data exists on IPFS, it is returned via the IPLD component and the node; if not, it cannot be returned.
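Below is a conceptual Python sketch of the 2D extension: a k×k square is widened row-wise and then column-wise by polynomial extension over a small prime field. The field size and square size are toy choices for illustration, not Celestia's real parameters.

```python
P = 2**31 - 1   # toy prime field; real deployments use a different field
K = 2           # toy square size

def extend(vals):
    """Treat vals as evaluations of a degree-(k-1) polynomial at
    x = 0..k-1 and append its evaluations at x = k..2k-1."""
    k = len(vals)
    out = list(vals)
    for x in range(k, 2 * k):
        acc = 0
        for i, yi in enumerate(vals):
            num, den = 1, 1
            for j in range(k):
                if j != i:
                    num = num * (x - j) % P
                    den = den * (i - j) % P
            acc = (acc + yi * num * pow(den, -1, P)) % P
        out.append(acc)
    return out

data = [[1, 2], [3, 4]]                                    # k x k original
rows = [extend(r) for r in data]                           # k x 2k
cols = [extend([rows[r][c] for r in range(K)]) for c in range(2 * K)]
square = [[cols[c][r] for c in range(2 * K)] for r in range(2 * K)]
# The k x k original is now one quarter of a 2k x 2k square; the full
# square can be re-interpolated from a sufficiently spread 25% of it,
# which is the recovery property cited above.
print(square)
```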

How Celestia data is read, image source: Celestia Core

Celestia: Taking Celestia as an example, we can glimpse how a modular blockchain addresses Ethereum's storage problem. A rollup node sends its packaged and verified transaction data to Celestia, which stores the data without interpreting it; the rollup node then pays Celestia TIA tokens as a storage fee according to the space used. Storage on Celestia uses DAS and erasure coding similar to EIP-4844, but the one-dimensional polynomial erasure coding of EIP-4844 is upgraded to 2D Reed-Solomon coding, raising storage security further: only 25% of the fragments are needed to restore the full transaction data. In essence, Celestia is just a low-cost PoS public chain; solving Ethereum's historical data storage problem requires many other modules working with it. On the rollup side, for example, one of the rollup modes most recommended on Celestia's website is the sovereign rollup. Unlike common Layer 2 rollups, which only compute and verify transactions (i.e. perform the execution layer's work), a sovereign rollup covers the entire execution and settlement process, minimizing the processing done on Celestia; since Celestia's overall security is weaker than Ethereum's, this maximizes the security of the overall transaction flow. For securing the data Ethereum calls from Celestia, the most mainstream solution is the Quantum Gravity Bridge smart contract: for data stored on Celestia, a Merkle root (the data availability proof) is kept in the Quantum Gravity Bridge contract on the Ethereum mainnet, and every time Ethereum calls historical data from Celestia, it compares the hash result against that Merkle root; a match means the data is genuine historical data.
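A hypothetical miniature of that flow in Python: a relayer posts a batch commitment into a mainnet-side registry, and any later read of the batch is checked against it. The names and the flat-hash commitment below are stand-ins, not the real Quantum Gravity Bridge interfaces; a production commitment would be the Merkle root of section 3.3.

```python
import hashlib

class QuantumGravityBridgeContract:
    """Mainnet-side registry of one data-availability root per batch."""
    def __init__(self):
        self.roots = {}

    def relay_root(self, batch_id: int, root: bytes) -> None:
        self.roots[batch_id] = root          # posted by the bridge relayer

def commitment(blobs: list[bytes]) -> bytes:
    """Stand-in for the Merkle root over a batch (see section 3.3)."""
    acc = hashlib.sha256()
    for blob in blobs:
        acc.update(hashlib.sha256(blob).digest())
    return acc.digest()

bridge = QuantumGravityBridgeContract()
batch = [b"rollup-block-7", b"rollup-block-8"]
bridge.relay_root(42, commitment(batch))     # at publication time

# Later, the Ethereum side re-checks data fetched back from Celestia:
fetched = [b"rollup-block-7", b"rollup-block-8"]
assert commitment(fetched) == bridge.roots[42]   # genuine historical data
```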

4.2.3 Storage public chain DA

In terms of technical principles, main-chain DA borrows many sharding-like techniques from storage public chains. Among third-party DAs, some already delegate storage tasks directly to a storage public chain; for example, Celestia places specific transaction data on the LL-IPFS network. Within the third-party DA approach, besides building a dedicated public chain to solve Layer 1's storage problem, a more direct way is to connect a storage public chain to Layer 1 and put its huge historical data there. For high-performance blockchains the historical data volume is even larger: running at full speed, the high-performance chain Solana approaches 4 PB of data, far beyond the storage range of ordinary nodes. Solana's chosen solution is to store historical data on the decentralized storage network Arweave and keep only 2 days of data on mainnet nodes for verification. To secure this process, Solana and Arweave designed a storage bridge protocol, Solar Bridge: data verified by Solana nodes is synced to Arweave, and a corresponding tag is returned, with which Solana nodes can view the blockchain's historical data at any time.

On Arweave, nodes across the network are not required to maintain data consistency as a threshold for participating in the network; instead, rewarded storage is used. First, Arweave does not build blocks in a traditional chain structure but in something closer to a graph: a new block points not only to the previous block but also to a randomly chosen Recall Block. The Recall Block's exact position is determined by the hash of the previous block and its block height, and is unknowable until the previous block is mined. When generating a new block, a node must hold the Recall Block's data to compute the hash of the specified difficulty under the PoW mechanism, and only the miner who first finds a matching hash is rewarded, which encourages miners to store as much historical data as possible. Moreover, the fewer nodes store a given historical block, the fewer competitors a node faces in that round's nonce race, encouraging miners to store the blocks with the fewest backups in the network. Finally, to ensure data is stored permanently, Arweave introduces the WildFire node scoring mechanism: nodes prefer to communicate with peers that can serve more historical data faster, while low-rated nodes often cannot obtain the latest blocks and transactions promptly, and thus cannot get a head start in the PoW race.
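A minimal Python sketch of the Recall Block mechanic described above: the recall index is derived from the previous block's hash and the height, and a miner without that block's data cannot even enter the hash race. The index derivation and difficulty target are our assumptions, not Arweave's exact formulas.

```python
import hashlib

def recall_index(prev_hash: bytes, height: int) -> int:
    """The Recall Block's position is fixed by the previous block's hash
    and the block height, so it cannot be known before that block exists."""
    d = hashlib.sha256(prev_hash + height.to_bytes(8, "big")).digest()
    return int.from_bytes(d, "big") % height    # one of the prior blocks

def mine(prev_hash: bytes, height: int, my_blocks: dict, target: int):
    """Proof-of-access in miniature: without the Recall Block's data a
    miner cannot even enter this round's hash race."""
    recall_data = my_blocks.get(recall_index(prev_hash, height))
    if recall_data is None:
        return None                             # not stored: sit this out
    for nonce in range(1_000_000):
        h = hashlib.sha256(prev_hash + recall_data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(h, "big") < target:
            return nonce                        # winning nonce found
    return None

history = {i: b"block-%d" % i for i in range(100)}  # blocks we store
prev = hashlib.sha256(b"tip").digest()
print(mine(prev, height=100, my_blocks=history, target=2**248))
```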

How Arweave blocks are built, image source: Arweave Yellow-Paper

5. Comprehensive comparison

Next, we compare the pros and cons of the five storage schemes above along the four dimensions of the DA performance metrics.

Security: The biggest sources of data security problems are loss during data transmission and malicious tampering by dishonest nodes. In the cross-chain setting, where the two public chains are independent and do not share state, data transmission security is at its most vulnerable. In addition, a Layer 1 that needs a dedicated DA layer at this stage usually already has a strong consensus group, so its own security far exceeds that of an ordinary storage public chain; the main-chain DA schemes therefore offer higher security. Once transmission security is ensured, the next step is securing the data being called. Considering only the short-term historical data used to verify transactions: in a temporarily-storing network the same data is backed up by the entire network, whereas in a Danksharding-like scheme the average number of backups is only 1/N of the network's node count. More redundancy makes data less likely to be lost and provides more reference samples for verification, so temporary storage offers higher data security. Among third-party DA schemes, the main-chain-dedicated DA shares nodes with the main chain, and data can be transmitted directly through these relay nodes during cross-chain transfer, so it is relatively more secure than the other third-party DA solutions.

Storage cost: The biggest contributor to storage cost is the amount of data redundancy. In the short-term storage scheme of main-chain DA, data is stored via whole-network node synchronization: every newly stored item is backed up by every node, giving the highest storage cost. This high cost in turn means the method suits only temporary storage on high-TPS networks. Next come the sharding schemes, both main-chain Sharding and sharded third-party DAs; since a main chain tends to have more nodes, each block gets more backups, so the main-chain sharding scheme costs more. The lowest storage cost belongs to storage public chain DAs using rewarded storage, where the amount of redundancy fluctuates around a fixed constant. Storage public chain DAs also include a dynamic adjustment mechanism that raises rewards to attract nodes to store under-replicated data, safeguarding data security.

Data read speed: The read speed of data is mainly affected by where the data sits in the storage medium, the data's index path, and the data's distribution among nodes. Of these, the storage location on the node matters most, because keeping data in memory versus on an SSD can change read speed by a factor of tens. Storage public chain DAs mostly use SSD storage, because their on-chain load includes not only DA-layer data but also storage-heavy user data such as videos and images; without SSDs, the network could hardly withstand the storage pressure or meet long-term storage needs. Second, for third-party DAs and main-chain DAs that hold data in memory, a third-party DA must first look up the corresponding index data on the main chain, transfer it cross-chain to the third-party DA, and return the data via a storage bridge; by contrast, a main-chain DA can query data directly from its own nodes, giving faster retrieval. Finally, within main-chain DA, the sharding approach must fetch blocks from multiple nodes and reassemble the original data; it therefore reads more slowly than unsharded short-term storage.

DA layer universality: Main-chain DA's universality is close to zero, since a chain whose own storage space is already insufficient cannot take on another chain's data. Among third-party DAs, a solution's universality and its compatibility with a particular main chain are contradictory goals: a main-chain-dedicated DA scheme, built with many node-type and consensus-level adaptations for one specific chain, finds those very adaptations a massive hindrance when communicating with other chains. Within third-party DAs, the storage public chain DA outperforms modular DA on universality: it has a larger developer community and more supporting infrastructure, letting it adapt to different public chains, and it acquires data more actively, by crawling it, rather than passively receiving what other chains transmit. It can therefore encode data in its own way, standardize the storage of data flows, conveniently manage data from different main chains, and improve storage efficiency.

Comparison of storage solution performance, image source: Kernel Ventures

6. Summary

Blockchains at this stage are transitioning from Crypto to a more inclusive Web3, which brings far more than a wealth of on-chain projects. To accommodate so many projects running simultaneously on Layer 1 while preserving the experience of GameFi and SocialFi projects, Layer 1s led by Ethereum have adopted methods such as Rollups and Blobs to raise TPS. Among newer blockchains, the number of high-performance chains keeps growing as well. But higher TPS means not only higher performance but also greater storage pressure on the network. For this massive historical data, various main-chain and third-party DA methods have been proposed to adapt to the growth of on-chain storage pressure. Every improvement has pros and cons and a different applicability depending on context.

Payment-focused blockchains have extremely high requirements for the security of historical data and do not pursue particularly high TPS. If such a public chain is still in its preparatory stage, it can adopt a Danksharding-like storage method, achieving a huge increase in storage capacity without sacrificing security. But for a chain like Bitcoin, already formed and with a huge number of nodes, hasty improvements at the consensus layer carry enormous risk; an off-chain, main-chain-dedicated DA with high security can instead balance security and storage. It is worth noting, though, that a blockchain's function is not static but ever-changing. In its early days, for example, Ethereum's functions were largely limited to payments and simple automation of assets and transactions through smart contracts, but as the blockchain landscape expanded, various SocialFi and DeFi projects were gradually added, pushing Ethereum in a more comprehensive direction. Recently, with the explosion of the inscription ecosystem on Bitcoin, Bitcoin network transaction fees have surged nearly 20-fold since August, reflecting that the network's transaction speed cannot meet demand: traders can only bid up fees to get transactions processed sooner. Now the Bitcoin community faces a trade-off: accept high fees and slow transactions, or reduce network security to speed up transactions and betray the payment system's original purpose. If the community chooses the latter, the corresponding storage scheme will also need adjusting as data pressure grows.

Bitcoin mainnet transaction fee fluctuations, image source: OKLink

Public chains with comprehensive functions pursue higher TPS, and their historical data grows even faster; a Danksharding-like scheme can hardly keep up with rapid TPS growth over the long run, so migrating data to a third-party DA for storage is more appropriate. Among these, main-chain-dedicated DA has the highest compatibility and may hold the advantage if only one chain's storage problem is considered. But today, cross-chain asset transfer and data interaction have become common pursuits of the blockchain community. Considering the long-term development of the whole ecosystem, storing the historical data of different public chains on the same chain eliminates many security problems in data exchange and verification, so modular DA and storage public chain DA may be the better choices. With comparable universality, modular DA focuses on serving the blockchain's DA layer and introduces finer-grained index data for managing historical data, allowing sensible classification of different chains' data; it thus holds an edge over storage public chains. Note, however, that the schemes above do not account for the cost of adjusting the consensus layer on an existing public chain, a step so risky that a single failure could open systemic vulnerabilities and cost the chain its community consensus. So if a transitional solution is needed during blockchain scaling, the simplest short-term main-chain storage may fit better. Finally, all of the discussion above is based on performance in actual operation; if a chain's goal is to grow its own ecosystem and attract more projects and participants, it may also favor projects backed and funded by its own foundation. For example, even if overall performance were equal to, or slightly below, a storage public chain scheme, the Ethereum community would still favor EthStorage, a Layer 2 project supported by the Ethereum Foundation, to keep developing the Ethereum ecosystem.

All in all, today's increasingly complex blockchains bring greater storage space requirements with them. Given enough Layer 1 validators, historical data need not be backed up by every node in the network; a certain number of backups suffices for relative security. Meanwhile, the division of labor among public chains keeps getting finer: Layer 1 handles consensus and execution, rollups handle computation and verification, and a separate blockchain handles data storage, each part focusing on one function without being limited by the others' performance. Yet how many nodes, or what proportion, should store historical data to balance security and efficiency, and how to ensure secure interoperability between different blockchains, remain questions blockchain developers must keep thinking about and improving. For investors, main-chain-dedicated DA projects on Ethereum deserve attention, because Ethereum already has enough supporters that it does not need other communities to expand its influence; its greater need is to improve and grow its own community and attract more projects to land in the Ethereum ecosystem. For chaser-position chains such as Solana and Aptos, however, a single chain lacking such a complete ecosystem may prefer to unite other communities' forces and build a vast cross-chain ecosystem to expand its influence. For emerging Layer 1s, therefore, generic third-party DAs deserve more attention.

Source: Golden Finance
