Every computing stack needs storage; without it, nothing persists. As computing resources continue to grow, large amounts of underutilized storage capacity come with them. A Distributed Storage Network (DSN) can coordinate these latent resources and turn them into productive assets. These networks have the potential to introduce the first real business verticals to the Web 3 ecosystem.
P2P Development History
The emergence of Napster marked the entry of strictly P2P file sharing into the mainstream. There were other file-sharing methods before it, but Napster's MP3 sharing is what made P2P popular, and distributed systems have developed rapidly since. Because the Napster model was centralized (its index lived on company servers), it was easily shut down by legal action, but it laid the groundwork for more robust approaches to file sharing.
The Gnutella protocol followed this line of thought, and many different frontends used the network in their own ways. As a more distributed, Napster-style query network, Gnutella was more resistant to censorship, which was verified at the time: AOL, having acquired the rising Nullsoft, realized the protocol's power and decisively cancelled its launch. But the software had already leaked and was quickly reverse engineered, producing familiar front-end applications such as BearShare, LimeWire and FrostWire. What ultimately caused such applications to fail were bandwidth requirements (a scarce resource at the time) and the lack of liveness and content guarantees.
Remember these? If not, no matter: LimeWire has since been reborn as an NFT marketplace.
BitTorrent, which came next, was an upgrade thanks to the protocol's bidirectional nature (peers upload while they download) and its ability to maintain a distributed hash table (DHT). The DHT matters because it acts like a distributed ledger of file locations that any participating node in the network can query.
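To make the DHT idea concrete, below is a minimal sketch of a Kademlia-style lookup, the scheme behind BitTorrent's Mainline DHT: node IDs and content keys share one ID space, and the nodes XOR-closest to a key are the ones responsible for knowing which peers hold that content. Everything here is illustrative; a real implementation adds routing tables, RPCs, and replication.

```python
# Illustrative Kademlia-style lookup; names and parameters are hypothetical.
import hashlib

def node_id(name: str) -> int:
    """Derive a 160-bit ID from an arbitrary name (for illustration only)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: the XOR of two IDs, compared as integers."""
    return a ^ b

def closest_nodes(key: int, nodes: list[int], k: int = 3) -> list[int]:
    """The k nodes XOR-closest to a key store the peer list for that key."""
    return sorted(nodes, key=lambda n: xor_distance(n, key))[:k]

nodes = [node_id(f"node-{i}") for i in range(20)]
infohash = node_id("some-file.torrent")  # content key, same space as node IDs
print([hex(n)[:10] for n in closest_nodes(infohash, nodes)])
```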
After the birth of Bitcoin and blockchains, people began to wonder whether this new coordination mechanism could connect latent unused resources into commodity networks, and distributed storage networks (DSNs) began to sprout.
In fact, many people don't realize that tokens and P2P networks did not start with Bitcoin and blockchains. The original P2P builders had already realized the following:
Because forking is always possible, it is very hard to build a useful protocol and then monetize it. Even if you monetize through front-end advertising, someone will fork the client and undercut you.
Usage is heavily skewed. In Gnutella, for example, 70% of users shared no files, while 50% of requests were for files hosted by the top 1% of hosts.
(Figure: the power-law distribution of file-sharing activity)
How were these problems addressed? BitTorrent started with the share rate (upload/download ratio); other protocols introduced primitive token systems, often calling the tokens credits or points, and distributed them to incentivize good behavior (keeping the protocol healthy) and network maintenance (such as curating content through reputation ratings); a sketch of the share-ratio mechanic follows the reading list below. For some history on this, I highly recommend John Backus's essays (now deleted, but available via the Internet Archive):
Fat Protocols Aren't New
What If BitTorrent Had a Token?
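To make the share-ratio mechanic concrete, here is a rough sketch assuming a hypothetical peer-tracking structure; BitTorrent's real choking algorithm works on rolling per-connection transfer rates, but the core idea is the same: peers who upload more get served first.

```python
# Toy share-ratio accounting; not BitTorrent's actual choking algorithm.
from dataclasses import dataclass

@dataclass
class Peer:
    name: str
    uploaded: int = 0    # bytes this peer has sent us
    downloaded: int = 0  # bytes we have sent this peer

    @property
    def share_ratio(self) -> float:
        """Upload/download ratio; higher means a better network citizen."""
        return self.uploaded / max(self.downloaded, 1)

def unchoke(peers: list[Peer], slots: int = 4) -> list[Peer]:
    """Grant the available upload slots to the peers with the best ratios."""
    return sorted(peers, key=lambda p: p.share_ratio, reverse=True)[:slots]

swarm = [Peer("a", 50, 10), Peer("b", 5, 80), Peer("c", 30, 30)]
print([p.name for p in unchoke(swarm, slots=2)])  # -> ['a', 'c']
```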
It is worth mentioning that a distributed storage network was part of Ethereum's original vision, the so-called "Holy Trinity" meant to provide the tools for the world computer to flourish. Rumor has it that Gavin Wood came up with the concept, with Swarm as the storage layer and Whisper as the messaging layer.
In short, the mainstream distributed storage network was born. Everyone knows what happened next.
Distributed storage network structure
The market structure of distributed storage networks is very interesting: there is a wide gap in scale between the leader (Filecoin) and the emerging storage networks. The common impression is that the field is dominated by two giants, Filecoin and Arweave, but by usage Arweave actually ranks fourth, behind Storj and Sia (although Sia's usage appears to be declining). And while the authenticity of Filecoin's usage data can be questioned, even discounted by 90% it is still roughly 400 times Arweave's.
What can be inferred from this?
First, there are dominant players in the market today, but the continuation of this dominance depends on the availability of storage resources.
These distributed storage networks (DSNs) generally share the same architecture: node operators with large amounts of idle storage (hard disks) post collateral to mine blocks and earn rewards by storing data. Pricing and the methods for achieving persistent storage vary, and the most important differentiator is whether users can retrieve and process their stored data easily and at reasonable cost.
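As a minimal sketch of this shared architecture, consider the following toy deal flow; all names and numbers are hypothetical, and real networks such as Filecoin enforce this with on-chain storage proofs and far more nuanced slashing rules.

```python
# Toy DSN deal flow: collateral, per-epoch payment, slashing on missed proofs.
from dataclasses import dataclass

@dataclass
class StorageDeal:
    provider: str
    collateral: float       # tokens locked by the storage provider
    price_per_epoch: float  # payment released for each proven epoch
    epochs: int             # agreed storage duration

def settle(deal: StorageDeal, proofs_ok: list[bool]) -> float:
    """Pay the provider per proven epoch; slash collateral for missed proofs."""
    earned = sum(deal.price_per_epoch for ok in proofs_ok if ok)
    slashed = min(deal.collateral, proofs_ok.count(False) * deal.price_per_epoch * 2)
    return earned - slashed  # net payout; negative means the provider lost money

deal = StorageDeal("operator-1", collateral=100.0, price_per_epoch=1.0, epochs=10)
print(settle(deal, proofs_ok=[True] * 9 + [False]))  # 9 paid epochs, 1 slash -> 7.0
```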
(Table: comparison of storage network capacity and usage)
Note:
Arweave's capacity cannot be measured directly; its mechanism, however, encourages node operators to maintain a sufficient buffer so that supply can expand to meet demand. How big is that buffer? Since it cannot be measured, we cannot say.
Swarm's actual usage cannot be determined: the amount of storage that has been paid for is visible, but not whether it is actually used.
The projects in the table are all in continuous operation; there are also some planned DSNs, such as ETH Storage and MaidSafe.
FVM
Before discussing the FVM, I have to mention Filecoin's recently launched FEVM (Filecoin Ethereum Virtual Machine). The FVM itself is a WASM virtual machine built around a hypervisor concept, able to host many different runtimes; the FEVM is the Ethereum Virtual Machine runtime running on the FVM on the Filecoin (FIL) network. The FEVM is worth highlighting because it enabled an explosion of smart-contract activity on Filecoin: before its launch in March there were essentially only 11 active smart contracts on FIL, and after launch the number exploded. This highlights the benefits of composability. Work already done in Solidity can be reused to build new businesses on top of FIL, making various innovations possible, such as the quasi-liquid staking primitive developed by the GLIF team and the broader financialization of the storage market. We think the FVM will accelerate storage-provider growth through increased capital efficiency (storage providers need FIL as collateral to actively offer storage deals). Unlike traditional LSDs, though, the credit risk of individual storage providers needs to be assessed.
Permanent storage
I believe Arweave generates the biggest buzz here, because its tagline speaks to the deepest desire of Web 3 participants: permanent storage.
But what exactly does permanent storage mean? It is certainly a desirable attribute, but in practice execution is everything, and the keys to execution are sustainability and end-user cost. Arweave adopts a pay-once, store-forever model: an upfront payment sized to cover roughly 200 years of storage, combined with the assumption that storage keeps getting cheaper. This endowment-style pricing works while the underlying asset deflates, relying on continuous goodwill accretion (old transactions effectively subsidize new ones), but the logic inverts in an inflationary environment. Historically the model has held up, since the cost of computer storage has trended downward since its inception, but looking only at hard-drive cost is not the whole picture.
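To see why the model leans on deflation, here is a back-of-the-envelope sketch with hypothetical numbers: if the annual cost of storing a gigabyte declines by a fixed fraction k each year, the cost of storing it forever is a convergent geometric series, and the one-time fee only stays sufficient while that decline assumption holds.

```python
# Endowment-style pricing sketch; c0 and k are hypothetical numbers.
def perpetual_cost(c0: float, k: float) -> float:
    """Closed form: c0 * sum((1-k)**t for t >= 0) = c0 / k, for 0 < k < 1."""
    return c0 / k

def horizon_cost(c0: float, k: float, years: int) -> float:
    """Cost over a finite horizon, e.g. a ~200-year sizing."""
    return sum(c0 * (1 - k) ** t for t in range(years))

c0, k = 0.005, 0.3  # assume $0.005/GB-year today, 30% annual cost decline
print(perpetual_cost(c0, k))     # ~0.0167: forever costs ~3.3x one year
print(horizon_cost(c0, k, 200))  # the 200-year sum converges to nearly the same
```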
Arweave pursues permanent storage through the incentives of its Succinct Proofs of Random Access (SPoRA) algorithm, which encourages miners to store all data by asking them to prove access to randomly chosen historical blocks. Doing so increases the probability that a miner is selected to create the next block (and earn the corresponding reward).
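A toy model of this incentive, under a deliberately simplified lottery assumption (real SPoRA challenges are more involved): if each challenge targets a chunk drawn uniformly from history, a miner storing a fraction f of the data can respond in roughly a fraction f of the rounds.

```python
# Simplified SPoRA-style lottery; the uniform-challenge assumption is ours.
import random

def eligible_share(f: float, rounds: int = 100_000) -> float:
    """Empirical share of rounds where a miner storing fraction f holds the
    challenged chunk and can therefore compete for the block reward."""
    return sum(random.random() < f for _ in range(rounds)) / rounds

for f in (0.25, 0.5, 1.0):
    print(f"store {f:.0%} of the dataset -> eligible in ~{eligible_share(f):.1%} of rounds")
```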
While this mechanism makes node operators want to store all the data, it does not guarantee that they will. Even with high redundancy settings and conservative heuristics for the model's parameters, the risk of loss can never be fully ruled out.
The only way to achieve permanent storage is to explicitly make someone (or everyone) responsible for it, weeding out poor implementations. How do you motivate people to take on that responsibility? The heuristic approach itself is fine, but the best implementation and pricing of permanent storage are still open questions.
With that groundwork laid, the real question is what level of security is acceptable for permanent storage, and how to price it over a given time frame. In reality, consumer preferences will fall along a replication (permanence) spectrum, and consumers should be able to choose their security level and be priced accordingly.
The benefits of diversification in reducing overall portfolio risk are well documented in traditional investment literature: the first few additions reduce risk substantially, but past a point the marginal benefit of adding another stock is almost zero.
In my opinion, since the amount of replication in a distributed storage network is not linearly proportional to storage cost and security, pricing beyond the default number of replicas should follow a curve like the one in the figure below.
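For intuition about why such a curve should flatten, here is a toy calculation assuming independent replica failures (an assumption real deployments only approximate): risk falls exponentially with each added replica while cost grows linearly.

```python
# Toy replica-risk model; the 5% failure rate and independence are assumptions.
def loss_probability(p: float, n: int) -> float:
    """P(all n independent replicas fail within the year) = p**n."""
    return p ** n

p = 0.05  # hypothetical annual failure probability per replica
for n in range(1, 6):
    print(f"{n} replica(s) -> annual loss probability {loss_probability(p, n):.2e}")
```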
Looking ahead, what I most anticipate is what a DSN with easily accessible smart contracts could bring to the permanent storage market. Consumers will benefit if the market opens up different options for permanent storage.
We can think of the green area of the graph above as the realm of experimentation, where exponential reductions in storage cost may be possible without drastically changing the level of replication and permanence.
Permanent storage can also be achieved by replicating across different storage networks, not just within a single one. This path is more ambitious, but it naturally yields different tiers of permanence. The biggest question is whether permanence can be spread across distributed storage networks the way market risk is diversified across a stock portfolio, making permanent storage a free lunch.
Possibilities do exist, but node-provider overlap and other complications must be considered. Forms of insurance could also be introduced, such as node operators accepting stricter slashing conditions in exchange for offering guarantees. Such a system would not be easy to maintain either, since multiple codebases are involved and must be coordinated. Nonetheless, we look forward to a proliferation of such designs, advancing the idea of permanent storage for the industry as a whole.
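As a toy illustration of the overlap concern, consider a rough common-shock model with hypothetical numbers: the larger the share of failure events the two networks have in common (the same operators, the same data centers), the closer "diversified" storage gets to single-copy risk.

```python
# Rough common-shock model; the decomposition below is a simplifying assumption.
def cross_network_loss(p: float, overlap: float) -> float:
    """P(both copies lost) when a fraction `overlap` of each network's failure
    probability comes from shared events: overlap=0 -> p**2, overlap=1 -> p."""
    common = overlap * p            # events that take out both networks at once
    independent = (1 - overlap) * p
    return common + independent ** 2

p = 0.05
for overlap in (0.0, 0.25, 0.5, 1.0):
    print(f"overlap {overlap:.0%} -> loss probability {cross_network_loss(p, overlap):.4f}")
```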
Web3's first commercial market
Matti recently tweeted that storage is a use case that brings real business value to Web3. I think he may be right.
I recently chatted with a Layer 1 blockchain team and told them that, as stewards of an L1, they have an obligation to fill their block space, and more importantly to fill it with economic activity. This industry often ignores the second half of its own name: the currency part.
Any protocol that issues a token needs that token to support some form of economic activity if it wants to avoid being shorted. For L1 protocols, the native token is used to pay for computation through gas fees: the more economic activity, the more gas consumed, and the greater the demand for the token. That is the cryptoeconomic model; other protocols may instead position themselves as an intermediary SaaS layer.
This is particularly effective when the cryptoeconomic model is tied to a specific commodity, which for Layer 1 protocols is computation. However, when it comes to financial transactions, fluctuating execution prices are a huge blow to the user experience. In a financial transaction such as a swap, the execution fee should be the least important part.
Relying on economic activity to fill block space is difficult given the poor user experience. While scaling solutions are emerging to help (the Interplanetary Consensus white paper is strongly recommended reading), the Layer 1 market is saturated, and it is not easy for any protocol to attract enough economic activity.
When computation is combined with an additional commodity, the problem becomes a little easier. For distributed storage networks, that commodity is obviously storage space. Data storage, plus the finance and securitization derived from it, can immediately fill the gap in economic activity.
However, distributed storage also needs to offer viable solutions to traditional enterprises, especially those that must meet data-storage regulations. That means addressing audit standards, geographic restrictions, and user experience.
We discussed Banyan in Part 2 of our middleware piece; their product is on the right track in this regard. They work with DSN node operators, obtain SOC certification for the storage provided, and offer a simple user interface that streamlines file uploads.
But this is not enough.
Stored content must also be easily accessible through an efficient retrieval marketplace. Zee Prime is optimistic about the prospect of building content delivery networks (CDNs) on top of DSNs. At bottom, a CDN is a tool that caches content close to the user to reduce retrieval latency.
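A minimal sketch of that edge-caching idea, with a stubbed-out origin fetch standing in for a DSN retrieval; real CDNs add TTLs, invalidation, and geographic routing, and every name here is illustrative.

```python
# Toy LRU edge cache in front of a slow origin (here, a stubbed DSN fetch).
from collections import OrderedDict

class EdgeCache:
    def __init__(self, capacity: int, origin_fetch):
        self.capacity = capacity
        self.origin_fetch = origin_fetch   # e.g. a DSN retrieval call
        self.store: OrderedDict[str, bytes] = OrderedDict()

    def get(self, cid: str) -> bytes:
        if cid in self.store:              # hit: serve locally, low latency
            self.store.move_to_end(cid)
            return self.store[cid]
        data = self.origin_fetch(cid)      # miss: slow fetch from the network
        self.store[cid] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False) # evict the least-recently-used item
        return data

cache = EdgeCache(capacity=2, origin_fetch=lambda cid: f"<content {cid}>".encode())
cache.get("Qm111"); cache.get("Qm222"); cache.get("Qm111")  # "Qm111" is now hot
```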
We think this is the next key to DSN adoption, as it enables fast-loading video (think a distributed Netflix, YouTube, or TikTok). Glitter, in our portfolio, is representative of this space with its focus on DSN indexing: critical infrastructure for more efficient retrieval markets and richer use cases.
This type of product has demonstrated strong product-market fit and heavy demand in Web 2. Still, many such products face friction there, and the permissionless nature of Web 3 could be a boon for them.
Significance of composability
In fact, we believe a great opportunity in the DSN field is just around the corner. In these two articles, Jnthnvctr.eth discusses how the market is developing and some of the products to come (using Filecoin as an example):
Filecoin status and direction
Business models on the FVM
One of the most interesting ideas is the potential to pair storage with on-chain and off-chain computation. Providing storage resources itself requires computing power, so the two are a natural fit, and the combination can increase commercial activity in DSNs while opening up new use cases.
The introduction of the FEVM has made many new experiments possible, bringing interest and competition to the storage field. Entrepreneurs wanting to create new products can browse the resource library of things Protocol Labs wants people to build, and may earn bounties for building them.
Web 2 taught people about data gravity: companies that collect or create large amounts of data are rewarded for it, so they lock the data up to protect their own interests.
If our ideal of user-controlled data becomes mainstream, the value-accrual picture changes. Users become the main beneficiaries, exchanging their data for cash flow; the monetization tools that unlock this potential stand to benefit; and the way data is stored and accessed changes dramatically. Such data can naturally live on a DSN, monetized through a powerful query marketplace. This is the shift from exploitation to mobility.
There may be even more amazing developments waiting for us.
When imagining the future of distributed storage, consider how it will interact with future operating systems such as Urbit. Urbit is a personal server built on open-source software that lets users participate in a P2P network: a genuinely distributed operating system that can self-host and interact with the Internet peer-to-peer.
If the future is what Urbit's followers hope it will be, distributed storage will undoubtedly become a key component of the personal technology stack. Users could keep all their personal data encrypted on a DSN and coordinate it through the Urbit operating system. Beyond that, we can expect further integration of distributed storage with Web 3 and Urbit, especially through projects such as Uqbar Network, which brings smart contracts into Urbit's Nock environment.
This is the power of composability: slow accumulation eventually yields exciting results, from small hustles to revolutions, pointing to a way of being in a hyper-connected world. While Urbit may not be the final answer (and it has its critics), it shows how these attempts can converge into a river that leads to the future.