Understanding Bacalhau 1.0 in one article: Unlocking the potential of private data

*This article is based on a presentation by Simon Worthington at the Boston Summit in May 2023.*

Bacalhau is changing the data processing landscape by enabling data-native computation: sending code to run analytics where the data resides, rather than moving data to the code. This matters because data volumes are growing 45% faster than network bandwidth, 57% of data is stored outside the cloud or traditional data centers, and moving data is too slow and costly for any organization operating at scale.

There is another good reason to keep data local: control. Whether through mandatory regulations such as the Health Insurance Portability and Accountability Act (HIPAA) or the General Data Protection Regulation (GDPR), or through internal protections for sensitive financial data and corporate secrets, almost 100% of data is under some form of governance. Moving data to where the computation runs takes it out of its usual safe zone and increases the risk of misuse.


Most data is neither strictly open nor strictly closed, but sits somewhere on a spectrum in between. Within that spectrum, specific people can be granted access for specific purposes.

Source: The ODI

Since 2008, global data governance fines have totaled nearly $250 billion. It's no surprise, then, that most businesses are wary of sharing data, leaving 68% of corporate data unused. Yet most governed data could in principle be shared and used for better decision-making, provided it is shared only with the right people and for the right purpose.

Data sharing requires technical enforcement

Most organizations try to meet this need through strict data-sharing agreements or contracts. These are costly and time-consuming to set up: in organizations such as national governments or financial institutions, it can take months to get through data governance before data can be shared even between internal teams.

Worse, these agreements simply don't work. Most data-sharing agreements are completely unenforceable and serve only to provide a false sense of security. Once data crosses the trust boundary, only soft mechanisms (such as trusting everyone to abide by the agreement) can prevent misuse. What actually happens to shared data is invisible to everyone and difficult to monitor.

“Contracts or agreements between data providers and data users often prove to be ineffective.

In the Cambridge Analytica scandal, contract terms were completely ignored and personal data misused.

The lack of any strong technical evidence could deny courtrooms access to valid information and make it difficult for regulators, politicians, journalists and the public to understand what happened.”

Source: Putting the trust in data trusts, Register Dynamics, 2019

Clearly, what is needed is a new way to reuse data across trust boundaries: one that gives analysts simple, controlled access to data without exposing data owners to regulatory fines and damaging headlines.

Bacalhau makes data sharing visible and auditable

At Bacalhau, we believe that data-native computing is the answer to data governance challenges. By keeping data in place and ensuring that every computation on it is authorized, audited and controlled, more data can be used while reducing the risk of misuse.

What's more, because Bacalhau is a distributed computing platform, there is no need to move data to central storage. Data can stay wherever it already lives in the organization, avoiding difficult organizational changes and without taking any control away from data owners.

We are proud to announce that Bacalhau 1.0 adds job and data governance capabilities. With Bacalhau, data owners can control the who, what, where, why and how of every computation performed on their private data.

Bacalhau moderates code and output

Bacalhau takes a two-step approach to job moderation. First, data owners have the opportunity to check that jobs comply with their policies. This pre-execution moderation phase happens before a job starts running and lets moderators approve or deny a computation based on the data it will use, who is requesting it, and the code that will be executed against the data.
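
To make those three inputs concrete, here is a minimal Python sketch of a pre-execution check built from simple allowlists. `JobRequest`, `Policy` and `pre_execution_check` are hypothetical names chosen for illustration; they are not Bacalhau's API, and real policies can be far richer than allowlists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRequest:
    # Hypothetical job description (not Bacalhau's schema).
    requester: str        # who is asking
    datasets: frozenset   # which data the job will read
    image: str            # which code will run


@dataclass(frozen=True)
class Policy:
    # Hypothetical data-owner policy expressed as allowlists.
    allowed_requesters: frozenset
    allowed_datasets: frozenset
    allowed_images: frozenset


def pre_execution_check(job: JobRequest, policy: Policy) -> tuple:
    """Approve or deny a job before it runs; the reason can go in an audit trail."""
    if job.requester not in policy.allowed_requesters:
        return False, f"requester {job.requester!r} is not authorized"
    # Every dataset the job asks for must be covered by the policy.
    unapproved = job.datasets - policy.allowed_datasets
    if unapproved:
        return False, f"datasets not approved: {sorted(unapproved)}"
    if job.image not in policy.allowed_images:
        return False, f"code image {job.image!r} has not been reviewed"
    return True, "approved"


policy = Policy(
    allowed_requesters=frozenset({"alice"}),
    allowed_datasets=frozenset({"claims-2022"}),
    allowed_images=frozenset({"analytics:1.2"}),
)
print(pre_execution_check(JobRequest("alice", frozenset({"claims-2022"}), "analytics:1.2"), policy))  # (True, 'approved')
print(pre_execution_check(JobRequest("alice", frozenset({"payroll"}), "analytics:1.2"), policy))
```

Returning a reason alongside the verdict keeps each automated decision explainable to the data owner and to the job submitter.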

While humans are always in control, not every decision needs to be made by a human. Pre-execution moderation is flexible and can be automated as needed. Data owners can set policies, inspect pending computations in depth, apply different policies to different people, and invoke sophisticated algorithms that analyze security and risk. When a job is not suitable for automated approval, a human makes the final decision.
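
One way to picture "automate where possible, escalate to a human otherwise" is a three-way decision driven by a risk score, with stricter thresholds for less-trusted groups. The sketch below is a hypothetical illustration: the risk factors, point values and group thresholds are made up, and the toy `risk_score` function stands in for whatever analysis a data owner actually runs.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    DENY = "deny"
    ESCALATE = "escalate"  # hand the job to a human moderator


# Hypothetical risk points per job property (0-100 scale).
RISK_POINTS = {"reads_raw_records": 40, "network_access": 40, "image_unreviewed": 20}

# Hypothetical per-group thresholds: (auto-approve below, auto-deny at or above).
# Internal analysts get more automatic approvals than external researchers.
THRESHOLDS = {"internal": (30, 80), "external": (10, 50)}


def risk_score(job: dict) -> int:
    """Toy risk model: sum the points for every risky property the job has."""
    return min(100, sum(points for flag, points in RISK_POINTS.items() if job.get(flag)))


def moderate(job: dict, requester_group: str) -> Decision:
    approve_below, deny_at = THRESHOLDS[requester_group]
    score = risk_score(job)
    if score < approve_below:
        return Decision.APPROVE
    if score >= deny_at:
        return Decision.DENY
    # Neither clearly safe nor clearly unsafe: a human makes the final call.
    return Decision.ESCALATE


job = {"image_unreviewed": True}  # 20 risk points
print(moderate(job, "internal"))  # Decision.APPROVE
print(moderate(job, "external"))  # Decision.ESCALATE
```

Keeping thresholds per group is what lets the same job be approved automatically for one set of people and routed to a human for another.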


Bacalhau provides two moderation gates for computation: one before it runs and one after it completes.

Once a job is approved, Bacalhau sends it to the appropriate executor, which has access only to the requested data and is securely isolated from the host system. Bacalhau also imposes resource constraints on jobs to limit their processing power and memory usage.
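
As a rough sketch of those two guarantees, the function below builds an execution plan that exposes only the inputs a job both requested and had approved, and refuses jobs that ask for more CPU or memory than a node allows. `Resources`, `NODE_LIMITS` and `plan_execution` are hypothetical; in a real deployment these limits would be enforced by the execution sandbox itself, and this only illustrates the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mb: int


# Hypothetical per-job ceiling enforced by this compute node.
NODE_LIMITS = Resources(cpu_millicores=4000, memory_mb=8192)


def plan_execution(requested_inputs, approved_inputs, requested: Resources) -> dict:
    """Return a sandbox plan exposing only approved inputs, with capped resources."""
    if (requested.cpu_millicores > NODE_LIMITS.cpu_millicores
            or requested.memory_mb > NODE_LIMITS.memory_mb):
        raise ValueError("job requests more resources than this node allows")
    # Mount only what the job asked for AND moderation approved;
    # nothing else on the host is exposed to the executor.
    mounts = sorted(set(requested_inputs) & set(approved_inputs))
    return {"mounts": mounts, "limits": requested}


plan = plan_execution(["claims-2022", "payroll"], ["claims-2022"], Resources(1000, 2048))
print(plan["mounts"])  # ['claims-2022']
```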

While pre-execution moderation provides a reasonable first line of defense, deciding what a computer program will do without running it is generally difficult and requires technical skill. The UK Office for National Statistics (ONS) and similar controlled research environments have provided secure, controlled access to data for decades; we have learned from their experience and brought their practices into the digital realm. So in addition to pre-execution controls, Bacalhau also allows results to be moderated after execution, before they are released to the job submitter.

When Bacalhau completes a computation, it saves the results to a private pre-release storage area. Moderators then use what they learned about the job during pre-execution review to judge whether those results are what the job should have produced. Results can be downloaded only if the moderator deems the content suitable for sharing. Access to the private storage area is strictly locked down, and users can only stream results for their own jobs through Bacalhau's download feature.
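
The release flow described above amounts to a small state machine: results start out held for review, a moderator approves or rejects them, and a download succeeds only for the job's own submitter after approval. The Python sketch below models that flow in memory; `PreReleaseStore` and its methods are hypothetical and are not Bacalhau's storage or download API.

```python
from dataclasses import dataclass
from enum import Enum


class ResultState(Enum):
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class HeldResult:
    job_id: str
    owner: str       # the user who submitted the job
    payload: bytes   # results held in private pre-release storage
    state: ResultState = ResultState.PENDING_REVIEW


class PreReleaseStore:
    """Hypothetical private store: results are held until a moderator releases them."""

    def __init__(self) -> None:
        self._results = {}

    def save(self, job_id: str, owner: str, payload: bytes) -> None:
        self._results[job_id] = HeldResult(job_id, owner, payload)

    def review(self, job_id: str, approve: bool) -> None:
        result = self._results[job_id]
        if result.state is not ResultState.PENDING_REVIEW:
            raise ValueError("result has already been reviewed")
        result.state = ResultState.APPROVED if approve else ResultState.REJECTED

    def download(self, job_id: str, user: str) -> bytes:
        result = self._results[job_id]
        # Users may only fetch results of their own jobs, and only once approved.
        if user != result.owner:
            raise PermissionError("not the job owner")
        if result.state is not ResultState.APPROVED:
            raise PermissionError(f"result is {result.state.value}")
        return result.payload


store = PreReleaseStore()
store.save("job-1", owner="alice", payload=b"aggregate results")
store.review("job-1", approve=True)
print(store.download("job-1", user="alice"))  # b'aggregate results'
```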

As with pre-execution moderation, a whole range of complex analyses can be run on the results. With Amplify, data owners can automatically detect personally identifiable information (PII), summarize tabular data such as CSV files, and analyze the content of images and video clips. The generated metadata can be used both to publish results automatically and to give human moderators valuable information for their decisions.
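
Amplify's internals are not described here, so the sketch below swaps in a deliberately simple technique to show the idea: summarize a CSV result and flag cells that match regular expressions for email addresses or US Social Security numbers. The patterns, `inspect_csv` and the `auto_release` rule are hypothetical illustrations; real PII detection needs far more than a couple of regexes.

```python
import csv
import io
import re

# Illustrative patterns only; real PII detection is much harder than this.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def inspect_csv(text: str) -> dict:
    """Summarize a CSV result and flag cells that look like PII."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    findings = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # skip missing or overflow cells
            for kind, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    findings.append({"row": i, "column": column, "kind": kind})
    return {"row_count": len(rows), "columns": reader.fieldnames or [], "pii": findings}


def auto_release(report: dict) -> bool:
    # Hypothetical rule: publish automatically only when nothing was flagged;
    # otherwise the report goes to a human moderator as decision support.
    return not report["pii"]


report = inspect_csv("region,contact\nnorth,ops@example.com\nsouth,none\n")
print(report["row_count"], report["columns"], report["pii"])
print(auto_release(report))  # False
```

The same report serves both uses mentioned above: it drives automatic publication when clean, and it tells a human moderator exactly where to look when it is not.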

Moderation unlocks new forms of collaborative learning

Computing over data separated by trust boundaries could enable data sharing on a massive scale, but until now there has been no secure technical way to do it. Where data held by one organization would generate shared value if used more broadly, that organization can now apply Bacalhau job moderation to open up access without complex data governance processes.

For example, a university could make more data available to citizen scientists or outside researchers, one government department could allow another to analyze its data, or one team at a highly regulated financial institution could let another analyze its data in depth. Crucially, raw data is never released to less-trusted users: Bacalhau ensures that users get their analysis results and nothing more.


The same distributed, moderated computing model also enables federated learning among participants in different organizations. With Bacalhau, independent organizations can jointly run in-depth analyses over their combined data without sharing the data itself. With federated learning techniques, data scientists can now train machine learning and AI models on the datasets of many independent, or even competing, organizations, while those organizations keep control of their data and retain precise visibility into how it is used.
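
Federated averaging (FedAvg) is one common federated learning technique; the article doesn't say which method a given Bacalhau job would use, so treat this as an illustrative choice. In the sketch, each organization runs `local_update` on its own private data (conceptually inside its own Bacalhau job) and returns only model weights; a coordinator combines them with `federated_average`, weighting by sample count. Both functions and the toy linear model are hypothetical.

```python
def local_update(weights, data, lr=0.1):
    """One pass of per-sample gradient descent for y = w0 + w1 * x.

    Runs where the data lives; only the updated weights are returned.
    """
    w0, w1 = weights
    for x, y in data:
        error = (w0 + w1 * x) - y
        w0 -= lr * error
        w1 -= lr * error * x
    return [w0, w1]


def federated_average(updates):
    """FedAvg: average weight vectors, weighted by each participant's sample count."""
    total = sum(n for _, n in updates)
    dims = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dims)]


# Two organizations holding private toy samples of the same relationship y = 2x + 1.
org_a = [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0)]
org_b = [(x, 2 * x + 1) for x in (0.25, 0.75)]

weights = [0.0, 0.0]
for _ in range(500):
    updates = [(local_update(weights, d), len(d)) for d in (org_a, org_b)]
    weights = federated_average(updates)
print([round(w, 3) for w in weights])  # close to [1.0, 2.0]
```

Only the weight vectors cross organizational boundaries in each round; the raw `(x, y)` samples never leave their owners.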

For example, central government agencies responsible for macroeconomic policy could use data held by local organizations. Likewise, industry bodies such as insurance regulators could train models by submitting federated learning Bacalhau jobs to all of their member insurance companies.

Centralizing data in one place risks the sale or misuse of this valuable aggregated data, whereas keeping data local lets each insurer be sure its data is used only for mutually agreed purposes that benefit all participants.

Compute islands for purpose-specific analysis

Finally, the fine-grained control over job execution that Bacalhau provides lets moderators act as gatekeepers to compute islands. In this structure, independent compute providers and data owners who want to contribute resources for a specific purpose can delegate job authorization to trusted moderators.


For example, scientists collaborating to collect medical data that could help treat cancer can offer their data and compute through an external moderator they trust. The moderator accepts only jobs that comply with the agreed policies: in this case, jobs that contribute to new cancer treatments.

In this way, scientists can focus on broader public-good goals while delegating external access requests to moderators. With Bacalhau's robust audit log, they can later verify that the moderators acted according to the agreed policies.
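
The article doesn't describe the format of Bacalhau's audit log. As a hedged illustration of how a log can be made verifiable after the fact, the sketch below uses a generic hash chain (a swapped-in technique, not necessarily what Bacalhau does): each entry stores the SHA-256 hash of its content plus the previous entry's hash, so editing or removing an earlier entry breaks verification. `append_entry`, `verify` and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    body = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log: list, event: dict) -> None:
    """Append an event whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited or removed earlier entry makes it fail."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"job": "job-1", "decision": "approve", "moderator": "curator-1"})
append_entry(log, {"job": "job-2", "decision": "deny", "moderator": "curator-1"})
print(verify(log))  # True
log[0]["event"]["decision"] = "deny"  # tamper with history
print(verify(log))  # False
```

A hash chain on its own cannot reveal that the newest entries were dropped from the end; publishing the latest hash somewhere the scientists can see closes that gap.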

Bacalhau is the future of data sharing

We're excited to release job and data governance capabilities in Bacalhau 1.0! We believe that data-native computing represents a new way of thinking about data sharing. In short: keep data safe by not sharing it!

Today, we’re working with companies and government agencies that recognize the potential of governed computing across trust boundaries. If you'd like to learn more about how these features can work for you, join the Bacalhau Slack or get in touch with us directly.
