
DeSci, independent labs, and large-scale data science

Juan Benet on how the decentralized science (DeSci) movement can fund, organize, and open science using Web3 tools, covering funding mechanisms, open access, reproducible experiments, and large-scale data science pipelines.

Date published: 30 June 2022

A presentation by Juan Benet, founder of Protocol Labs and inventor of IPFS and Filecoin, at EthCC on how the DeSci movement can use Web3 tools to fund science, organize researchers, and build open access and reproducible research infrastructure.

This transcript is an accessible copy of the original video transcript published by EthCC. It has been lightly edited for readability.

Introduction to science and progress (0:10)

All right, hello everyone. My name is Juan. I'm here to talk about DeSci, and specifically about how we can use DeSci to fund, organize, and open science. First off, here's what we're going to cover: I'll talk about science in general for a moment, then about what the DeSci movement is, then about how we can fund the science commons, and then about how DeSci is organizing the people, the projects, and the work around science. Then I want to speak a little bit about open access and reproducible science, and I'll finish with a call to action. We have a lot to cover, so I'll move fast.

First off, I want to start by saying that there has been an enormous amount of progress over the last few centuries. Almost every human metric has been improving. By almost any measure we can think of, the human condition has improved dramatically, and a big part of that progress has come from the scientific enterprise. By extending what we know, and by transforming that knowledge into technologies and solutions to various problems, we have been able to lift a huge fraction of the world out of poverty, feed and shelter vastly more people, cure all kinds of diseases, and so on. An enormous amount of progress has been achieved thanks to science.

Science is a massive enterprise with lots of different subfields and many different areas of knowledge. Pick any field or area of study, and science is a big part of it. At the end of the day, science is the process of finding things out: creating new knowledge and coming up with new concepts. Think of the scientific method. There's a famous quote from Feynman: "If it disagrees with experiment, it's wrong." And that is the key to science.

You can think of science as a large-scale enterprise that involves people all around the planet, with all kinds of efforts and systems: universities, research groups, different fields, and journals. There is a lot of activity around synthesizing what we know, coming up with new ideas, turning those ideas into research projects, testing hypotheses, and gathering data to check whether a hypothesis is correct. It goes all the way through writing up the results in a paper that is reviewed by the scientific community, added to the tree of knowledge, and used to extend what we know.

Maybe the story stops there, or maybe later it turns out that, actually, that wasn't reproducible, and we have to unwind that. Or actually, that was correct, but it opened up the door to tons of other new knowledge. So it's a highly dynamic field with lots of different activity.

Now, science has a ton of problems. Even though it has been an enormous engine for progress, all kinds of things have been going wrong with it. In particular, many fields lack funding. At the same time, a lot of money overall goes to science, and there's a feeling that it doesn't go as far as it used to, that science isn't getting as much bang for its buck anymore. Across the board, many fields are far too competitive when it comes to getting grants.

Once studies are done and published, only a fraction of them replicate. A lot of science has been published, accepted, and thought to be correct, only for a huge fraction of it to turn out later to be impossible to reproduce. That is the reproducibility crisis. Some scientific discoveries are even losing their artifacts: the actual papers, code, or data associated with a result go missing from our knowledge banks. So there are all kinds of issues in science that need fixing, and this is part of what DeSci is about. The DeSci community is trying to tackle a number of these problems: not wholesale, not completely, but a real range of them.

The DeSci movement (5:11)

So what is DeSci? DeSci is a movement to improve science using Web3 tech and tools. Think of being able to use all of the magic of hash linking, blockchains, and smart contracts to create systems and structures that can improve how we do science across fields around the globe.

There are a bunch of different focus areas. Think of being able to have open access papers and data commons, having better reproducible experiments, and being able to organize labs and groups better. Think of creating structures like DAOs that can enable research groups to form and organize, raise capital, and distribute rewards to participants. There are entirely new funding structures, things like IPNFTs. There are protocols for peer review with rewards. Historically, peer review has been this predatory situation where academics put in an enormous amount of time and effort to peer review all of the work, and journals don't actually pay anyone for that labor. There are all kinds of new incentive structures being experimented with.

This is a fairly new movement, although its ideas have been with us for a while. When I started IPFS, it was kind of a DeSci project before DeSci was a thing: I started IPFS with the goal of enabling people to distribute data much better for the purpose of doing science, so a lot of these ideas are at the core of the project. However, the movement has picked up a lot of steam over the last year or two, and many new organizations have appeared. This map has doubled or tripled in size in the last year, which is really great to see.

There are now several groups doing decentralized biotech funding, groups like VitaDAO, Molecule, and others. There are many organizations trying to come up with new structures for funding science. There are several DAOs that are scientific organizations themselves trying to do R&D. There are several foundations and institutions that are supporting a lot of the DeSci work, or that associate themselves with DeSci in one way or another. There are many groups exploring different ways of publishing, many science NFTs, and so on. This community has been growing a lot over the last year or two.

There are also now a lot of different meetups and conferences that are gathering these communities. Things like DeSci Day, DeSci Berlin, Schelling Point from the Gitcoin community, and Funding the Commons. These conferences are gathering a lot of the conversations around DeSci.

Funding the commons (10:40)

Let's talk about funding the commons. Maybe some of you have seen this diagram of the innovation chasm that I've used in the past. Within the science-to-technology translation, DeSci mostly focuses on the left side, the science itself, looking for better incentive structures and better ways of coordinating groups to produce better scientific output. It's worth noting that total global R&D funding is, from one perspective, massive, but from another, not that large, and it hasn't changed much in the last few decades, even though the throughput of the technology we're building has grown tremendously.

These scales of funding are not out of reach for blockchains. Think of U.S. non-defense R&D, which is on the order of $70 billion a year. That is a lot, for sure, but it is not massive. NSF alone is around $10 billion a year, which is totally achievable through blockchains, given that the crypto space has a total value on the order of $1 to $3 trillion, depending on when you look.

Imagine if blockchains were to devote some fraction of their supply to R&D on a yearly basis. Imagine taking one percent of Filecoin, Ethereum, or Bitcoin, and pouring it into R&D every year. You start hitting the numbers that are in range of funding science at a nation-state level. If crypto grows by another order of magnitude or two, crypto is going to be able to fund R&D and science at the scale of nation-states, which is pretty crazy to think about. So it'd be great to figure out the structures and figure out good funding pathways ahead of when we get there.
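The arithmetic behind this point can be checked in a few lines of Python. This is a rough sketch that uses only the figures quoted in the talk ($70 billion a year for U.S. non-defense R&D, about $10 billion for NSF, $1 to $3 trillion for crypto). As a simplifying assumption, it treats "one percent" as one percent of total crypto market value rather than of any particular token's supply, and it models "another order of magnitude" as a plain 10x multiplier.

```python
# Rough yearly figures quoted in the talk, in USD.
US_NON_DEFENSE_RND = 70e9
NSF_BUDGET = 10e9
# Assumption: total crypto market value, using the talk's $1T-$3T range.
CRYPTO_VALUE_RANGE = (1e12, 3e12)


def yearly_rnd_pool(crypto_value: float, fraction: float = 0.01) -> float:
    """Yearly R&D pool if `fraction` of total crypto value were allocated."""
    return crypto_value * fraction


for growth in (1, 10):  # today, and after one more order of magnitude of growth
    for value in CRYPTO_VALUE_RANGE:
        pool = yearly_rnd_pool(value * growth)
        print(
            f"crypto ${value * growth / 1e12:.0f}T -> ${pool / 1e9:.0f}B/yr "
            f"({pool / NSF_BUDGET:.1f}x NSF, "
            f"{pool / US_NON_DEFENSE_RND:.2f}x US non-defense R&D)"
        )
```

At the quoted range, 1 percent comes to roughly one to three NSF budgets per year; after 10x growth it becomes $100 billion to $300 billion, which exceeds the $70 billion U.S. non-defense R&D figure.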

When you start breaking down funding from those agencies, you run into all kinds of problems. Certain fields receive too little attention, or the programs themselves have perverse incentives or are far too competitive, so scientists end up spending an enormous amount of their time just writing grants. Fast Grants, an effort started during COVID, showed that a grant program could be structured to move very fast, and Impetus Grants replicated the same effect. They gave out grants on the order of $20k to $200k while asking for only a tiny fraction of the time scientists normally put into applications.

In one survey of scientists applying for those grants, respondents indicated how much time they normally spend applying for grants: something like 25 to 50 percent of a scientist's time goes to spelling out what they're doing and applying to various grants. This is kind of insane. Ideally, scientists would spend the vast majority of their time thinking about their work, coming up with new ideas, and analyzing results. Grant programs also constrain what people end up exploring: many scientists have much more ambitious research they want to pursue, but they get stuck on less impactful work because they conform to the grant programs' constraints.

Web3 public goods to the rescue! There are a lot of different groups. Of course, this is tiny still; the Web3 movement is very small compared to global science R&D funding, but if we can get the structures right, align the incentives well, and demonstrate that it works, then we can scale it by orders of magnitude along with crypto. We should explore many different kinds of funding for scientific processes: different grant programs, impact certificates, impact markets, and so on. The Funding the Commons community has been sampling a bunch of different mechanisms.

For example, groups like VitaDAO are creating a structure that gives out grants to research groups in exchange for data, knowledge, and IP. They then bundle that IP into IPNFTs that carry legal weight, grant IP rights to biotechs, and fund those biotechs with the aim of returning the investment through their success. I tend to call this a fundamental development fund: important work is done by labs that are not themselves companies, and the IP they generate is then used to fund companies. Groups like Molecule are creating marketplaces where that work can happen.

Certificates of impact are another fascinating structure representing retroactive funding. They enable participants, once they achieve some impact, to mint a certificate around that impact and sell it in the market to anybody that wants to claim that impact. That enables a speculative market to emerge, closing a loop across time to retroactively fund extremely important work. This is crucial because many times you only realize how valuable something is long after the work has been done.
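To make the retroactive loop concrete, here is a toy model of an impact certificate in Python. Everything in it (the `ImpactCertificate` class, `mint`, `sell`, and the example names and prices) is hypothetical and only meant to show the shape of the idea: a creator mints a claim once the impact exists, and later buyers can purchase it, so value that is recognised late can still flow through the market. It only tracks ownership and sales; it is not a smart contract and ignores fractional shares, royalties, and on-chain enforcement.

```python
from dataclasses import dataclass, field


@dataclass
class ImpactCertificate:
    """Hypothetical claim over the impact of a completed piece of work."""

    work_id: str
    creator: str
    owner: str
    # Sale log as (seller, buyer, price) tuples.
    history: list[tuple[str, str, float]] = field(default_factory=list)

    def sell(self, buyer: str, price: float) -> None:
        """Transfer the claim to `buyer`; the current owner is paid `price`."""
        self.history.append((self.owner, buyer, price))
        self.owner = buyer


def mint(work_id: str, creator: str) -> ImpactCertificate:
    """Creator mints a certificate once the impact has been achieved."""
    return ImpactCertificate(work_id=work_id, creator=creator, owner=creator)


cert = mint("paper-123", "alice")
cert.sell("speculator", 1_000)    # early buyer bets the work will prove valuable
cert.sell("retro-funder", 5_000)  # later, a funder buys the now-recognised impact

creator_proceeds = sum(p for seller, _, p in cert.history if seller == cert.creator)
print(cert.owner)        # retro-funder
print(creator_proceeds)  # 1000
```

In this toy run the creator is paid at the first sale and the speculator captures the later appreciation, which is what gives early buyers a reason to fund work before its value is widely recognised.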

Organizing people and Data DAOs (15:28)

Now some quick thoughts about organizing people. GitHub has been tremendously successful in helping organize scientific discovery. Entire textbooks and fields have developed on GitHub, and many groups have used its basic primitives (issues, code collaboration, and version control) to organize communities of practice and science. What's missing there is a way to create organizations that do research, manage capital, or pay contributors.

There are interesting experiments like LabDAO, which creates lab teams where groups can form, raise funding, and distribute it. You can encode the different levels of contribution of participants in order to reward them fairly. There are also more ambitious projects around credit assignment across participants in a larger network, propagating rewards across different interlinked teams.
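One simple way to encode different levels of contribution is a pro-rata split of a reward pool by contribution weight. The sketch below illustrates that assumption only; it is not how LabDAO or any specific project distributes funds, and the function name, weights, and role labels are hypothetical. It works in integer units (for example, a token's smallest denomination) and uses the largest-remainder method so that the payouts always add up to the pool exactly.

```python
def split_rewards(pool: int, weights: dict[str, int]) -> dict[str, int]:
    """Split an integer reward pool pro rata to contribution weights.

    Uses the largest-remainder method so the payouts sum exactly to `pool`.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Floor of each participant's exact share.
    shares = {name: pool * w // total for name, w in weights.items()}
    # Hand out the units lost to flooring, largest fractional remainder first.
    by_remainder = sorted(
        weights, key=lambda name: (pool * weights[name]) % total, reverse=True
    )
    leftover = pool - sum(shares.values())
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


print(split_rewards(100, {"pi": 5, "postdoc": 3, "student": 1}))
# {'pi': 56, 'postdoc': 33, 'student': 11}
```

The rounding rule matters once rewards are paid in indivisible units: plain flooring would leave part of the pool undistributed, while this approach hands the leftover units to whoever was closest to earning them.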

There are groups experimenting with peer review protocols, studying the economics and dynamics of the peer review system so they can both incentivize the work and properly reward it when it happens. A protocol called Ants Review already does this, and you can use it with MetaMask. Gitcoin Grants has pioneered a lot of the work that can be used here and already supports tooling for participants who want to organize in these ways.

One of the really key components here is linking content by hash. You can freeze a bundle of information, get a content-addressed hash link, and reference it. This is the core primitive you want in the literature: when a citation points from one paper to another, or from a paper to its data or code, a CID (content identifier) is precisely what you want. Imagine being able to freeze the entire literature with version control, along with all the important datasets and code required to run those experiments again. Many groups are exploring this, proposing different ways of doing peer review and science development through IPFS.
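The idea of citing by content hash can be sketched with Python's standard `hashlib`. This is a simplification: a real IPFS CID wraps the digest with multihash, multicodec, and multibase prefixes, whereas `content_id` here is a hypothetical helper that just returns the hex SHA-256 of a canonical JSON encoding. The point it shows is the one from the talk: a paper that cites its data and code by hash pins exactly those bytes, and any later change to them produces a different address.

```python
import hashlib
import json


def content_id(obj) -> str:
    """Toy content address: hex SHA-256 of a canonical JSON encoding.

    Not a real CID; real CIDs add multihash/multicodec/multibase prefixes.
    """
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


dataset = {"name": "assay-results", "rows": [[1, 0.42], [2, 0.37]]}
code = {"name": "analysis.py", "source": "print('fit model')"}

# The paper cites its data and code by hash, and is itself addressed by hash.
paper = {
    "title": "Example result",
    "cites": {"data": content_id(dataset), "code": content_id(code)},
}
paper_id = content_id(paper)

# Any change to the dataset yields a different address, so a stale citation is detectable.
tampered = {**dataset, "rows": [[1, 0.42], [2, 0.99]]}
print(content_id(tampered) == paper["cites"]["data"])  # False
```

Because the paper's own address covers the hashes it cites, changing any cited dataset or script would also change `paper_id`, which is what lets a whole citation graph be frozen and versioned.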

You can think of bundling that kind of activity and data generation with something called a Data DAO. Unlike the DAOs I mentioned before that are already getting started, Data DAOs are very new. Think of a group that is able to collect, curate, transform, and compute on data, and govern how that data is used over time, how it's monetized, and how it gets shared.

Some final notes on open access and reproducible science. IPFS has already been used a ton for many kinds of open science work. It's already living the dream of opening up access to a lot of science, supporting distributed copies of Wikipedia, massive archives of papers, and datasets.

Open access, reproducible science, and call to action (20:40)

We're not quite there yet with full reproducibility. This area needs more work, but a lot of people have already done the thinking. There are really good specs and ideas for combining standard reproducibility tooling with IPFS to freeze all the assets and build a fully reproducible pipeline. You can summon back specific experiments from the past, bring back fully frozen VMs or containers, rerun all the data pipelines, and verify that the experiments are correct.
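A minimal version of "freeze, rerun, verify" fits in a few lines. In this sketch (all names are hypothetical), an experiment record stores the hash of the frozen input and the hash of the output the pipeline produced. Verifying means retrieving the input, checking that it matches the recorded hash, rerunning the pipeline, and comparing output hashes. It assumes the pipeline is deterministic and, unlike the full approach described in the talk, does not freeze the code or the VM/container environment itself.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """Hex SHA-256 used as a stand-in for a content address."""
    return hashlib.sha256(data).hexdigest()


def run_pipeline(raw: bytes) -> bytes:
    """Hypothetical deterministic analysis step: mean of comma-separated numbers."""
    values = [float(x) for x in raw.decode().split(",")]
    return json.dumps({"mean": sum(values) / len(values)}).encode()


def record_experiment(raw: bytes) -> dict:
    """Freeze the input and output by hash so the run can be checked later."""
    return {"input": digest(raw), "output": digest(run_pipeline(raw))}


def verify(record: dict, raw: bytes) -> bool:
    """Re-run the pipeline on retrieved input and compare with the recorded hashes."""
    if digest(raw) != record["input"]:
        return False  # the retrieved input is not the frozen one
    return digest(run_pipeline(raw)) == record["output"]


raw = b"1.0,2.0,3.0"
record = record_experiment(raw)
print(verify(record, raw))             # True
print(verify(record, b"1.0,2.0,4.0"))  # False
```

Extending the record with a hash of the pipeline's code and its container image would move this sketch closer to the fully frozen setup described above.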

There's also a whole other angle: doing the data science itself in a DeSci-oriented way, where notebooks, data analysis, and artifacts run on Web3-powered applications. Things like Jupyter (IPython) notebooks and Wolfram notebooks already couple with CIDs. I think that's going to get supercharged as the Filecoin network grows. The Filecoin network has a lot of storage coupled with compute: storage providers have tons of GPUs right next to the data, and over the next year those will get wired up so that computational pipelines can be issued against that data. Think of building a platform for scientists to do data science at massive scale, using Web3 computing platforms both for addressing and storing information and for the computation itself, creating a full end-to-end data science pipeline.

Finally, a quick call to action. Science is the engine of progress. By extending what we know, we're able to produce more technology and improve our lives. If we can improve the lives of scientists, make their work easier, accelerate their progress, cut their costs, and let them spend more time working on problems instead of writing grants, then we can all advance society much faster.

The DeSci movement needs you. Think of experimenting with new funding mechanisms, building open-access and open-science tooling, or playing around with public datasets. Think of joining a DeSci team or a DAO. Explore these communities, and I hope to see you in the movement. Thank you very much, and see you around.

(Applause)
