Mark Zuckerberg 2:18
No. No. And to be clear, we don't think that we're going to be the ones curing the diseases. Our goal was always to build tools that could accelerate the whole scientific field, so that the field collectively could cure all the diseases. But still, I thought that "by the end of the century" was a stretch. Now, I think it's too conservative. We had a series of funny, awkward, educational conversations where we kept asking, "Okay, but why? Why do you think it's impossible?" And I was just being the person in the room who's like, "Well, I don't know why. You tell me." Finally, people were like, "Fine, if you really must know." And we said, "No, we do. It seems important." They told us: we work in silos; when you publish, information doesn't get shared, it gets locked up for long periods of time; and we don't have tooling. They gave the example of a great tool built by one postdoc in a lab: it lives on their computer, and when they graduate, the tool is gone. What we heard was that it's very hard to build shared tools and a shared knowledge base to move science faster. And that's where we began thinking: okay, if those are the problems, what can we contribute?
Yeah, I mean, the original Biohub model was basically to focus on long-term tool development by bringing together engineers and scientists across multiple universities, and it basically worked. We started off with CZI doing a number of different things, and over time we just felt like, okay, the science piece is really working, and we kept investing more and more in it until now it is basically the main thing that we're doing. We've expanded from the original San Francisco Biohub to a handful at this point: there's New York, there's Chicago. The real focus and the unifying theme at this point is the virtual biology initiative: taking the unique datasets that can be generated in order to model biology effectively, starting with the smallest pieces, proteins, but then eventually cells and whole biological systems. That's kind of how we've evolved. There's this idea we talk about, that some of this is an AI problem and you want to build a frontier AI lab, but you need to couple that with a frontier biology effort that can do the work of understanding the biology and getting the data you need to actually build these models. With language models, there's just a lot of data out there on the internet. That's not really the case with biology. There are obviously a bunch of datasets that academia and scientists have generated over the decades, but a lot of the stuff that I think we want to put into this doesn't exist, right? You want to be able to visualize things that people haven't been able to see before, which is why we're doing the imaging work. You want to be able to record things that are going on inside the body, which is why we're doing the cellular engineering work, right?
You want to be able to measure things like inflammation in ways that haven't been possible, which is why the Chicago Biohub is focused on building those kinds of devices. That will fundamentally create new types of datasets that will allow new types of models, and I think that's just a very exciting thing. Going back to what you were saying: if what the scientific field primarily needs is tool development that empowers scientists across the field to do their work faster, that's what we think we can provide through this long-term focus on tool development.
But I think there's a fun throughline from where we started to the work that Alex is driving now. Our very first request for applications (RFA) here was around single-cell sequencing. We wanted to look at the RNA that is transcribed in individual cells, and that was possible, but it was still pretty early in understanding how different cells were expressing their DNA, to the point where at the beginning we were just funding methods: getting people to describe how to do it so that others could share that methodology. That became us funding the Human Cell Atlas, which is now one of the largest databases of single-cell transcriptomes. It was getting hard for scientists to annotate the data, so we built CELLxGENE, a very simple annotation tool that scientists could use to make use of that data. Then a community built up around CELLxGENE and started contributing more and more data that we had nothing to do with creating or funding or making happen in the world. Now CELLxGENE is a corpus of knowledge that a lot of the transcriptomics-based models are built on, and it's used regularly by the scientific community. But there were always critiques, like, "This is just stamp collecting. You're just gathering bits of data, and we're not going to be able to pull scientific knowledge and wisdom and insights out of it." And, well, we didn't have an answer for a while. Then imagine our delight when large language models, which could make sense of large amounts of data, became a huge topic of conversation. For me, it's: what if we could actually understand how biology works? Move it from a discovery-based science to an engineering-based science, where we could systematically understand how living beings and living cells work, and be able to understand why things go wrong.
And so when we saw that moment, we were like, "This is it. Something really big could happen here."