Mark Zuckerberg 2:18
No. No. And to be clear, we don't think that we're going to be the ones curing the diseases. Our goal was always to build tools that could accelerate the whole scientific field. That way, the scientific field collectively could cure all the diseases. But still, I thought that by the end of the century was a stretch. Now, I think it's too conservative. And so we had this series of funny, awkward, educational conversations where we're like, 'Okay, but why? Why do you think it's impossible?' And the person in the room is just like, 'Well, I don't know why. You tell me.' Finally, we got people to the point where they're like, 'Fine, if you really must know.' And we're like, 'No, we do. It seems important.' They were like, 'Well, we work in silos, and when you publish, information doesn't get shared. It gets locked up for long periods of time, and we don't have tooling.' They gave the example of a great tool built by one postdoc in a lab: it lives on their computer, and when they graduate, the tool is gone. It was very hard to build shared tools or a shared knowledge base to move science faster. That's where we began, in thinking about, okay, if those are the problems, what can we contribute?
Yeah, I mean, so the original Biohub model was basically to bring together engineers and scientists across multiple universities to focus on long-term tool development, and it basically worked. We started off with CZI doing a number of different things, and I think over time we just felt like, okay, the science piece is really working, and we just kept on investing more and more in it until now it is basically the main thing that we're doing. We've expanded from the original San Francisco Biohub to a handful now. There's New York, there's Chicago. The real focus and the unifying theme at this point is the virtual biology initiative, around taking the unique data sets that we're able to generate in order to effectively model, starting with the smallest pieces of proteins, but then eventually cells and whole biological systems. That's how we've evolved. It's this idea that some of this is an AI problem and you want to build a frontier AI lab, but you need to couple that with a frontier biology effort that can do the work of understanding and getting the data that you need to actually build these models. Because with language models, there's just a lot of data out there on the internet. That's not really the case with biology. There are obviously a bunch of different data sets that academia and scientists have generated over the decades, but a lot of the stuff that I think we want to put into this doesn't exist. You want to be able to visualize things that people haven't been able to see before, which is why we're doing the imaging work. You want to be able to record things that are going on inside the body, which is why we're doing the kind of cellular engineering work. You want to be able to measure things like inflammation in ways that haven't been possible, which is why the Chicago Biohub is focused on building those kinds of devices.
That will fundamentally create new types of data sets that will allow new types of models. Going back to what you said, if the scientific field primarily needs tool development, that now is going to empower scientists across the field to be able to do their work faster, and that's what we think we can provide through this kind of long-term focus on tool development. But I think there's a fun through line from where we started to our work with Alex. Our very first request for applications here was around single-cell sequencing, and we wanted to look at the RNA that is transcribed in individual cells. That was possible, but it was still pretty early on in understanding how different cells were expressing their DNA. At the beginning we were just funding methods, getting people to describe how to do it so that others could share that methodology. Then that became us funding the Human Cell Atlas, which is now one of the largest databases of single-cell transcriptomes. It was getting hard for scientists to annotate the data, so we built CELLxGENE, which was a very simple annotation tool that scientists could use to make use of that data. A community came around CELLxGENE and started contributing more data that we had nothing to do with. Now CELLxGENE is a corpus of knowledge that a lot of transcriptomics-based models are built on, and it is used regularly by the scientific community. But there were always critiques that this is just stamp collecting, just gathering bits of data, and that we're not going to be able to pull scientific knowledge and wisdom and insights out of it. We didn't have an answer for a while, and then imagine our delight when large language models, which could make sense of large amounts of data, became a huge topic of conversation. For me, what if we could actually understand how biology worked?
Move it from a discovery-based science to an engineering-based science, where we could systematically understand how living cells worked and understand why things go wrong. When we saw that moment, we're like, this is it. Something really big could happen here.