Fei-Fei Li
Co-Director, Stanford Institute for Human-Centered AI

Fei-Fei Li shares her view on spatial intelligence

🎥 Jul 01, 2025 📺 TKEthics Pulse ⏱ 7m 👁 38 views
AI pioneer Fei-Fei Li explores why spatial intelligence is the next frontier in building truly embodied and interactive artificial intelligence.

About Fei-Fei Li

Fei-Fei Li, co-founder and CEO of World Labs and co-director of the Stanford Institute for Human-Centered AI, appeared at Bloomberg Tech 2026 in San Francisco on June 2, 2026. She discussed her company’s focus on spatial intelligence and building a "large world model," which she described as a technology rooted in the evolution of animal intelligence, starting with seeing and moving in the physical world. Li stated that World Labs is working on "one of the most critical technologies in the space of physical intelligence" and expressed hope that the company will ship a spatial intelligence model this year that will inspire "incredibly exciting product opportunities."

Li also commented on industry trends and safety discourse. She said she has been "more measured" on AI safety rhetoric, noting that "there's so much hype" and that the field needs "thoughtfulness to invest in the right effort." Regarding the term AGI, Li said she does not engage with it, adding that she is focused on building technology that can "make a difference in people's lives." She argued that robotics will be "one of the most important revolutions in human industrialization" and that $6 billion in investment is "too small" compared to past investments in self-driving cars and language models. Li also called for changing K-16 education, stating that AI's ability to outperform average humans on standardized tests means "we need to change the education."

Source: AI-verified profile updated from Fei-Fei Li's recent appearances.

Transcript (6 segments)
✨ AI-enhanced transcript with speaker attribution
Fei-Fei Li0:00
The world of AI and machine learning was so different at that time. There was very little data. Algorithms, at least in computer vision, did not work. There was no industry. As far as the public was concerned, the word AI didn't exist. But there was still a group of us. I think we just had an AI dream. We really, really wanted to make machines think and work. And with that dream, my own personal dream was to make machines see, because seeing is such a cornerstone of intelligence. Visual intelligence is not just perceiving; it's really understanding the world and doing things in the world. So I was obsessed with the problem of making machines see. One problem always haunted me: the problem of generalization.
If you're working in machine learning, you have to respect that generalization is the core mathematical foundation, or goal, of machine learning. And in order to generalize, these algorithms need data. Yet no one had data at that time in computer vision. We had to bet that there needed to be a paradigm shift in machine learning, and that the paradigm shift had to be led by data-driven methods. So of course you all know the past. It's really hard to summarize the past five or six years. For me, we're living in such a civilizational moment of this technology's progress right now. As a computer vision scientist, I'm seeing this incredible growth from ImageNet to image captioning to image generation using some of the diffusion techniques. While this is happening in a very exciting way, we also have another extremely exciting thread, which is language. And of course, the elephant in the room is that there's a lot of data on the internet for language. But where is the data for spatial intelligence? It's all in our heads, of course, but it's not as easily accessible as language. So these are the reasons it's so hard. But frankly, it excites me, because if it were easy, somebody else would have solved it. My entire career has been going after problems that are just so hard, bordering on delusional. And I think this is the delusional problem. Thank you for supporting that.
There is the LLM. A lot of what we see in LLMs is really riding the scaling law all the way to a happy ending, and you can almost brute-force self-supervision all the way. Constructing a world model might be a little more nuanced. The world is more structured. There might be signals that we need to use to guide it. You can call it a prior, you can call it supervision in your data, whatever it is. I think that these are some of the open questions that we have to solve. But you're right. Also, if you think about humans, first of all, we don't have all the answers even to human perception, right? How 3D works in human vision is not a solved problem. We know mechanically the two eyes have to triangulate information, but even after that, where is the mathematical model? And we're not that great; humans are not that great as 3D animals. So there is a lot that needs to be answered.
We are definitely at World Labs. I'm just counting on one thing. I'm counting on having the smartest people in the pixel world to solve this. I'm not going to be able to talk too much about the details of World Labs per se, but in terms of spatial intelligence, that's what excites me. Just like language, the use case is so huge, from creation (think about designers, architects, industrial designers, as well as artists, 3D artists, and game developers) all the way to robotics. The utility of spatial intelligence models or world models is really, really big. And then there are many related industries, from marketing to entertainment to even the metaverse. I'm actually really excited by the metaverse. I know so many people are still like, "it's still not working." I know it's still not working. That's why I'm excited, because I think the convergence of hardware and software will be coming. So that's another great use case down the road.
Around 2018, AI was not only taking over the industry; AI became a human problem. Humanity will always advance our technology, but we cannot lose our humanity. And I really care about creating a beacon of light in the progress of AI, trying to imagine how AI can be human-centered, how we can create AI to help humanity. So I went back to Stanford and created the Human-Centered AI Institute and ran that as a startup for five years. I was very proud of that. So in a way, I think I just love being an entrepreneur. I love the feeling of ground zero—standing on ground zero. Forget about what you have done in the past, forget about what others think of you. Just hunker down and build. That is my comfort zone and I just love that.
First of all, interdisciplinary AI is a really exciting area in academia, especially for scientific discovery. There are just so many disciplines that can cross AI. I think that's a big area that one could go to. On the theoretical side, I find it fascinating that AI capability has 100% outrun theory. We don't know how; we don't have explainability. We don't know how to figure out causality. There's just so much in the models we don't understand that one could push forward. And the list can go on. In computer vision, there are still representational problems we haven't solved, and also small data is another really interesting domain. So yeah, these are the possibilities.