Sam Altman

CEO, OpenAI

Sam Altman on Sora | Lex Fridman Podcast

🎥 Mar 18, 2024 📺 Lex Clips ⏱ 10m

Lex Fridman Podcast full episode: https://www.youtube.com/watch?v=jvqFAi7vkBc

About Sam Altman

Sam Altman, CEO of OpenAI, has been active in public appearances discussing AI's societal impact, infrastructure buildout, and the approach of superintelligence. At the groundbreaking for a new data center called "The Barn" in Saline Township, Michigan, Altman described the project as a "huge bet" and said he hoped it would become a model for how data centers and communities can mutually benefit. He stated that the site could help cure diseases and provide tutoring to millions of students, and emphasized the importance of not increasing energy prices and creating union jobs.

In media interviews, Altman said people are "right to be anxious" about AI, calling it "one of the biggest" technological shifts, and acknowledged that the industry has failed to articulate how people will stay in control. He also said he believes AI will treat most diseases by 2035 and expressed concern about AI's impact on mental health. Altman has also discussed OpenAI's policy blueprint for preparing society for superintelligence, which he described as urgent and intended to start a debate. He stated that the government should be doing AI research and making longevity treatments available, and that his single biggest contribution has been pushing for AI to be a "democratized technology."

Altman announced that the OpenAI Foundation is pledging $100 million to Alzheimer's research, and he participated in the Breakthrough Prize ceremony presenting an award to the Muon g-2 collaborations. He also spoke about the need for "proof of human" systems to verify identity in an age of AI-generated content, and reiterated his view that the most important priorities are accelerating research and the economy.

Source: AI-verified profile updated from Sam Altman's recent appearances.

Transcript (26 segments)

✨ AI-enhanced transcript with speaker attribution

Interviewer 0:02

So speaking of cool stuff, Sora: there's like a million questions I could ask. First of all, it's amazing, it truly is amazing on a product level, but also just on a philosophical level. So let me just ask a technical, philosophical question: what do you think it understands about the world, more or less than GPT-4, for example? Like the world model when you train on these patches versus language tokens?

Sam Altman 0:33

I think all of these models understand something more about the world model than most of us give them credit for. And because there are also very clear things they just don't understand or don't get right, it's easy to look at the weaknesses, see through the veil, and say, 'Ah, this is all fake.' But it's not all fake. It's just some of it works and some of it doesn't work. Like I remember when I first started watching Sora videos and I would see a person walk in front of something for a few seconds and occlude it, and then walk away and the same thing was still there. I was like, 'Oh, it's pretty good.' Or there are examples where the underlying physics looks so well represented over a lot of steps in a sequence. It's like, 'Oh, this is quite impressive.' But fundamentally, these models are just getting better, and that will keep happening. If you look at the trajectory from DALL-E 1 to 2 to 3 to Sora, there are a lot of people that dunked on each version saying it can't do this, it can't do that, and look at it now.

Interviewer 1:36

Well, the thing you just mentioned with occlusions is basically modeling the 3D physics of the world sufficiently well to capture those kinds of things. Or maybe you can tell me, in order to deal with occlusions, what does the world model need to...

Sam Altman 1:54

Yeah, so what I would say is it's doing something to deal with occlusions really well. Whether I'd claim that it has a great underlying 3D model of the world, that's a little bit more of a stretch. But can you get there through just these kinds of 2D training data approaches? It looks like this approach is going to go surprisingly far. I don't want to speculate too much about what limits it will surmount and which it won't.

Interviewer 2:14

But what are some interesting limitations of the system that you've seen? I mean, there's been some fun ones you've posted, like cats sprouting an extra limb at random points in a video. Pick what you want, but there's still a lot of weaknesses. Do you think that's a fundamental flaw of the approach, or is it just that a bigger model, better technical details, or more data is going to solve those?

Sam Altman 2:44

The cat sprouting, I would say yes to both. Like I think there is something about the approach which just seems to feel different from how we think and learn and whatever. And then also I think it'll get better with scale.

Interviewer 3:01

Like I mentioned, LLMs have text tokens and Sora has visual patches, so it converts all visual data, diverse kinds of visual data, videos and images, into patches. Is the training, to the degree you can say, fully self-supervised? Is there some manual labeling going on? Like what's the involvement of humans in all this?

Sam Altman 3:18

I mean, without saying anything specific about the Sora approach, we use lots of human data in our work, but not internet-scale data. So lots of humans... 'Lots' is a complicated word. I think 'lots' is a fair word in this case.

Interviewer 3:40

But it doesn't, because to me 'lots'... like listen, I'm an introvert and when I hang out with like three people, that's a lot of people. Four people, that's a lot. But I suppose you mean more than three people work on labeling the data for these models, right? Okay, all right. But fundamentally there's a lot of self-supervised learning, because what you mentioned in the technical report is internet-scale data. That's another beautiful... it's like poetry. So it's a lot of data that's not human-labeled; it's self-supervised in that way. And then the question is how much data there is on the internet that is conducive to this kind of self-supervised training. If only we knew the details of the self-supervised... Have you considered opening up the details a little more?

Sam Altman 4:31

We have. You mean for Sora specifically?

Interviewer 4:34

Sora specifically, because it's so interesting. Like, can the same magic of LLMs now start moving towards visual data, and what does that take to do that?

Sam Altman 4:46

I mean, it looks to me like yes, but we have more work to do.

Interviewer 4:52

Sure. What are the dangers? Why are you concerned about releasing the system? What are some possible dangers of this?

Sam Altman 4:58

Frankly speaking, one thing we have to do before releasing the system is just get it to work at a level of efficiency that will deliver the scale people are going to want from this. So I don't want to downplay that, and there's still a ton of work to do there. But you know, you can imagine issues with deepfakes, misinformation. We try to be a thoughtful company about what we put out into the world, and it doesn't take much thought to think about the ways this can go badly. There's a lot of tough questions here. You're dealing in a very tough space.

Interviewer 5:38

Do you think training AI should be or is fair use under copyright law?

Sam Altman 5:43

I think the question behind that question is: do people who create valuable data deserve to have some way that they get compensated for use of it? And I think the answer is yes. I don't know yet what the answer is. People have proposed a lot of different things. We've tried some different models. But you know, if I'm like an artist, for example, A, I would like to be able to opt out of people generating art in my style, and B, if they do generate art in my style, I'd like to have some economic model associated with that.

Interviewer 6:14

Yeah, it's that transition from CDs to Napster to Spotify. We have to figure out some kind of model. The model changes, but people have got to get paid. Well, there should be some kind of incentive, if we zoom out even more, for humans to keep doing cool stuff. Of everything I worry about, humans are going to do cool stuff, and society's going to find some way to reward it.

Sam Altman 6:36

That seems pretty hardwired. We want to create, we want to be useful, we want to achieve status in whatever way. That's not going anywhere, I don't think. But the reward might not be monetary or financial. It might be like fame and celebration of other cool stuff, or maybe financial in some other way. Again, I don't think we've seen the last evolution of how the economic system is going to work.

Interviewer 7:00

Yeah, but artists and creators are worried when they see Sora. They're like, 'Holy...'

Sam Altman 7:04

Artists were also super worried when photography came out. And then photography became a new art form and people made a lot of money taking pictures. And I think things like that will keep happening. People will use the new tools in new ways.

Interviewer 7:18

If you just look on YouTube or something like this, how much of that do you think will be using Sora, like AI-generated content, in the next five years? People talk about how many jobs there are going to be in five years, and the framework that people have is what percentage of current jobs are just going to be totally replaced by some AI doing the job.

Sam Altman 7:42

The way I think about it is not what percent of jobs AI will do, but what percent of tasks will AI do, and over what time horizon. So if you think of all of the like 5-second tasks in the economy, 5-minute tasks, the 5-hour tasks, maybe even the 5-day tasks, how many of those can AI do? And I think that's a way more interesting, impactful, important question than how many jobs AI can do, because it is a tool that will work at increasing levels of sophistication and over longer and longer time horizons for more and more tasks, and let people operate at a higher level of abstraction. So maybe people are way more efficient at the job they do, and at some point that's not just a quantitative change but it's a qualitative one too about the kinds of problems you can keep in your head. I think that for videos on YouTube, it'll be the same. Many videos, maybe most of them, will use AI tools in the production, but they'll still be fundamentally driven by a person thinking about it, putting it together, doing parts of it, sort of directing and running it.

Interviewer 8:48

Yeah, it's so interesting. I mean, it's scary, but it's interesting to think about. I tend to believe that humans like to watch other humans, or other human-like... humans really care about other humans a lot. If there's a cooler thing that's better than a human, humans care about that for like two days and then they go back to humans. That seems very deeply wired.

Sam Altman 9:12

It's the whole chess thing: 'Oh yeah,' but now everybody keeps playing chess, and let's ignore the elephant in the room that humans are really bad at chess relative to AI systems. We still run races and cars are much faster. I mean, there's a lot of examples.

Interviewer 9:24

Yeah, and maybe it'll just be tooling, like in the Adobe suite type of way, where you can just make videos much easier and all that kind of stuff.

Sam Altman 9:36

Listen, I hate being in front of the camera. If I can figure out a way to not be in front of the camera, I would love it. Unfortunately, it'll take a while. Like, generating faces, it's getting there, but generating faces in video format is tricky when it's specific people versus generic people.