Dario Amodei

CEO and Co-Founder, Anthropic

Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity | Lex Fridman Podcast #452

🎥 Dec 09, 2024 📺 Алексей Панков ⏱ 60m

About Dario Amodei

Dario Amodei, CEO and co-founder of Anthropic, has been a prominent voice in discussions about AI's rapid advancement and its societal implications. In interviews and public appearances, he described the pace of AI development as an "exponential" that creates a feeling of accelerating away from normal time, comparing it to relativistic space travel. He stated that Anthropic's revenue grew roughly 10x per year, reaching an approximately $7 billion run rate, and that the company recorded 80x year-on-year growth in Q1 2026. Amodei said that AI models now write about 90% of code at Anthropic and some partner companies, but argued that this does not mean 90% of software engineers will be fired; instead, he suggested that under comparative advantage, engineers may become more leveraged and focus on the remaining 10%. Amodei has warned about potential economic disruption, stating that AI could produce a combination of very high GDP growth and high unemployment or inequality—a scenario he described as historically unprecedented. He expressed concern that AI may be uniquely suited to autocracy and surveillance, and advocated for export controls on chips to China, saying it would be "really bad for America" and democracy if China were to lead in AI capabilities. On safety, Amodei said Anthropic has a history of delaying model releases for safety reasons, costing "several hundred million dollars," and asked observers to judge the company by its overall record. He also discussed the need for government involvement in managing AI's impact, predicting that current ideological divisions over the technology will become bipartisan and universal as its effects become unavoidable.

Source: AI-verified profile updated from Dario Amodei's recent appearances.

Transcript (57 segments)

✨ AI-enhanced transcript with speaker attribution

Dario Amodei0:00

We should act now. They're getting better very, very fast. I testified in the Senate that we might have serious bio-risks within two to three years. That was about a year ago, and things have progressed at pace. So we have this thing where it's surprisingly hard to address these risks because they're not here today; they don't exist. They're like ghosts, but they're coming at us so fast because the models are improving so fast. How do you deal with something that's not here today, doesn't exist, but is coming at us very fast? The solution we came up with for that, in collaboration with organizations like METR and Paul Christiano, is that for that you need tests to tell you when the risk is getting close. You need an early warning system. So every time we have a new model, we test it for its capability to do these CBRN tasks, as well as testing it for how capable it is of doing tasks autonomously on its own. In the latest version of our RSP, which we released in the last month or two, the way we test autonomy risks is the model's ability to do aspects of AI research itself. When the AI models can do AI research, they become kind of truly autonomous. That threshold is important for a bunch of other ways. The RSP basically develops what we've called an 'if-then' structure, which is if the models pass a certain capability threshold, then we impose a certain set of safety and security requirements on them.
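The "if-then" structure described above can be sketched in a few lines. This is a minimal illustration only: the score fields, the 0.5 thresholds, and the safeguard names are all hypothetical and are not taken from Anthropic's actual RSP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Scores from capability evaluations (hypothetical 0..1 scale)."""
    cbrn_score: float      # performance on CBRN-related test tasks
    autonomy_score: float  # performance on AI-research (autonomy) test tasks


# Assumed trigger levels, chosen only for illustration.
THRESHOLDS = {"cbrn": 0.5, "autonomy": 0.5}

# Assumed safeguard tiers, named only for illustration.
REQUIRED_SAFEGUARDS = {
    "baseline": ["standard security", "standard deployment policy"],
    "elevated": ["hardened security", "narrow deployment filters"],
}


def required_safeguards(result: EvalResult) -> list[str]:
    """IF any capability threshold is crossed, THEN require the stricter tier."""
    triggered = (
        result.cbrn_score >= THRESHOLDS["cbrn"]
        or result.autonomy_score >= THRESHOLDS["autonomy"]
    )
    return REQUIRED_SAFEGUARDS["elevated" if triggered else "baseline"]
```

The point of the sketch is the ordering: the evaluation runs on every new model, and the safeguards are a function of the measured capability rather than of the release schedule.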

Interviewer6:10

What do you think the timeline for ASL-3 is, where several of the triggers are fired, and what do you think the timeline is for ASL-4?

Dario Amodei6:16

That is hotly debated within the company. We are working actively to prepare ASL-3 security measures as well as ASL-3 deployment measures. I'm not going to go into detail, but we've made a lot of progress on both. We're prepared to be ready quite soon. I would not be surprised at all if we hit ASL-3 next year. There was some concern we might even hit it this year; that could still happen. It's very hard to say, but I would be very surprised if it was as late as 2030. I think it's much sooner than that.

Interviewer6:54

There's protocols for detecting it, the if-then, and then there's protocols for how to respond to it. How difficult is the latter?

Dario Amodei7:08

For ASL-3, it's primarily about security and about filters on the model relating to a very narrow set of areas when we deploy the model. At ASL-3, the model isn't autonomous yet, so you don't have to worry about the model itself behaving in a bad way even when it's deployed internally. The ASL-3 measures are rigorous, but they're easier to reason about. Once we get to ASL-4, we start to have worries about the models being smart enough that they might sandbag tests or not tell the truth about tests. There were results about sleeper agents and a more recent paper about whether models can mislead evaluators by sandbagging their own abilities. I think with ASL-4, there's going to be an important component of using things other than just interacting with the models, for example interpretability or hidden chains of thought, where you have to look inside the model and verify via some other mechanism that the model indeed has some property. We're still working on ASL-4. One of the properties of the RSP is that we don't specify ASL-4 until we've hit ASL-3, and I think that's been a wise decision. We want to take as much time as we can to get these things right.

Interviewer8:56

So for ASL-3, the bad actors will be the humans. For ASL-4, it's both humans and the model, so deception matters, and that's where mechanistic interpretability comes into play. Hopefully the techniques used for that are not made accessible to the model.

Dario Amodei9:16

Yes. Of course you can hook up the mechanistic interpretability to the model itself, but then you've kind of lost it as a reliable indicator. There are a bunch of exotic ways you can think that it might also not be reliable, like if the model gets smart enough that it can jump computers and read the code where you're looking at its internal state. We've thought about some of those; I think they're exotic enough, and there are ways to render them unlikely. But generally you want to preserve mechanistic interpretability as a kind of verification set that's separate from the training process of the model.

Interviewer9:53

I think as these models become better at conversation and become smarter, social engineering becomes a threat too. They could start being very convincing to the engineers inside companies.

Dario Amodei10:04

Yeah, we've seen lots of examples of demagoguery from humans, and there's a concern that models could do that as well.

Interviewer10:13

One of the ways that Claude has been getting more and more powerful is its ability to do agentic stuff, like computer use. There's also an analysis within the sandbox of Claude itself. Let's talk about computer use. It seems super exciting that you can give Claude a task and it takes a bunch of actions, figures it out, and has access to your computer through screenshots. Can you explain how that works and where that's headed?

Dario Amodei10:43

It's actually relatively simple. Claude has had for a long time, since Claude 3 back in March, the ability to analyze images and respond to them with text. The only new thing we added is that those images can be screenshots of a computer. In response, we trained the model to give a location on the screen where you can click, or buttons on the keyboard you can press in order to take action. It turns out that with not all that much additional training, the models can get quite good at that task. It's a good example of generalization. People sometimes say if you get to low earth orbit, you're halfway to anywhere because of how much it takes to escape the gravity well. If you have a strong pre-trained model, I feel like you're halfway to anywhere in the intelligence space. It didn't take all that much to get Claude to do this. You can just set that in a loop: give the model a screenshot, tell it what to click on, give it the next screenshot, tell it what to click on. That turns into a full kind of video-like interaction, and it's able to do all of these tasks. We showed demos where it's able to fill out spreadsheets, interact with websites, open all kinds of programs on different operating systems like Windows, Linux, Mac. While in theory there's nothing you could do there that you couldn't have done through just giving the model the API to drive the computer screen, this really lowers the barrier. There are a lot of folks who aren't in a position to interact with those APIs or it takes them a long time. Just the screen is a universal interface that's a lot easier to interact with. I expect over time this is going to lower a bunch of barriers. Honestly, the current model still leaves a lot to be desired. We were honest about that in the blog: it makes mistakes, it misclicks. We were careful to warn people that you can't just leave this thing to run on your computer for minutes and minutes. You've got to give this thing boundaries and guard rails. 
That's one of the reasons we released it first in an API form rather than handing it to the consumer to give it control of their computer. But I feel it's important to get these capabilities out there. As models get more powerful, we're going to have to grapple with how to use these capabilities safely and prevent them from being abused. Releasing the model while the capabilities are still limited is very helpful.
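The screenshot-and-click loop described above can be sketched as follows. This is a rough sketch under stated assumptions, not Anthropic's implementation or API: `Action`, `run_agent_loop`, and the three callables are hypothetical names, the model call is a stand-in function, and `max_steps` is one simple example of the boundaries mentioned in the answer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """One step chosen by the model (hypothetical schema)."""
    kind: str      # "click", "type", or "done"
    x: int = 0     # screen coordinates for a click
    y: int = 0
    text: str = ""  # keys to press for a "type" action


def run_agent_loop(
    take_screenshot: Callable[[], bytes],
    choose_action: Callable[[str, bytes], Action],  # stand-in for the model call
    perform: Callable[[Action], None],
    task: str,
    max_steps: int = 10,  # hard cap so the loop cannot run unattended forever
) -> list[Action]:
    """Screenshot -> model picks an action -> perform it -> repeat."""
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = choose_action(task, screenshot)
        history.append(action)
        if action.kind == "done":
            break
        perform(action)
    return history


# Usage with a scripted fake "model" and a fake screen.
scripted = iter([Action("click", 10, 20), Action("type", text="hello"), Action("done")])
performed: list[Action] = []
history = run_agent_loop(
    take_screenshot=lambda: b"fake-png-bytes",
    choose_action=lambda task, shot: next(scripted),
    perform=performed.append,
    task="fill in the form",
)
print([a.kind for a in history])  # ['click', 'type', 'done']
```

In a real deployment the three callables would wrap a screen-capture tool, a model request, and an input-automation tool; they are kept abstract here so the sketch does not assume any particular library.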

Interviewer14:31

How much do you have to specially go beyond what the pre-trained model is doing? Do more post-training, RLHF, supervised fine-tuning, or synthetic data just for the agent?

Dario Amodei14:44

Speaking at a high level, it's our intention to keep investing a lot in making the model better. We look at some of the benchmarks where previous models could do something 6% of the time, and now our model does it 14% or 22% of the time. We want to get up to human-level reliability of 80 to 90%, just like anywhere else. We're on the same curve that we were on with SWE-bench. I would guess a year from now the models can do this very reliably, but you've got to start somewhere.

Interviewer15:14

So you think it's possible to get to the human level 90% basically doing the same thing you're doing now, or does it have to be special for computer use?

Dario Amodei15:23

It depends what you mean by special. But I generally expect that doubling down on the same kinds of techniques we've been using to train the current model, in the same way that we have for code, for models in general, for image input, and for voice, will scale here as it has everywhere else.

Interviewer15:50

This is giving the power of action to Claude, so you could do a lot of really powerful things but you could do a lot of damage too.

Dario Amodei16:00

We've been very aware of that. My view is that computer use isn't a fundamentally new capability like the CBRN or autonomy capabilities are. It's more like it opens the aperture for the model to use and apply its existing abilities. Going back to our RSP, nothing that this model is doing inherently increases the risk from an RSP perspective. But as the models get more powerful, having this capability may make it scarier once it has the cognitive capability to do something at the ASL-3 and ASL-4 level. This may be the thing that unbinds it. Going forward, this modality of interaction is something we have tested for and will continue to test for in the RSP. I think it's probably better to have to learn and explore this capability before the model is super capable.

Interviewer17:05

There are a lot of interesting attacks like prompt injection because now you've widened the aperture. You can prompt inject through stuff on screen. If this becomes more and more useful, there's more and more benefit to inject stuff into the model. If it goes to a certain web page, it could be harmless stuff like advertisements or it could be harmful stuff.

Dario Amodei17:26

We thought a lot about things like spam, CAPTCHA, mass campaigns. I'll tell you a secret: if you've invented a new technology, the first misuse you'll see is scams, just petty scams. It's a thing as old as people scamming each other, and you always have to deal with it. The same goes for bots and spam in general as the technology gets more intelligent. There are a lot of petty criminals in the world, and every new technology is a new way for them to do something stupid and malicious.

Interviewer18:18

Are there any ideas about sandboxing it? How difficult is the sandboxing task?

Dario Amodei18:21

We sandbox during training. For example, during training we didn't expose the model to the internet. I think keeping it offline is probably a good idea during training, because the model's policy is still changing and it would be having an effect in the real world. In terms of actually deploying the model, it kind of depends on the application. Sometimes you want the model to do something in the real world, but you can always put guard rails on the outside. You can say the model is not going to move files from my web server to anywhere else. But when we get to ASL-4, none of these precautions are going to make sense. At ASL-4, the model could be smart enough to break out of any box. We need to think about mechanistic interpretability. If we're going to have a sandbox, it would need to be a mathematically provable one. That's a whole different world from what we're dealing with in models today.
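A guard rail "on the outside" of the kind mentioned above, such as not letting the model move files off a web server, could look roughly like this. It is a deliberately simple sketch: the `/var/www` root, the command list, and the `is_allowed` helper are hypothetical, and a denylist like this is not a real sandbox (it does not normalize `..` segments, for instance).

```python
from pathlib import PurePosixPath

# Hypothetical protected directory; adjust to the real deployment.
PROTECTED_ROOT = PurePosixPath("/var/www")

# Commands treated as "moving or copying files" in this sketch.
TRANSFER_COMMANDS = {"mv", "cp", "scp", "rsync"}


def is_allowed(command: list[str]) -> bool:
    """Check a model-proposed command before it is executed.

    Rejects empty commands and any transfer command that has a path
    argument equal to or under the protected root.
    """
    if not command:
        return False
    if command[0] in TRANSFER_COMMANDS:
        for arg in command[1:]:
            if arg.startswith("-"):  # skip flags such as -r
                continue
            path = PurePosixPath(arg)
            if path == PROTECTED_ROOT or PROTECTED_ROOT in path.parents:
                return False
    return True
```

The check runs outside the model, so it holds regardless of what the model outputs; that is also why, as the answer notes, this style of precaution stops being sufficient once a model could plausibly find a way around the box.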

Interviewer19:33

The science of building a box from which an ASL-4 AI system cannot escape, I think it's probably not the right approach. Instead of having something unaligned you're trying to prevent from escaping, it's better to design the model the right way or have a loop where you look inside the model and verify properties.

Dario Amodei20:01

I agree. Containing bad models is a much worse solution than having good models.

Interviewer20:08

Let me ask about regulation. What's the role of regulation in keeping AI safe? For example, the California AI regulation bill SB 1047 was ultimately vetoed by the governor. What are the pros and cons of this bill?

Dario Amodei20:23

We ended up making some suggestions to the bill, and some of those were adopted. We felt quite positively about the bill by the end of that. It did still have some downsides, and of course it got vetoed. At a high level, I think some of the key ideas behind the bill are similar to ideas behind our RSPs. I think it's very important that some jurisdiction, whether it's California, the federal government, or other countries and states, passes some regulation like this. I feel good about our RSP; it's a good forcing function for getting the company to take these risks seriously. But there are still some companies that don't have RSP-like mechanisms. If some companies adopt these mechanisms and others don't, it creates a negative externality, and I don't think the lack of uniformity is fair to those of us who have put a lot of effort into being thoughtful about these procedures. I don't think you can trust these companies to adhere to these voluntary plans in their own right. Anthropic does everything we can to adhere to our own RSP, but you hear lots of things about various companies saying they would do one thing and then not doing it. I think it's important to have a uniform standard that everyone follows. Some people are against regulation on principle. I understand where that comes from; in Europe, some of what they've done with GDPR is unnecessarily burdensome and has slowed innovation. But I think AI is different. If we go to the very serious risks of autonomy and misuse, I think those are unusual and warrant an unusually strong response. We need something everyone can get behind. One of the issues with SB 1047, especially the original version, was that it had a bunch of the structure of RSPs but also a bunch of stuff that was clunky and would have created a lot of hassle, and might have missed the target in addressing the risks. The worst enemy of those who want real accountability is badly designed regulation. We need to actually get it right.

Interviewer26:31

And come up with something surgical, like you said.

Dario Amodei26:34

Yes, exactly. We need to get away from this intense pro-safety versus anti-regulatory rhetoric. It's turned into flame wars on Twitter, and nothing good is going to come from that.

Interviewer29:37

There's a lot of curiosity about the different players in the game. One of the OGs is OpenAI. You had several years of experience at OpenAI. What's your story and history there?

Dario Amodei29:47

I was at OpenAI for roughly five years. For the last couple of years, I was Vice President of Research there. Ilya Sutskever and I were the ones who really set the research direction around 2016 or 2017. I first started to really believe in, or at least confirm my belief in, the scaling hypothesis when Ilya famously said to me, 'The thing you need to understand about these models is they just want to learn.' There are these Zen koans you hear, and you realize that explains a thousand things you've seen. After that, I had this visualization in my head: you optimize the models in the right way, point the models in the right way, they just want to learn and solve the problem. So get out of their way and don't impose your own ideas about how they should learn. This was the same thing as Rich Sutton put in the Bitter Lesson or the scaling hypothesis. The dynamic was that I got inspiration from Ilya and others like Alec Radford, who did the original GPT-1, and then ran really hard with it. My collaborators and I worked on GPT-2 and GPT-3; on RL from Human Feedback, which was an attempt to deal with early safety and reliability; on things like debate and amplification; and heavily on interpretability. The combination of safety plus scaling, probably around 2018, 2019, 2020, was when my collaborators and I, many of whom became co-founders of Anthropic, had a vision and drove the direction.

Interviewer31:43

Why did you decide to leave?

Dario Amodei31:46

I think it ties to the race to the top. In my time at OpenAI, I came to appreciate the scaling hypothesis and the importance of safety along with it. The first one, OpenAI was getting on board with. The second one had always been part of OpenAI's messaging. But over many years, I had a particular vision of how we should handle these things, how they should be brought out into the world, and the kind of principles the organization should have. There are many discussions about what the company should do, but people say we left because we didn't like the deal with Microsoft, which is false. We didn't leave because of commercialization; we built GPT-3, which was the model that was commercialized, and I was involved in that. It's more about how you do it. Civilization is going down this path to very powerful AI. What's the way to do it that is cautious, straightforward, honest, and builds trust? How do we get from here to there and have a real vision for how to get it right? How can safety not just be something we say because it helps with recruiting? If you have a vision for how to do it, you should go off and do that vision. It's incredibly unproductive to argue with someone else's vision. You should take some people you trust and make your vision happen. If your vision is compelling, you can make a company that's a place people want to join and that engages in practices people think are reasonable while maintaining its position in the ecosystem. If you do that, people will copy it. The fact that you're doing it better than they are causes them to change their behavior in a much more compelling way than if you're their boss arguing with them. It's much more productive to go off and do a clean experiment. The point isn't to be virtuous; the point is to get the system into a better equilibrium. Anthropic is this clean experiment built on a foundation of what concretely AI safety should look like. We've made plenty of mistakes, but imperfect doesn't mean you give up. 
I hope we can build practices that the whole industry engages in.

Interviewer38:58

You said 'Talent density beats talent mass.' Can you explain that and talk about what it takes to build a great team of AI researchers and engineers?

Dario Amodei39:10

This is one of these statements that becomes more true every month. If I do a thought experiment: a team of 100 people who are super smart, motivated, and aligned with the mission, versus a team of 1,000 people where 200 are super smart and aligned, and 800 are random big tech employees, which do you prefer? The talent mass is greater in the group of a thousand, but if every time a super talented person looks around they see someone else super talented and dedicated, it sets the tone. Everyone is inspired and trusts everyone. If you have a thousand people and things have regressed, you need processes and rails in place because people don't fully trust each other. We're nearly a thousand people, and we've tried to make as large a fraction of them as possible super talented and skilled. It's one of the reasons we slowed down hiring. We grew from 300 to 800 in the first seven or eight months of the year, and then slowed down. There's an inflection point around a thousand. If your company consists of many different factions all optimizing for their own thing, it's hard to get anything done. But if everyone sees the broader purpose and there's trust and dedication to doing the right thing, that is a superpower. As Steve Jobs said, A players want to look around and see other A players. It is demotivating to see people not obsessively driving towards a singular mission, and super motivating to see the opposite.

Interviewer42:33

What does it take to be a great AI researcher or engineer, from everything you've seen working with so many amazing people?

Dario Amodei42:44

I think the number one quality, especially on the research side but really both, is open-mindedness. It sounds easy to be open-minded, but if I think about my own early history with the scaling hypothesis, I was seeing the same data others were seeing. I wasn't a better programmer or better at coming up with research ideas. In some ways, I was worse. The thing I had that was different was that I was willing to look at something with new eyes. People said we don't have the right algorithms yet, and I just thought, 'This neural net has 30 million parameters. What if we gave it 50 million and plotted some graphs?' That basic scientific mindset of seeing a variable you could change and trying it. It was simple and stupid, and anyone could have done it if they were told it was important. You didn't need to be brilliant. But a tiny number of people have driven the whole field forward by realizing this. This open-mindedness and willingness to see with new eyes often comes from being newer to the field. Experience can be a disadvantage for this. It's very hard to test for, but it's the most important thing.

Interviewer45:29

Can we rewind the clock? What advice would you give to young people interested in AI who want to make an impact on the world?

Dario Amodei45:40

My number one piece of advice is to just start playing with the models. Three years ago this wasn't obvious, and people would start by reading the latest reinforcement learning paper. Now with wider availability of models and APIs, people are doing this more. But experiential knowledge is key. These models are new artifacts that no one really understands. Also, again, do something new and think in a new direction. Mechanistic interpretability is still very new and fertile. There are things around long-horizon learning and tasks, evaluations for dynamic systems acting in the world, and multi-agent systems. Skate where the puck is going. You don't have to be brilliant to think of it. There's a barrier where people don't double down as much as they could or are afraid to do something that's not popular. Getting over that barrier is my number one piece of advice.

Interviewer47:47

Let's talk a bit about post-training. It seems the modern post-training recipe has a little bit of everything: supervised fine-tuning, RLHF, Constitutional AI with RL, and synthetic data. What's the secret sauce that makes Anthropic's Claude so incredible? How much of the magic is in pre-training versus post-training?

Dario Amodei48:23

We're not perfectly able to measure that ourselves. When you see some great characteristic, it's hard to tell whether it came from pre-training or post-training. We developed ways to try and distinguish between the two, but they're not perfect. When there is an advantage, and I think we've been pretty good at RL in general, it usually isn't some secret magic method that others don't have. Usually it's that we got better at the infrastructure so we could run it for longer, or we were able to get higher quality data or filter our data better. It's usually a boring matter of practice and tradecraft. I think of it more like designing airplanes or cars: there's a cultural tradecraft of how we think about the design process that is more important than any particular gizmo.

Interviewer49:51

First on RLHF: zooming out, intuition, almost philosophy, why do you think RLHF works so well?

Dario Amodei50:01

Going back to the scaling hypothesis, if you train for X and throw enough compute at it, you get X. RLHF is good at doing what humans want the model to do, or more precisely, what humans who look at the model for a brief period of time prefer. That's not perfect from both a safety and capabilities perspective. Humans are often not able to perfectly identify what the model wants, and what they want in the moment may not be what they want in the long term. But the models are good at producing what the humans in some shallow sense want. It turns out you don't have to throw that much compute at it because a strong pre-trained model is halfway to anywhere. Once you have the pre-trained model, you have all the representations you need to get the model where you want it to go.

Interviewer51:07

Do you think RLHF makes the model smarter or just appear smarter to the humans?

Dario Amodei51:14

I don't think it makes the model smarter. I don't think it just makes the model appear smarter either. It's like RLHF bridges the gap between the human and the model. You could have something really smart that can't communicate at all. We all know people like this: really smart but you can't understand what they're saying. So I think RLHF bridges that gap. It's not the only kind of RL we do, and it's not the only kind that will happen in the future. RL has the potential to make models smarter, reason better, and develop new skills. The kind of RLHF we do today mostly doesn't do that yet, although we're quickly starting to be able to. It appears to increase the metric of helpfulness. I like the word 'unhobbling' from Leopold's essay; the models are hobbled and then you do various trainings to unhobble them.

Interviewer52:33

In terms of cost, is pre-training the most expensive thing, or is post-training creeping up to that?

Dario Amodei52:40

At the present moment, pre-training is still the majority of the cost. I could certainly anticipate a future where post-training is the majority of the cost.

Interviewer52:50

Would the costly part be the humans or the AI?

Dario Amodei52:54

I don't think you can scale up humans enough to get high quality. Any method that relies on humans and uses a large amount of compute will have to rely on some scaled supervision method like debate or iterated amplification.

Interviewer53:13

Can you describe Constitutional AI, as detailed in the December 2022 paper?

Dario Amodei53:25

The basic idea is this. With RLHF, you have a model and it spits out two possible responses, and you ask a human which one they like better. That's hard to scale because you need human interaction, and it's very implicit. Two ideas: one, could the AI system itself decide which response is better? Show the AI these two responses and ask which is better. Two, what criterion should the AI use? Then there's the idea of a single document, a constitution, that says these are the principles the model should use to respond. The AI system reads those principles as well as the environment and the response, and says how well the AI model did. It's a form of self-play, training the model against itself. The AI gives a response, feeds that back into what's called the preference model, which in turn feeds the model to make it better. The set of principles is human-interpretable, so both the human and the AI system can read it. In practice, we use both a model constitution and RLHF, so it's one tool in a toolkit that reduces the need for RLHF and increases the value we get from each data point.
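The AI-feedback step described above, where a model rather than a human picks the better of two responses according to written principles, can be sketched like this. Everything here is an assumption made for illustration: the two principles are invented, the judge is a toy stand-in for a model call, and majority voting over principles is a simplification rather than the procedure from the paper.

```python
from typing import Callable

# Invented example principles; a real constitution is a longer document.
CONSTITUTION = [
    "Choose the response that is more helpful.",
    "Choose the response that avoids assisting with dangerous activities.",
]

# A judge returns 0 if response A better satisfies the principle, else 1.
Judge = Callable[[str, str, str, str], int]


def label_preference(
    prompt: str,
    response_a: str,
    response_b: str,
    judge: Judge,
    principles: list[str] = CONSTITUTION,
) -> dict[str, str]:
    """Build one preference-model training example from AI feedback.

    Each principle casts one vote; B wins only with a strict majority,
    so ties go to A.
    """
    votes_for_b = sum(judge(p, prompt, response_a, response_b) for p in principles)
    b_wins = votes_for_b * 2 > len(principles)
    chosen, rejected = (response_b, response_a) if b_wins else (response_a, response_b)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def toy_judge(principle: str, prompt: str, a: str, b: str) -> int:
    """Placeholder for a model call: simply prefers the longer response."""
    return 1 if len(b) > len(a) else 0
```

The resulting `chosen`/`rejected` pairs play the same role as human comparisons in RLHF: they train the preference model, which then provides the training signal for the main model.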

Interviewer55:38

The natural question is: who gets to define the constitution, the set of principles?

Dario Amodei55:53

I'll give a practical and a more abstract answer. Practically, models get used by all kinds of different customers. You can have specialized rules or principles; we fine-tune versions of models, and we've talked about having special principles people can build in. A customer service agent behaves very differently from a lawyer. But at the base, there are specific principles models have to obey that people would agree with: we don't want models to present CBRN risks. We can go a little further and agree with some basic principles of democracy and the rule of law. Beyond that, it gets uncertain. Our goal is for the models to be more neutral, to not espouse a particular point of view, and to be like wise agents or advisers that help you think things through.

Interviewer57:13

OpenAI released a model spec where it clearly defines some of the goals of the model with specific examples of how it should behave. Do you find that interesting? And do you think Anthropic might release a model spec as well?

Dario Amodei57:38

I think that's a pretty useful direction. It has a lot in common with Constitutional AI. It's another example of a race to the top: we have something we think is a better and more responsible way of doing things, and then others discover it has advantages and start to do it. We no longer have the competitive advantage, but it's good because everyone has adopted a positive practice. Our response is to look for a new competitive advantage to keep driving this race upwards. Every implementation is different, so there were things in the model spec that were not in Constitutional AI, and we can learn from them.

Interviewer58:39

Let's talk about the incredible essay 'Machines of Loving Grace'. It's a long one, and it's refreshing to read concrete ideas about what a positive future looks like. You took a bold stance because you might be wrong on the dates or specific applications.

Dario Amodei58:58

I'm fully expecting to be wrong about all the details. I might be spectacularly wrong about the whole thing and people will laugh at me for years. That's how the future works.

Interviewer59:13

You provided a bunch of concrete positive impacts of AI, and how a superintelligent AI might accelerate breakthroughs in biology and chemistry, leading to curing most cancers, preventing all infectious diseases, doubling the human lifespan, and so on. Can you give a high-level vision of this essay and what key takeaways people should have?

Dario Amodei59:42

I have spent a lot of time on how to address the risks of AI. My reasoning was that the market will produce all the positive things, but we might not mitigate the risks, so we can have more impact by trying to mitigate them. But I noticed a flaw in that way of thinking. It's not a change in how seriously I take risks, but a change in how I talk about them. If you only talk about risks, your brain only thinks about risks. It's important to understand what happens if things go well. The whole reason we're trying to prevent these risks is not because we're afraid of technology. It's because we want to enable a fantastic future. The essay is an attempt to paint that picture in a concrete, non-sci-fi way.