Criticism of Claude from Reddit | Dario Amodei and Lex Fridman

🎥 Nov 11, 2024 📺 Lex Clips ⏱ 7m 👁 2973 views

Lex Fridman Podcast full episode: • Dario Amodei: Anthropic CEO on Claude, AGI... Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/cv821... See below for guest bio, links, and to give feedback, submit questions, contact Lex, etc. GUEST BIO: Dario Amodei is the CEO of Anthropic, the company that created Claude. Amanda Askell is an AI researcher working on Claude's character and personality. Chris Olah is an AI researcher working on mechanistic interpretability. CONTACT LEX: Feedback - give feedback to Lex: https://lexfridman.com/survey AMA - submit questi...

About Dario Amodei

Dario Amodei, CEO and co-founder of Anthropic, has been a prominent voice in discussions about AI's rapid advancement and its societal implications. In interviews and public appearances, he described the pace of AI development as an "exponential" that creates a feeling of accelerating away from normal time, comparing it to relativistic space travel. He stated that Anthropic's revenue grew roughly 10x per year, reaching an approximately $7 billion run rate, and that the company recorded 80x year-on-year growth in Q1 2026. Amodei said that AI models now write about 90% of code at Anthropic and some partner companies, but argued that this does not mean 90% of software engineers will be fired; instead, he suggested that under comparative advantage, engineers may become more leveraged and focus on the remaining 10%. Amodei has warned about potential economic disruption, stating that AI could produce a combination of very high GDP growth and high unemployment or inequality—a scenario he described as historically unprecedented. He expressed concern that AI may be uniquely suited to autocracy and surveillance, and advocated for export controls on chips to China, saying it would be "really bad for America" and democracy if China were to lead in AI capabilities. On safety, Amodei said Anthropic has a history of delaying model releases for safety reasons, costing "several hundred million dollars," and asked observers to judge the company by its overall record. He also discussed the need for government involvement in managing AI's impact, predicting that current ideological divisions over the technology will become bipartisan and universal as its effects become unavoidable.

Source: AI-verified profile updated from Dario Amodei's recent appearances.

Transcript (4 segments)

✨ AI-enhanced transcript with speaker attribution

Interviewer 0:03

This is probably something I'll talk to Amanda much more about, but another Reddit question: "When will Claude stop trying to be my puritanical grandmother, imposing its moral worldview on me as a paying customer?" And also, "What is the psychology behind making Claude overly apologetic?" So these are reports about the experience, a different angle on the frustration that has to do with the character.

Dario Amodei 0:27

Yeah, so a couple of points on this. The first one is, with things that people say on Reddit and Twitter or X or whatever it is, there's actually a huge distribution shift between the stuff that people complain loudly about on social media and what users, statistically, actually care about and what drives people to use the models. People are frustrated with things like the model not writing out all the code, or the model just not being as good at code as it could be, even though it's the best model in the world on code. I think the majority of things are about that. But certainly a vocal minority raise these concerns: they are frustrated by the model refusing things that it shouldn't refuse, or apologizing too much, or just having these kind of annoying verbal tics. The second caveat, and I just want to say this super clearly because I think some people don't know it, and others kind of know it but forget it: it is very difficult to control across the board how the models behave. You cannot just reach in there and say, oh, I want the model to apologize less. You can do that, you can include training data that says the model should apologize less, but then in some other situation they end up being super rude or overconfident in a way that's misleading people. So there are all these tradeoffs. For example, there was a period during which models, ours and I think others as well, were too verbose: they would repeat themselves, they would say too much. You can cut down on the verbosity by penalizing the models for just talking for too long. What happens when you do that, if you do it in a crude way, is that when the models are coding, sometimes they'll say, "rest of the code goes here," because they've learned that that's a way to economize. And that leads the model to be so-called lazy in coding, where they're just like, "ah, you can finish the rest of it."
It's not because we want to save on compute, or because the models are lazy during winter break, or any of the other conspiracy theories that have come up. It's actually just very hard to control the behavior of the model, to steer the behavior of the model in all circumstances at once. There's this whack-a-mole aspect where you push on one thing and these other things start to move as well, which you may not even notice or measure. And so one of the reasons that I care so much about grand alignment of these AI systems in the future is that these systems are actually quite unpredictable. They're actually quite hard to steer and control. This version we're seeing today, where you make one thing better and it makes another thing worse, I think that's a present-day analog of future control problems in AI systems that we can start to study today. I think that difficulty in steering the behavior, and in making sure that if we push an AI system in one direction it doesn't get pushed in another direction in some other way that we didn't want, is kind of an early sign of things to come. The problem to solve is this: you ask the model to make and distribute smallpox and it says no, but it's willing to help you in your graduate-level virology class. How do we get both of those things at once? It's hard. It's very easy to go to one side or the other, and it's a multi-dimensional problem. So I think these questions of shaping the model's personality are very hard. I think we haven't done perfectly on them. I think we've actually done the best of all the AI companies, but we're still so far from perfect. And I think if we can get this right, if we can control the false positives and false negatives in this very controlled present-day environment, we'll be much better at doing it for the future, when our worry is, will the models be super autonomous? Will they be able to make very dangerous things?
Will they be able to autonomously build whole companies, and are those companies aligned? So I think of this present task as both vexing and good practice for the future.

Interviewer 5:01

What's the current best way of gathering user feedback, not anecdotal data but large-scale data about pain points, or the opposite of pain points, positive things, and so on? Is it internal testing? Is it testing with a specific group? A/B testing? What works?

Dario Amodei 5:19

So typically we'll have internal model bashings where all of Anthropic, and Anthropic is almost a thousand people, just tries to break the model. People try to interact with it in various ways. We have a suite of evals for, oh, is the model refusing in ways that it shouldn't? I think we even had a "certainly" eval, because our model at one point had this annoying tic where it would respond to a wide range of questions by saying, "Certainly I can help you with that, certainly I would be happy to do that, certainly this is correct." And so we had a "certainly" eval, which is like, how often does the model say "certainly"? But look, this is just whack-a-mole. Like, what if it switches from "certainly" to "definitely"? So, you know, every time we add a new eval, we're still evaluating for all the old things, so we have hundreds of these evaluations, but we find that there's no substitute for humans interacting with it. And so it's very much like the ordinary product development process. We have hundreds of people within Anthropic bash the model, then we do external A/B tests, and sometimes we run tests with contractors, where we pay contractors to interact with the model. So you put all of these things together and it's still not perfect. You still see behaviors that you don't quite want to see. You still see the model refusing things that it just doesn't make sense to refuse. But think about trying to solve this challenge: trying to stop the model from doing genuinely bad things that everyone agrees it shouldn't do, right? Everyone agrees the model shouldn't talk about, I don't know, child abuse material. Everyone agrees the model shouldn't do that. But at the same time, we want it not to refuse in these dumb and stupid ways. I think drawing that line as finely as possible, approaching perfection, is still a challenge, and we're getting better at it every day.
But there's a lot to be solved, and again, I would point to that as an indicator of a challenge ahead in terms of steering much more powerful models.
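
Editor's note: the "certainly" eval mentioned in this answer can be pictured with a small Python sketch. This is only an illustration, not Anthropic's actual eval. The function name `opener_rates`, the tracked word list, the sample responses, and the choice to check only the first word of each response are all assumptions made here.

```python
import re
from collections import Counter

# Hypothetical list of openers to track. The transcript names "certainly"
# and speculates about a switch to "definitely".
TRACKED_OPENERS = ("certainly", "definitely")


def opener_rates(responses, openers=TRACKED_OPENERS):
    """Return, for each tracked opener, the fraction of responses that start with it."""
    counts = Counter()
    for text in responses:
        # Take the first word of the response, ignoring leading punctuation.
        match = re.match(r"\W*(\w+)", text.lower())
        if match and match.group(1) in openers:
            counts[match.group(1)] += 1
    total = len(responses)
    return {word: (counts[word] / total if total else 0.0) for word in openers}


# Made-up sample responses, for illustration only.
sample = [
    "Certainly! I can help you with that.",
    "Definitely, here is the code.",
    "Here is a short summary.",
    "Certainly, this is correct.",
]
rates = opener_rates(sample)
print(rates)
```

Tracking several openers at once is one way to notice the whack-a-mole effect described above: if the rate for "certainly" falls while the rate for "definitely" rises, the tic has moved rather than gone away. A fixed word list still misses any substitute that nobody thought to add, which is consistent with the point that evals do not replace people interacting with the model.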