Andrej Karpathy0:00
About software in the era of AI. I'm told that many of you are students — bachelors, masters, PhD and so on — and you're about to enter the industry. I think it's actually an extremely unique and very interesting time to enter the industry right now. Fundamentally, the reason for that is that software is changing again. I say 'again' because I actually gave this talk already, but the problem is that software keeps changing, so I actually have a lot of material to create new talks. I think it's changing quite fundamentally. Roughly speaking, software has not changed much on such a fundamental level for 70 years, and then it's changed about twice quite rapidly in the last few years. So there's just a huge amount of work to do, a huge amount of software to write and rewrite.
Let's take a look at the realm of software. If we think of this as the map of software, this is a really cool tool called Map of GitHub. This is all the software that's written — instructions to the computer for carrying out tasks in the digital space. If you zoom in, these are all different kinds of repositories. A few years ago, I observed that software was kind of changing and there was a new type of software around. I called this Software 2.0 at the time. The idea was that Software 1.0 is the code you write for the computer; Software 2.0 are basically neural networks — in particular, the weights of a neural network. You're not writing this code directly; you're tuning the datasets and running an optimizer to create the parameters. At the time, neural nets were seen as just a different classifier, like a decision tree. I think this framing was a lot more appropriate. Now, we have an equivalent of GitHub in the realm of Software 2.0 — Hugging Face is basically the equivalent of GitHub in Software 2.0. There's also Model Atlas. In case you're curious, the giant circle in the middle is the parameters of Flux, the image generator. Anytime someone tunes a LoRA on top of a Flux model, you create a git commit in this space and create a different kind of image generator.
So basically, we have Software 1.0 — computer code that programs a computer. Software 2.0 — the weights that program neural networks. Here's an example of AlexNet, an image recognizer neural network. Until recently, all neural networks we were familiar with were fixed-function computers — image to categories, etc. What's changed quite fundamentally is that neural networks became programmable with large language models. I see this as quite new and unique — a new kind of computer. So, in my mind, it's worth giving it a new designation: Software 3.0. Your prompts are now programs that program the LLM. Remarkably, these prompts are written in English, so it's a very interesting programming language.
To summarize the difference: if you're doing sentiment classification, you can imagine writing Python code to do it, training a neural net, or prompting a large language model. Here's a few-shot prompt, and you can imagine changing it to program the computer in a slightly different way. So we have Software 1.0, Software 2.0, and now Software 3.0. You might notice that a lot of GitHub code is not just code anymore — there's English interspersed with code. It's a new programming paradigm, and it's remarkable that it's in our native language of English. When this blew my mind a few years ago, I tweeted about it and it's my currently pinned tweet: we're now programming computers in English.
When I was at Tesla, we worked on autopilot and tried to get the car to drive. I showed a slide where inputs go through a software stack to produce steering and acceleration. I observed that there was a ton of C++ code in the autopilot — Software 1.0 — and some neural nets doing image recognition. Over time, as we made autopilot better, the neural network grew in capability and size, and all the C++ code was being deleted. A lot of capabilities originally written in 1.0 were migrated to 2.0. For example, stitching information across images from different cameras and across time was done by a neural network, and we could delete a lot of code. The Software 2.0 stack literally ate through the autopilot software stack. We're seeing the same thing again: a new kind of software eating through the stack. We now have three completely different programming paradigms. If you're entering the industry, it's a good idea to be fluent in all of them because they have pros and cons. You may want to program some functionality in 1.0, 2.0, or 3.0 — whether to train a neural net, prompt an LLM, or write explicit code. We all have to make these decisions and transition between paradigms fluidly.
In the first part, I want to talk about LLMs and how to think of this new paradigm and the ecosystem. What is this new computer, and what does the ecosystem look like? I was struck by a quote from Andrew Ng many years ago — he said AI is the new electricity. I think it captures something interesting: LLMs feel like they have properties of utilities. LLM labs like OpenAI, Gemini, Anthropic, etc., spend capex to train LLMs, equivalent to building out a grid, and opex to serve intelligence over APIs through metered access — pay per million tokens. We demand low latency, high uptime, consistent quality — utility-like demands. In electricity, you have a transfer switch to switch sources; in LLMs, we have OpenRouter to switch between different LLMs. Because LLMs are software, they don't compete for physical space, so it's okay to have multiple providers and switch between them.
It's fascinating that when the state-of-the-art LLMs go down, it's like an intelligence brownout — the planet gets dumber. The reliance on these models is already dramatic and will continue to grow. But LLMs also have properties of fabs, because the capex required is quite large. The tech tree for the technology is growing rapidly, with deep tech trees and R&D secrets centralizing inside LLM labs. But software is less defensible because it's malleable. There are many analogies: a 4nm process node is like a cluster with certain max flops. Using Nvidia GPUs and only doing software — that's the fabless model. Building your own hardware and training on TPUs, like Google, is the Intel model where you own the fab. But the analogy that makes the most sense to me is that LLMs have strong analogies to operating systems. They're not just commodities like electricity; they're increasingly complex software ecosystems. The ecosystem is shaping up similarly: a few closed-source providers like Windows or Mac OS, and an open-source alternative like Linux. For LLMs, we have competing closed-source providers, and the Llama ecosystem might grow into something like Linux. It's still early — these are just simple LLMs, but they will get a lot more complicated, with tool use, multimodalities, etc.
When I had this realization, I sketched it out. LLMs are like a new operating system. The LLM is the CPU equivalent, the context window is like memory, and the LLM orchestrates memory and compute for problem solving. It looks very much like an operating system. More analogies: to download an app like VS Code, you can run it on Windows, Linux, or Mac. Similarly, an LLM app like Cursor can run on GPT, Claude, or Gemini — it's just a dropdown. We're kind of in the 1960s era: LLM compute is still expensive, forcing centralization in the cloud. We're all clients interacting over the network, with time-sharing and batching. The personal computing revolution hasn't happened yet because it's not economical. But some people are trying — Mac Minis are a good fit for some LLMs because batch inference is memory-bound. These are early indications of personal computing, but it hasn't really happened yet. Maybe some of you will invent what this looks like.
Whenever I talk to ChatGPT or an LLM directly in text, I feel like I'm talking to an operating system through the terminal — direct access. A GUI hasn't been invented in a general way yet. Should ChatGPT have a GUI beyond text bubbles? Some apps have GUIs, but there's no GUI across all tasks. LLMs are also different from operating systems in some unique ways. I wrote about one property that strikes me: LLMs flip the direction of technology diffusion. Usually, with new transformative technologies like electricity, cryptography, computing, flight, internet, GPS, it's governments and corporations that are the first users, and it later diffuses to consumers. But with LLMs, it's flipped. Early computers were about ballistics and military use; LLMs are about how to boil an egg. That's a lot of my use. It's fascinating that we have this new magical computer helping me boil an egg, not doing something crazy for the government. Corporations and governments are lagging behind consumer adoption. This informs how we might want to use this technology.
In summary: LLM labs produce LLMs. It's accurate language to use, but LLMs are complicated operating systems circa the 1960s. We're redoing computing all over again, and they're available via time-sharing like a utility. What's new and unprecedented is that they're not in the hands of a few governments and corporations — they're in the hands of all of us. ChatGPT was beamed down to billions of people instantly overnight. This is insane. Now it's our time to enter the industry and program these computers. Before we program LLMs, we need to think about what they are. I like to talk about their psychology. I think of LLMs as kind of like people spirits — stochastic simulations of people. The simulator happens to be an autoregressive transformer. It goes token by token with equal compute per token. We fit it to all text on the internet, and you end up with a simulator that has emergent human-like psychology.
First, LLMs have encyclopedic knowledge and memory — they can remember more than any single human. It reminds me of the movie 'Rain Man' — Dustin Hoffman is an autistic savant with perfect memory. LLMs are similar; they can remember SHA hashes and lots of things easily. They have superpowers, but also cognitive deficits. They hallucinate — they make up stuff and lack sufficient self-knowledge. This has gotten better but not perfect. They display jagged intelligence: superhuman in some domains, but then make mistakes no human would make, like insisting 9.11 is greater than 9.9 or that there are two 'R's in 'strawberry'. They suffer from anterograde amnesia. A coworker who joins your organization will learn over time, consolidate knowledge, and develop expertise. LLMs don't natively do this — it hasn't been solved in R&D. Context windows are like working memory; you have to directly program the working memory because they don't get smarter by default. Many people get tripped up by this. In popular culture, I recommend 'Memento' and '50 First Dates' — the protagonists' weights are fixed and context windows get wiped every morning. It's problematic.
Security-related limitations: LLMs are gullible, susceptible to prompt injection risks, and might leak data. So, you have to think through this superhuman thing with cognitive deficits. How do we program them and work around their deficits while enjoying their superhuman powers?
Now I want to talk about the opportunities: how to use these models and the biggest opportunities. This is not a comprehensive list, just some things I find interesting. The first thing I'm excited about is partial-autonomy apps. For example, coding: you can go to ChatGPT directly and copy-paste code. But why would you do that? It's better to use an app dedicated for the task. Many of you use Cursor — I do too. Cursor is a good example of an early LLM app with properties useful across all LLM apps. It has a traditional interface for manual work, plus LLM integration for bigger chunks. Properties: LLMs do a ton of context management; they orchestrate multiple calls to LLMs (embedding models, chat models, diff models); application-specific GUI is crucial — you don't just want to talk to the OS in text. Seeing a diff as red and green changes, using Command+Y to accept or Command+N to reject is much easier. A GUI allows human audit and faster work. There's also the 'autonomy slider': in Cursor, you can use tab completion, select code and Command+K to change that chunk, Command+L to change the entire file, or Command+I to let it loose on the entire repo. You're in charge of the autonomy slider.
Another successful LLM app is Perplexity. It has similar features: packages information, orchestrates multiple LLMs, has a GUI for auditing (sources you can inspect), and an autonomy slider — quick search, research, or deep research. Many pieces of software will become partially autonomous. For those who maintain products and services, how will you make them partially autonomous? Can an LLM see everything a human can see, act in all the ways a human could act, and can humans supervise? Because these are fallible systems. What does a diff look like in Photoshop? Traditional software designed for humans needs to become accessible to LLMs.
One thing that doesn't get enough attention: we're cooperating with AIs — they do generation, humans do verification. We want to make this loop go as fast as possible. Two major ways: speed up verification — GUIs are extremely important because they utilize your computer vision GPU (your brain). Reading text is effortful; looking at stuff is a highway to your brain. Visual representations are useful for auditing. Second, keep the AI on a leash. Many people get overexcited with AI agents, but getting a diff of 10,000 lines of code to my repo is not useful — I'm still the bottleneck. I have to make sure it's not introducing bugs and that it's correct. We need to keep the AI on a leash because it gets overreactive.
This is how I feel with AI-assisted coding. If I'm just vibing, it's great. But when I need to get work done, an overreactive agent is not great. I'm trying to develop ways to use agents in my coding workflow. I always go in small incremental chunks, make sure everything is good, and spin the loop fast. I work on small, concrete things. Many of you are probably developing similar ways of working with LLMs. There are blog posts on best practices. One technique: if your prompt is vague, the AI might not do exactly what you want, verification fails, and you spin. It's better to spend more time to be concrete, increasing the probability of successful verification.
I'm currently interested in what education looks like with AI and LLMs. How do we keep AI on a leash? It doesn't work to just go to ChatGPT and say, 'Hey, teach me physics' — the AI gets lost. I see this as two separate apps: a teacher app that creates courses and a student app that serves them. The intermediate artifact of a course is auditable, consistent, and the AI is kept on a leash with a syllabus and progression of projects. This has a higher likelihood of working.
I'm no stranger to partial autonomy — I worked on it for five years at Tesla. Autopilot is a partial autonomy product. It has a GUI showing what the neural network sees, and an autonomy slider that increased autonomous tasks over time. The first time I drove a self-driving vehicle was in 2013. A friend at Waymo gave me a drive around Palo Alto — I took a picture with Google Glass. Many of you are too young to know what that is, but it was all the rage. The 30-minute drive was perfect — zero interventions. This was 12 years ago, and I thought self-driving was imminent. But here we are, still working on autonomy. We haven't solved the problem — Waymos look driverless but still have teleoperation and human-in-the-loop. Software is tricky, just like driving. When I see '2025 is the year of agents,' I'm concerned. It's the decade of agents. We need humans in the loop. Let's be serious.
Another analogy: the Iron Man suit. It's both an augmentation (Tony Stark can drive it) and an agent (autonomous, can fly around and find Tony). That's the autonomy slider — we can build augmentations or agents. At this stage, with fallible LLMs, it's less Iron Man robots and more Iron Man suits — building partial autonomy products with custom GUIs and UIUX, making the generation-verification loop very fast. But we should keep in mind that it's possible to automate this work, and there should be an autonomy slider in your product.
Now I want to talk about another unique dimension: not only is there a new programming language for autonomy, but it's programmed in English — a natural interface. Suddenly, everyone is a programmer because everyone speaks natural language. This is extremely bullish and unprecedented. It used to take 5-10 years of study to be able to do something in software; that's not the case anymore. Have any of you heard of 'vibe coding'? This tweet introduced it, and it's now a major meme. Fun story: I've been on Twitter for 15 years, and I have no clue which tweet will go viral. I thought this tweet would fizzle, but it became a total meme. It struck a chord and gave a name to something everyone was feeling. Now there's a Wikipedia page.
This is like a major contribution now. Tom Wolf from Hugging Face shared a beautiful video of kids vibe coding. I love this video — it's wholesome. How can you feel bad about the future looking at this? The future is great. This will be a gateway drug to software development. I'm not a doomer.
I tried vibe coding myself because it's fun. When you want to build something super custom that doesn't exist, just wing it on a Saturday. I built an iOS app, even though I can't program in Swift. I was shocked I could build a basic app in a day and have it running on my phone. I didn't have to study Swift for five days to get started. I also vibe-coded an app called Menu Genie, which is live at menu.app. I had a problem: I go to a restaurant, read the menu, and have no idea what things are. So I built it. You take a picture of a menu, and it generates images. Everyone gets $5 in credits free when they sign up. This is a major cost center for me — a negative revenue app. I've lost a huge amount of money on Menu Genie.
The fascinating thing about Menu Genie is that the vibe coding part — the code — was the easy part. Most of the work was making it real: authentication, payments, domain name, deployment. That was really hard, and it wasn't code — it was clicking in the browser, taking another week. I had the demo working in a few hours, but making it real took a week. For example, adding Google login to a web page — a huge list of instructions from the Clerk library telling me what to click, where to go. It's a computer telling me the actions to take! Why am I doing this? This is crazy.
So the last part of my talk focuses on: can we just build for agents? I don't want to do this work — can agents do it? There's a new category of consumer and manipulator of digital information. It used to be just humans through GUIs or computers through APIs. Now we have agents — computers that are human-like, people spirits on the internet. They need to interact with our software infrastructure. Can we build for them? For example, you can have a robots.txt file on your domain to instruct web crawlers. Similarly, you could have an llm.txt file — simple markdown telling LLMs what the domain is about. This is very readable to an LLM, whereas parsing HTML is error-prone. We can directly speak to the LLM. A lot of documentation is written for people — with lists, bold, pictures — not directly accessible by an LLM. Some services, like Vercel and Stripe, are transitioning docs to be specifically for LLMs, offering them in markdown, which is super easy for LLMs to understand.
A simple example: Three Blue One Brown makes beautiful animation videos. He wrote the library Manim. I wanted to make my own, but instead of reading the extensive documentation, I copy-pasted it all to an LLM and described what I wanted. It worked out of the box — the LLM vibe-coded an animation exactly as I wanted. If we make docs legible to LLMs, it unlocks huge use. You also need to change the docs: anytime it says 'click', the LLM can't take that action. Vercel replaces 'click' with the equivalent curl command that an LLM agent could take. The Model Context Protocol from Anthropic is another way to speak directly to agents. I'm bullish on these ideas.
There are also tools that help ingest data in LLM-friendly formats. For example, my nanoGPT repo on GitHub — I can't feed that to an LLM and ask questions. But if you change the URL from GitHub to git ingest, it concatenates all files into a single text with directory structure, ready to copy-paste into your LLM. An even more dramatic example is DeepWiki from Devin: Devin analyzes the GitHub repo and builds documentation pages for it that are even more helpful. I love tools where you just change the URL and make something accessible to an LLM.
It's possible that future LLMs will be able to click stuff and navigate, but I still think it's worth meeting the LLM halfway — making it easier for them to access information, because it's still expensive and difficult. There will be a long tail of software that won't adapt. But for everyone else, it's worth meeting in the middle. I'm bullish on both approaches.
In summary, what an amazing time to enter the industry. We need to rewrite tons of code — by professionals and coders. LLMs are like utilities, like fabs, but especially like operating systems. It's so early — like the 1960s of operating systems. Analogies cross over. LLMs are like fallible people spirits we have to learn to work with. We need to adjust our infrastructure. I described ways of working effectively with LLMs and tools that make it possible to spin the loop quickly and create partial autonomy products. A lot of code also has to be written directly for agents. Going back to the Iron Man suit analogy, over the next decade we'll take the slider from left to right. It will be very interesting to see what that looks like. I can't wait to build it with all of you. Thank you.