Jensen Huang 0:05
Nvidia GTC Taipei. So great to see all of you. Very good to be home. It is incredible how large our ecosystem in Taiwan has become. Most of the time when people think about our ecosystem, they think about our software stack. They think about the developer ecosystem above the computing systems that Nvidia builds. But Nvidia's ecosystem spans all the way upstream to our supply chain here in Taiwan, where it all begins, and downstream all the way to data centers and eventually to end users. Today we're going to talk about almost all of the ecosystem. There are so many people to thank. I love my ecosystem here. There are so many companies here, some of my favorite ecosystem partners. So many. Taiwan's rich ecosystem, the richest ecosystem, the world's best supply chain ecosystem. Unbelievable.
Two years ago when I was here, I started to talk to you about how AI was moving beyond generative AI and about the other waves of AI that were coming. The next wave of AI was agentic AI. And today we can say that agentic AI has arrived. Useful AI has arrived. Now what does that mean from the industry's perspective? It means that tokens are now in extraordinary demand, because if you could do this, you're going to want to produce more of it. And because tokens are now profitable units of revenue, the AI companies want to generate a lot more tokens and build more AI factories, which is the reason why compute demand here in Taiwan has skyrocketed. It is precisely the reason why all of you are so busy and your businesses are doing so well. In fact, that looks like some of your stock prices. The compute pattern has changed; everything has changed. So the first idea is that useful AI has arrived. AI is now a profit generator. AI is now a GDP generator, and behind it is a whole new kind of computing pattern. Not just a large language model, but an agent. Today, almost everything we're going to talk about is going to be based on this. So let me take a quick moment and show you what I'm talking about. This is an agent. It's an agent application. In the old days, this would be an application: application code running inside an operating system. Today it is an agent, which consists of a large language model, or many, sitting inside a harness, and that harness orchestrates it to do productive work. This is the input. When that input comes, it has to understand, observe, reason, act, and use tools. That tool could be a spreadsheet, a web browser, a data processing engine, or a database engine.
For example, this harness orchestrates the routing of information every single time it touches something: processing the context, understanding what is happening, reasoning about what to do, and coming up with a plan that it can act on. That orchestration path is managed by software. And so this is fundamentally an agent. It deals with short-term memory, called working memory, and long-term memory, just like we do. We have long-term memory. And so the memory management system is incredibly important. This entire system is called an agent. The large language model is used to do the thinking, and the harness connects everything together, just like an operating system. And so this is the new computing model, and this is what an agent can do. It can do incredible things. This is the big breakthrough: the simultaneous convergence of large language models that are now able to do a really good job thinking, reasoning, planning, and using tools, and the fact that we now have these harnesses that manage memory, handle the orchestration, and use tools. We can now do amazing things.
We built our next generation: Vera Rubin. Vera Rubin is not one chip. Vera Rubin is not a GPU only. It starts with the GPU. But Vera Rubin is incredible. This entire thing is Vera Rubin, from end to end. It has GPUs: Vera Rubin NVLink 72. It is orchestrated by Vera CPUs that I'm going to tell you more about. The storage systems are revolutionary: Vera along with CX9, our software stack called DOCA, and the security processor inside, so that everything is encrypted at rest, in motion, and in use. Everything across this is secure because the AI model is so precious. This is the reason why this entire system supports confidential computing. Each one of these systems would be a complete revolution in itself. Vera Rubin is the most ambitious endeavor in the history of our company.
The whole company worked on Vera Rubin, all 40,000 engineers. Not to mention all of you. All of you participated in the creation of this entire system. Vera Rubin is really a miracle, and it's not just one chip. It is so many. Well, it's even beyond that. A long time ago, Nvidia used to be a GPU company, but over the years we've evolved to become a systems company. You're looking here at the most complex system, designed from the ground up, ever built. But ultimately our customers, our partners, don't want to buy computers. They want to build AI factories. Which is the reason why Nvidia has started to transform itself yet again. You can see that so much of our technology is now at the scale of the entire infrastructure. Our partners are at infrastructure scale. Power generators, cooling systems, the grid providers, so many industrial companies are now part of our ecosystem, because ultimately we're trying to build an entire stack. Just like GPUs, just like when we were building Grace Blackwell NVLink 72, we are now building a full-stack system so that our customers can build amazing AI infrastructure. Now, doing this well, becoming incredibly good at helping customers build and deploy AI factories, is incredibly important. And the reason for that is this. Compute is revenue. Compute is profit. The absence of revenues and profit is loss. This is an example of AI infrastructure coming online. It could come online quickly. It could take a while. Its throughput could be high. It could be low. Its resilience and reliability could be good or bad. And its lifetime of usefulness could be long or short. Because this represents 50, 60, going to a hundred billion dollars, this curve matters greatly, which is the reason why Nvidia is such a great partner to work with: because of our fully integrated capability.
We didn't just come up with a PowerPoint slide. We created the entire infrastructure. We connected everything together. We built out billions and billions of it ourselves to make sure that everything works well. As a result, our time to first token, our time to first inference, our time to training turned on is much faster. Second, our throughput per watt, our tokens per watt, is utterly world class. And the reason for that is that we integrate everything. We design everything from the ground up. We simulate the entire system and we use extreme co-design. Just like I showed you just now with the Vera Rubin rack, everything was designed in order to deliver this incredible throughput. If your data center, if your factory, has one gigawatt, it will not have more. One gigawatt means one gigawatt. That's all the power generation you can have. If you have one gigawatt of power, then throughput per watt is revenues, because every token is profitable. Every token is revenues. This is the future. Compute is revenues. Performance per watt is your revenues. Choosing the wrong architecture just because the chips are cheaper doesn't translate. It doesn't make sense. What you need to look at is your revenues per watt. The more you buy, the more you make. Here I am standing in front of you. Vera Rubin is in full production. The supply chain we created for Vera Rubin is twice as large as Grace Blackwell's. It's incredible. And what used to take two hours, assembling one Grace Blackwell rack, now only takes five minutes. So not only is the capacity higher, the throughput is a lot faster, and we need it all to support the demand. This ecosystem is extraordinary. Millions of square feet have been put online to support Grace Blackwell and are now ramping up for Vera Rubin. I want to thank all of you. Vera Rubin is now in full production. Thank you, ladies and gentlemen. Vera Rubin. Vera Rubin was not built just to run AI.
Vera Rubin was built to run agents. This is an agentic system. Imagine the complexity. Which is the reason why agents are the latest computer science breakthrough. It has taken this many years for agents to realize their potential and become useful. It stands to reason that the computer that runs them is the most advanced in the world. This is Vera Rubin. Let's take a look. Can we bring out Vera Rubin, please? This is Vera Rubin. Vera Rubin NVLink 72. This is the Groq LPX. At the next GTC, I'm going to tell you a lot more about this. Today we have so much to talk to you about. This is the Vera CPU rack. 256 CPUs, all liquid cooled. Let me tell you about Vera in just a moment. This is the Vera BlueField storage processing system, and also the security system. And of course, this is our Mellanox networking, the world's first CPO. This is Vera Rubin. Incredible technology all coming together. Now, when we built Hopper, we built Hopper, as you know, for pre-training. Pre-training was the most important application, the most important workload we were working on at the time. Then when we worked on Grace Blackwell, everybody said, 'Jensen, you know, Nvidia is really good at pre-training. Inference is so easy.' Do you remember that? People used to say, 'Inference is so easy. We could do that too.' But as you know, inference equals money. And the models are so complicated. To do it with incredibly fast response time, fast interactivity, and high throughput at the same time is incredibly hard, which is the reason why we created NVLink 72. Today, Nvidia's token cost is the lowest in the world. Not by 10%, but by X factors, orders of magnitude. All because we did extreme co-design. All because we understood the computing model, the computing pattern of inference, and we were able to create NVLink 72. Now with Vera Rubin, it is beyond inference. It is now inference in an agentic system. This is Vera Rubin. No cables, no hoses, no fans.
The last time I showed this to you, we had cables everywhere. Vera CPUs. CPUs built for the age of AI. All of the CPUs until now were created for people. We were the users. We were the renters. The way we use CPUs, we live in a world counted in seconds. The way we rent CPUs in the cloud, the more CPU cores you have, the more you can rent. The use case and the economics of the old CPU are fundamentally different from those of agents. Agents are impatient. They don't live in a world that is counted in seconds. They live in a world that's counted in nanoseconds. When an agent uses a tool, it wants the response time to be as fast as possible. When it accesses a database, the result has to come back as soon as possible. Every moment that the agent is waiting keeps it from going to the next step, the next step, the next step. It is vital that we make the CPUs as low latency as possible, as interactive as possible. So we created the Vera CPU for the age of AI. Now, inside our system, it's used in three different ways. The first, of course, is Vera Rubin, for thinking. Inside the Vera Rubin rack there are already two CPUs. As you know, we are building and selling millions of Vera Rubins. We have sold millions of Grace Blackwells. Nvidia is already one of the largest CPU makers in the world. In the Vera Rubin rack there are two CPUs: one for orchestrating and managing the GPUs, managing the KV cache, and dealing with all of the software that runs in the rack. We also have the Grace BlueField that is used for security and isolation. The Vera compute is used for the harness, the orchestration of the AI models, tool use, and accessing the database. And the data servers are right here, Vera BlueField, the fastest storage servers, the fastest storage system the world has ever made. And the reason why this is so vital is that agents are accessing memory so incredibly fast.
These systems, the storage server and the CPUs, are now on the critical path of the most expensive part of the data center. This is the most expensive part for a good reason. The economics of the AI factory are tokens, and the tokens are created here. And so of course you want to manufacture and generate as many tokens as possible. This is where you put all of your economics, and this must not be in the way. And so there is great pressure on the CPU architecture, which is the reason why we built a brand new architecture from the ground up. A CPU the world has never seen before. We call it Vera. This is a CPU for agents. All the CPUs of the past were built for humans. This CPU is built for agents. There are four things to keep in mind. The four takeaways. The first takeaway is that the instructions per clock of Vera has to be incredibly good, because we need the latency to be short. We need the processing time to be short. Single-threaded performance, not throughput. Single-threaded performance has to be world class. Absolutely the best single-threaded performance. Which is the reason why the IPC, the instructions per clock, of Vera is so high. It's the highest in the world. 10 instructions fetched, decoded, and executed per clock. Number one. Number two, the bandwidth necessary to move data in and out of the CPU has to be utterly world class. The second thing is bandwidth per core. The third is just bandwidth, period. I said earlier that agentic systems are fundamentally disaggregated and distributed. When computing is disaggregated and distributed, networking becomes the problem. Therefore, we have to move the data around as fast as possible between the CPU cores, between the CPU and the storage, and between the CPU and the GPU. The bandwidth around the system and inside the CPU has to be utterly world class, because the CPU cores are talking to each other at extremely high bandwidth. They're not rented core by core by core. They're all working together.
The cross-sectional bandwidth of Vera is off the charts. It's the first one to have PCI Express Gen 6. It is also the first one to have LPDDR5 with 1.2 to 2 terabytes per second. Two to three times the bandwidth of the highest performance CPUs. A CPU for agents. And this market will surely be larger than the last. The reason is that there will be a lot more agents than there are people, and the agents are very impatient. So, the Nvidia Vera CPU. Thank you.
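The latency argument in this part of the talk (every moment an agent waits on a tool or a database keeps it from its next step) can be made concrete with a little arithmetic. The sketch below is only an illustration, assuming a chain of strictly dependent steps that cannot overlap. The function names and every number in it are hypothetical and are not figures from the talk.

```python
def task_time_s(steps: int, think_ms: float, tool_ms: float) -> float:
    """Wall-clock seconds for a chain of dependent agent steps.

    Each step is one round of model thinking followed by one tool or
    database call. The steps are assumed to be strictly sequential.
    """
    return steps * (think_ms + tool_ms) / 1000.0


def tool_share(think_ms: float, tool_ms: float) -> float:
    """Fraction of each step spent waiting on the tool call."""
    return tool_ms / (think_ms + tool_ms)


# Hypothetical numbers, chosen only to show the shape of the effect.
slow = task_time_s(steps=100, think_ms=20.0, tool_ms=80.0)
fast = task_time_s(steps=100, think_ms=20.0, tool_ms=5.0)
speedup = slow / fast

print(f"slow tools: {slow:.1f} s, fast tools: {fast:.1f} s, speedup: {speedup:.1f}x")
```

With these made-up inputs the tool call is 80% of each step, so cutting its latency shortens the whole task by far more than any change to the thinking time could. That is the sense in which the CPU and storage sit on the critical path.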
This is the most important slide, really. This is the takeaway. The takeaway here is that this is the application pattern. This is the computing pattern of the next decade. Agents: harnesses orchestrating large language models. Every company will run it. Every company will be an agent company. Every company will have agents running inside. Every company will see that agents need their own operating system. Every company is asking us: how do we run agents safely? How do we build agents for our own workloads? And so we have the NVIDIA agent toolkit for enterprise AI. You've seen me build this in plain sight. As with almost everything that Nvidia does, if you go back and look at my GTC 5 years ago or 10 years ago, you will see it. You've seen me talking about this for several years now, because we've been building for this moment. There are four things that companies need in order to build agents as a service or build agents to operate. The first thing you need is models. Of course, large language models: the smarter the better, the cheaper the better, the faster the better. The second is a harness to orchestrate the whole thing. The third: these models want to use tools, and these tools come with their skills. I showed you the CUDA-X libraries. Those are going to be amazing tools for the agents in the future. And then lastly, you need a runtime. You need the operating system that holds it all together. This is the NVIDIA toolkit for agents. It includes models that you can modify: NVIDIA's world-class open models. And I want to show you more. You can run agents from anybody. You can run Claude Code, an incredible agent. Codex, an incredible agent. You can run it inside this harness called OpenShell, which will be highly secure for your enterprise. The shell protects the agent and keeps it grounded in security policies. Privacy is protected. Its rights and privileges are given. Its identity is protected.
And so OpenShell is being adopted all over the world. NVIDIA OpenShell is open source. You're going to see so many companies adopt it. Red Hat, Canonical, Microsoft: it's going to be adopted everywhere. This is the runtime, and this runtime is fully optimized for the NVIDIA AI platform, which is everywhere. So you can run OpenShell in any cloud, on prem, and even on device. So you now have tools and libraries that agents can use. You have models that you can modify or use as is. And you have agents. This could be OpenClaw or Hermes, another incredible harness. These agentic harnesses can now run on prem or anywhere. One of my favorite use cases of agents is chip design. It is the single most important thing that Nvidia does. And so of course we have to partner with Cadence to build a chip design super agent. It is orchestrated by Codex. It has RTL and architecture diagrams or schematics or specifications as input, and whatever you need to fix. And together we created some super agents that are optimized for the NVIDIA runtime with Nemotron. Nvidia is dedicated to building open models for the world so that all of you, all of us, can create our own agents. Today we're announcing Nemotron 3 Ultra. Yep. Our next open model, and it is smart. With the Nemotron models, we not only give you the model, we give you all the data that we used to train the model. And we have a coalition of incredible partners. You can see all of our partners down here. We work together and contribute data to each other because of all of our great partnerships. All of this, from the model to the training script to the data, is made completely available to you. This is open models at their best. The best open model system policies in the world. The simple goal is that you can take all of it, add to it, make it even better, make it yours. Nemotron 3 Ultra is five times faster. It is also 30% cheaper, and completely open. We're completely dedicated to this. This is now Nemotron 3. We're currently working on Nemotron 4.
So this entire toolkit from models, harnesses, tools and skills and runtimes is the reason why every enterprise company in the world has the ability now to create their own agents just like Cadence did with their super agents.
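The agent pattern described in this talk (a model doing the thinking inside a harness that routes tool calls and manages working and long-term memory) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not NVIDIA's toolkit or any real harness. `Harness`, `toy_model`, and `calculator` are hypothetical names, and the "model" is a scripted stand-in, not a large language model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical model interface: given the working memory, return either
# ("tool", tool_name, argument) or ("final", answer, "").
Model = Callable[[List[str]], Tuple[str, str, str]]


@dataclass
class Harness:
    """Toy harness: routes between the model, the tools, and memory."""
    model: Model
    tools: Dict[str, Callable[[str], str]]
    long_term: Dict[str, str] = field(default_factory=dict)
    max_steps: int = 8

    def run(self, task: str) -> str:
        working = [f"task: {task}"]               # short-term (working) memory
        if task in self.long_term:                # recall from long-term memory
            working.append(f"recalled: {self.long_term[task]}")
        for _ in range(self.max_steps):
            kind, payload, arg = self.model(working)   # reason about next step
            if kind == "final":
                self.long_term[task] = payload         # remember the answer
                return payload
            result = self.tools[payload](arg)          # act: call the tool
            working.append(f"{payload}({arg}) -> {result}")  # observe result
        raise RuntimeError("step budget exhausted")


def toy_model(working: List[str]) -> Tuple[str, str, str]:
    """Scripted stand-in for an LLM: recall, else read a result, else use a tool."""
    for entry in working:
        if entry.startswith("recalled: "):
            return ("final", entry[len("recalled: "):], "")
    for entry in working:
        if entry.startswith("calculator("):
            return ("final", entry.split(" -> ")[1], "")
    return ("tool", "calculator", "6*7")


def calculator(expr: str) -> str:
    """Hypothetical tool: multiplies two integers written as 'a*b'."""
    a, b = expr.split("*")
    return str(int(a) * int(b))


harness = Harness(model=toy_model, tools={"calculator": calculator})
answer = harness.run("what is 6 times 7?")
print(answer)
```

The design choice worth noticing is that the loop lives in the harness, not in the model. The model only proposes the next action, which is why the same harness could in principle wrap different models, and why the runtime around it can enforce policy on every tool call.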
Microsoft and Nvidia are going to reinvent the PC. This is going to be the new PC. Tomorrow night, I think it's tomorrow night our time, I'm going to be with Satya, and we're going to talk a lot more about the work that Microsoft and Nvidia have been doing together over the last three years. It took this long to completely reinvent how the PC is going to work so that we could be ready for this moment. As I mentioned earlier, there is that compute pattern called the agent. It's going to run in AI clouds. It's going to run inside enterprises. It is also going to run on your PC. What's going to happen to that PC when it has an autonomous agent? An agent that's helping you, that understands you. You can talk to it. It can look at you. You can ask it to read files or go help you do some research. It can do a lot more that I'll show you. But the new operating system is, of course, the old operating system plus large language models. Large language models in a lot of ways are the modern version of DirectX. A model has, of course, input and output. It understands prompts, it understands computer vision, it can generate video, it can generate sounds. It is the modern extension, the intelligence extension, of the PC, of a computer. On top of that, the application, as I mentioned before, is going to be replaced by an agentic runtime, and that is the modern application: an agent. Ladies and gentlemen, Nvidia's RTX Spark laptops. Thank you. I have too many things in my pocket. Okay. All right. This is the most amazing chip the world has ever built. This is the N1X that we built in partnership with MediaTek. I think I saw Rick earlier. This is N1X. This is a beautiful chip. This is a chip that frankly took 33 years to build. And the reason for that is that 100% of the NVIDIA software stack runs here. If you want to run digital biology, no problem. If you want to do seismic processing, no problem. You want astrophysics, no problem.
Everything associated with CUDA, all the physics, all the biology, all the genomics, all the AI, no problem. All the computer graphics, no problem. Every single application Nvidia has ever created and every single application that Windows has ever run: Microsoft and Nvidia meticulously optimized everything so that this computer literally runs everything the world has ever created. Plus, it now runs agents. An incredible computer. I'm so proud of it. Now that computer could have a local Nemotron 3 Ultra model or Nemotron 3 Super model, or it could have Claude Code or Codex or some other model in the cloud or something on the network, and it's going to work and do something amazing. RTX Spark is a reinvention of the laptop, but in fact, Microsoft and Nvidia are reinventing the entire PC. And today, we're announcing a whole new line. Three revolutionary Windows machines, covering desktops, laptops, and workstations. All 100% Windows compatible, 100% CUDA, 100% Nvidia AI Tensor Core. Everything that you see running on Nvidia in all these different platforms around the world runs here. And we have a roadmap for this. This is a brand new product family for us. With every single generation of architecture, we will have a desktop, a laptop, and a workstation, and then again a desktop, a laptop, and a workstation. And the thing that I am just incredibly pleased about, incredibly honored by, is that 100% of the world's PC industry has joined us to reinvent the PC. A new line, a new beginning. Thank you.
In the case of language models, all the English and all the language that we have on the internet that we trained on was from our perspective. We wrote it and we're reading it. However, in order to create data for AI robotics, it has to be from the perspective of the robot. And most of the world's video data is third person, not first person. And so for agentic systems, robotic systems, physical AI, the data is the hardest problem. You've seen us move up this ladder.
We started with teleoperation, which is basically human demonstration. This is no different from the big breakthrough of reinforcement learning from human feedback. Then we use simulation. This is where Omniverse comes in. This is no different from reinforcement learning with verifiable rewards. And so we use these systems to bootstrap the AI model, the physical AI model. Eventually we're able to learn from third-person data by reprojecting it into first person, and now, through bootstrapping, we have a world foundation model that can understand the physical world from any perspective you want: third person, first person, outside in, inside out, it doesn't matter. This is a big breakthrough indeed, and today we are announcing Cosmos 3. Cosmos 3 is the frontier of physical AI. We are at the frontier with language models. There are so many people working on it. However, in physical AI, we are absolutely the world's best. I am so proud of the team for doing this. This is the foundation model for all of your work. Whenever you want to create a robot, whether a robot that works in a factory or any kind of robot that involves the physical world, you now have a companion, Cosmos 3, that can understand and reason. It can generate, it can simulate in the loop, it can even be the policy itself. It is at the top of leaderboards all over the world. I am incredibly proud of Cosmos. And today we're announcing Cosmos 3. Data plus compute gives you AI. Now that we have AI, compute is data. And so use Cosmos 3. Train a whole bunch of AI models. Cosmos is such an incredible open model system. It's exactly the same as Nemotron. We open the model. We open the data. And we even open how we trained it, so that you can enhance it for yourself and turn Cosmos into your proprietary model.
Today, we're announcing Alpamayo 2, an open model for self-driving cars. We're working with car companies across the world. If you look at the brands that have signed up for Nvidia Hyperion, that are building Nvidia Hyperion cars, these manufacturers represent about 80% of the world's cars. We are going to have a whole lot of Nvidia Hyperion systems that are able to run Alpamayo or anybody else's AV stack. We are also connected into mobility services. Approximately 97% of the world's mobility services are connecting with us, so that when we deploy Alpamayo on the Hyperion runtime with the Halos operating system, we will be able to connect to all of these services across the world. NVIDIA Isaac GR00T is our humanoid robotics stack: model, data generation, simulation, and the runtime, including the operating system. This represents the Isaac GR00T platform. In every one of our systems, as you can see, it is the exact same pattern, whether it's an agentic system for the cloud, an agentic system for the PC, a robotic system for a self-driving car, or a robotic system for a humanoid robot. All the same. And of course, in every single case, we build everything completely. We build everything vertically, completely integrated with co-design, extreme co-design, and then we open it up for everybody to use whichever part they like. And whatever you want to use, we even help you modify. But the one thing that is missing is a reference platform for robotic systems. These robotic systems are so complicated. So many motors, so many sensors, so fragile. And yet we need to have a way to deliver these reference platforms. Just like we do with PCs and DGXs and clouds and self-driving cars, we are now going to do it for robots. Today we're announcing the NVIDIA Isaac GR00T reference humanoid robot, all fully integrated. 25 degrees of freedom on each hand, made by Sharpa. 31 degrees of freedom on the robot. 6 feet, 150 lbs. Just like me.
The first number is shorter. The second number is bigger. Otherwise, pretty close. And this platform runs the new Thor and our entire software stack: the data generation stack, the simulation stack, and the runtime, all integrated into a robot that is designed for everyone to use. Now, we built this for higher education and university researchers, because for them to build this themselves is insanely hard. The computer industry has been completely changed in the last six months. Everything changed. Everything changed because agents were realized, converged with the latest frontier models, and made it possible for AI to now do useful work. The computing pattern will repeat over and over and over again. This is the computing pattern of an agent: a model and a harness that uses tools with skills and runs in a runtime. That runtime depends on whether it's in the cloud, on prem, on a PC, or in a robot. But the computing pattern is exactly the same for all of them. You will use different harnesses because of your preference. You'll use different models because of your preference. You will improve them for your proprietary use. You will create sub-agents and super agents that you can rent to other people to help them do their work. For this agentic platform, this agentic pattern, Nvidia has an enterprise AI toolkit. This is a wonderful way for all of you to engage with AI, and for us it's a wonderful growth opportunity. Vera Rubin is in full production. Whereas Grace Blackwell was created to process AI, particularly inference, Vera Rubin was created to run agents. It is in full production. It is much, much more than a GPU. It is an entire disaggregated, distributed agent processing system. Nvidia has really become an infrastructure company. Not just a GPU company, not just a systems company, but an infrastructure company, to help you generate the maximum revenues, the maximum profit, and to get there as soon as possible.
The agent world, this new way of doing computing where you build CPUs for agents, not for people, has its own special requirements. And our Nvidia Vera is revolutionary. I'm so happy about its ramp and the orders already. It's going to be the fastest and the most successful product launch in our company's history. Nvidia and Microsoft have created a whole new line of PCs. This is a new beginning. And of course, that exact same agentic computing pattern that I just described is also going to run on all kinds of devices. I mentioned PCs, but in the future it'll be robots and satellites and base stations and factories, in the cloud, on prem, and at the edge. This agentic computing pattern will be replicated in computers all over. How we think about the personal computer will very likely change. I want to thank all of you for your partnership and your friendship. We couldn't be here without everything that we do together. I am so proud of how successful you've been this last year. The next year is going to be even more so. Very optimistic.