Rita Coslov 25:30
Thank you, Phil. Hello everyone. My name is Rita Coslov. I lead product for Cloudflare's developer platform. I'll share a little fun fact with you. Just a couple of days ago, I got to celebrate my 10th anniversary at Cloudflare. And I'm being very sincere when I say that even though it's been an incredible 10 years, there hasn't been a time that's been more exciting to be at Cloudflare than right now. Now, I like to start by talking about Cloudflare's mission for developers because that's the reason that we're doing all of this. That's the reason that I get up out of bed in the morning. And it's to provide developers with a modern cloud platform that accelerates their velocity and enables them to build better end-user experiences. In other words, we want to help developers bring their ideas to life. And we want to make it as easy as possible from the first line of code that they write to the first million of users that they get and the millions to come after that. The other thing that I like to start with is the big bold non-consensus bet that we made nine years ago when we started on this journey of building a developer platform. The reason this was such a big bet is because every single cloud that's been built before us has been built on the foundation of virtual machines or containers. But we decided to take a different approach with a technology called isolates. And this came with a big trade-off. We knew that it meant that it would be harder to lift and shift applications onto Cloudflare. But it also meant that if you were building something new, if you were building a green field application, we could actually make that really easy for developers and remove the burden that we saw come with the clouds where they still had to worry about infrastructure and regions and scaling and performance. And when you're making a big decision like this, you always want to bet on the future, not on the past. 
And it's a good thing we did because the green field opportunity today is so much bigger than it's ever been. AI is making writing code incredibly cheap. And that means that there's a whole explosion happening of applications that are being built every single day. And the whole calculus around lifting and shifting applications is changing too because organizations are no longer stuck with these decisions that they made 10, 20, 30 years ago. They can actually choose to rebuild something and not have it be this ambitious project that never completes. These are real conversations that we're having with our customers right now. Lastly, developers are feeling really inspired because there is this whole new use case for them to build around agents. Now, this is my third time up here, and I feel like this is your opportunity to hold me accountable to the things that I promised, the predictions that I made a year ago. So, let's see how some of them have played out. The first thing that we talked about was how AI workloads were going to shift from training and inference to end-to-end automation or agents. And guess what? That's exactly what we're seeing unfolding right now with tools like OpenClaw, which is an open-source consumer agent growing so quickly in popularity that overnight it surpassed React. React, which has been the biggest front-end framework for the past 10 years. That's how much demand there is for agents. The next thing that we talked about was how in the next 5 years more code was going to be written than in the entire history of software that came before it. And again, we're well on track to surpass that goal. Every day this year, there are more websites, more new iOS apps being built than the year before. There is so much code being pushed up to GitHub every day that GitHub is barely able to keep up. And we're seeing the exact same thing from our users with Wrangler downloads being up nearly 1,000% year-over-year. 
Last but not least, and perhaps most importantly, we talked about how Cloudflare was going to become the go-to platform for developers to build and deploy these AI applications because Cloudflare could uniquely offer developers the best of scalability, performance, and developer experience. And again, you don't have to take my word for it. You can take the word of 5 and a half million developers that are building on Cloudflare today. And that's not all. Just a week ago, we announced the acquisition of a company called VoidZero, the company behind Vite, which has been one of the fastest growing developer tools that we've seen over the past couple of years. The reason it's so popular is because it powers the technology that underlies almost every framework out there with the exception of Next.js. And what's been incredible to see is that as Vite has been growing in popularity, well, so has Cloudflare's plugin that's been growing with it, just recently representing as much as 10% of total Vite downloads. But okay, that was all from a year ago. So now that hopefully I've earned a bit of your trust, let's talk about what's going to happen next. And I'm going to start with making a small adjustment to something that I said a year ago. While yes, automation and agents are going to be the next phase of AI, what we're actually seeing happen is that this is occurring in two distinct phases. First, the models have to get smart enough to be able to actually operate these agents end to end. And what's going to happen next is all of these agents are going to need an execution environment to operate in. So let's dive into that. I'd like to start with a recap of how agentic systems even work. Agentic systems are made up of three main components. The AI piece. This is the model or the brain of the operation that's going to come up with a plan and constantly re-evaluate it to make sure that it's continuing to deliver on a goal. 
There's the workflow aspect of it that's going to make sure that the next steps are actually being taken every single time. And there are the APIs. These are the tools that the agents have at their disposal so that they're not just making plans, they can actually act on them. Let's start with the first component, AI, which is where we've seen so much improvement happen over the past year and really even over the past six months alone. Since November and December, we've seen a massive step function improvement in the productivity that models have been able to give to agents, with models like Opus 4.6 and GPT-5.5 representing a massive unlock in agent productivity. And what's been remarkable is that open-source models have actually been very quick to follow here, with models like Kimi 2.6 and GLM 5.1 being just behind them. The good news here for Cloudflare customers is that they can always get access to the latest models. And this is especially important when new models are coming out so quickly that you don't want to be locked into a single vendor. Through Cloudflare's AI Gateway, developers are able to access hundreds of models in a single place. And if you're choosing to run one of the open-source models like Kimi, you can do so on Workers AI in a truly serverless way where you don't have to pay for pre-provisioned resources that might go unused. You only pay for the inference that you end up using. But now that that's behind us, let's talk about the more interesting piece, which is the workflows and APIs. If you've used an agent recently, you've probably found that while yes, it is really incredible, you can't quite leave it entirely unattended. You can't fully delegate a task to it and trust that it will get done. Why is that? I'd wager that the main reason behind it is that agents absolutely must have the ability to write code. Now, that's a bold statement. So let me explain why. 
This all starts with something called the context window. The context window, you can think of it as the memory of the model, is everything that the model is able to hold in its head throughout the transaction of an agent. If you hear model labs announcing new models, often announcements start with this. 'We've extended the context window.' Now here's the thing about the context window. The context window of the largest models today is about 1 million tokens. But into it, you need to be able to fit all the tool definitions that tell the model what tools it has access to, the conversation, the system prompt, the tasks. By comparison, to tell a model everything that Cloudflare's API surface alone can do, you need 2 and a half million tokens. More than double what today's biggest context windows are. And this is where you start to see weird things happen, like the lack of accuracy that you see when agents are operating. For example, just this morning I asked my agent to create an event for later tonight for dinner at 6 p.m. at La Mercerie. Great restaurant, by the way, if you haven't been. And while, at first glance, it seems like it successfully created the event, when I look closer, I realize that it actually created it for January 1st, 2024, which is not today's date. Why? Why did it do that? Models have no conception of what today's date is. The way that they obtain that information is through tool calling, but we just talked about how you have to be very conservative with what tools you include in your agent because of how quickly they blow up the context window. So agents start to come up with all these inefficiencies and inaccuracies. What this really comes down to is that agents are not good at tool calling. Hello. Here we go. LLMs are bad at tool calling because there's no such thing as tool calling in nature. Tool calling is actually an entirely artificial process. 
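The context-window arithmetic in this passage can be written out as a small sketch. The two token counts are the figures quoted in the talk; the function name and the idea of a separate budget for the conversation and system prompt are illustrative assumptions, not anything Cloudflare or a model lab publishes.

```python
# Figures quoted in the talk.
CONTEXT_WINDOW_TOKENS = 1_000_000    # largest context windows today
FULL_API_SURFACE_TOKENS = 2_500_000  # tokens to describe all of Cloudflare's API

def fits_in_context(tool_tokens: int, other_tokens: int = 0,
                    window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Hypothetical check: do the tool definitions plus everything else
    (conversation, system prompt, tasks) fit inside the window?"""
    return tool_tokens + other_tokens <= window

# How many times over the window the full API description would be.
overflow_ratio = FULL_API_SURFACE_TOKENS / CONTEXT_WINDOW_TOKENS

print(fits_in_context(FULL_API_SURFACE_TOKENS))  # False
print(overflow_ratio)                            # 2.5
```

Even before any conversation is added, the full API description overshoots the window by a factor of 2.5, which is why agents end up with only a small subset of tools loaded.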
Tool calling gets injected into the model after the model has already been trained. It's not actually code, which is where models start to get easily confused. If you have two tools that have similar names, they can conflate them. They're very inefficient at using these tools, which ends up using a lot of tokens, which adds up to a lot of cost for our customers. And this is not just me saying this about the model. The model labs know this about tool calling. This is coming directly from Anthropic. Asking LLMs to do tool calling is a bit like asking Shakespeare to take a month-long course in Mandarin and then asking him to write a play in it. You know, it's Shakespeare, so it will be good, but it's not going to be his best work. Now, what LLMs are really, really good at, what they're exceptional at, and have been trained on lots and lots of, is code. An LLM can write code extremely efficiently and quickly. An LLM can write a few lines of code to figure out today's date in a matter of milliseconds. But if LLMs are good at writing code, that means that they now need a place to be able to execute and scale all of that code. Well, if they need a place to scale and run all that code securely, why not just do that the same way that we've scaled and run applications all of these years, which is why not just run them on the hyperscalers? Well, the reason for that actually sits behind the way that we scale applications today. So, let's talk about that for a moment. Every application generally begins as a monolith. As a developer, you write that single application. And actually, after you write that code, it scales pretty well to the first hundred, maybe even first thousand users, each of them coming in and running through that code that you wrote. But eventually, you get that thousand and first user, and that monolith no longer scales for that. So what do you do? You create more copies, more instances of that exact app. 
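The point earlier in this passage, that a few lines of code are enough to figure out today's date, can be illustrated with a minimal sketch of what an agent might generate for the dinner event. The function name and the 6 p.m. default are hypothetical, chosen to mirror the example in the talk.

```python
from datetime import datetime, time

def dinner_event_start(now: datetime, hour: int = 18) -> datetime:
    """Return the event start: the current date at the requested hour.

    The date comes from the clock passed in, not from the model's guess.
    """
    return datetime.combine(now.date(), time(hour=hour))

now = datetime.now()                   # read the real clock once
event_start = dinner_event_start(now)  # tonight at 6 p.m.
print(event_start.isoformat())
```

Because the date is read from the execution environment at run time, the event cannot silently land on a stale date such as January 1st, 2024.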
For the past 20 years, when we've been talking about scaling applications and we've been talking about microservices, this is exactly what we're talking about. Running the same code over and over again. And when you need to make an adjustment, you write that code, you thoroughly review it, and you release it to your users. Agents don't work like this. Agents don't need more copies of the same code. Agents need a distinct new piece of code written not just for every user, but for every single task and sometimes even subtask. What we've been doing simply doesn't work here. To use an analogy, the way that we've been scaling applications is actually very similar to the way that fast food restaurants work. You have a chef, they come up with a menu, maybe they update it every couple of months, and you have this kitchen that's been designed to very efficiently execute on that exact same menu, producing hamburgers, fries, and milkshakes. Now, when you have a lot of customers, how do you scale as a fast food business? Well, you create more franchises, right? More copies of that kitchen designed to produce even more hamburgers, fries, and milkshakes. Now, the really incredible thing about the year 2026 and the time that we live in is that we accomplished what we previously thought was impossible. Now, every person can have a personal chef in the form of an LLM that comes up with a menu fully tailored to that user and what they want on that given day. But we're still stuck with that exact same kitchen that's only optimized for executing on hamburgers, fries, and milkshakes. What you really want in this situation is something more akin to a pop-up kitchen where you can quickly spin it up, have just the tools, just the ingredients that you need, and be able to produce that meal on demand, quickly spin down that kitchen and execute on the next one. Now, it's not just conceptually that this doesn't scale. The math shows it, too. 
Let's run through what it would take to power agents for the entire United States alone. There are about 100 million knowledge workers in the US today. Now being conservative, let's imagine that we run one agent per person and you can fit about 10 agents per CPU. If you're running containers, you need about 10 million CPUs to be able to power all of that. For context, current global CPU server production is about 35 to 45 million a year. So to power the US alone, that's about 20 to 30% of global CPU production. But we're not alone in this world, right? So now let's run the math globally. In the world, there are a billion knowledge workers. And realistically, we're not running one agent per person. They can run very efficiently in parallel. We can run two of them. They can run when we're asleep, when we're at dinner. Now, if you do the math, you need a billion server CPUs. That's 20 times current global production. If you think about the phases of AI adoption, each of them has been defined by a different bottleneck. The training era needed consistent GPU resources for training. For inference, you needed flexible GPU resources to be able to adjust to unpredictable traffic. The first phase of agents, the bottleneck was models getting smart enough. And the next bottleneck is going to be on access to these CPU resources. And the cloud that got us where we are today simply doesn't get us to where agents need to go. But you know what does? Remember that big bold bet that we made 9 years ago on isolates? Well, this is exactly what isolates were built for. Isolates are perfect for this moment because unlike VMs, isolates don't need to bring in an entire operating system every time they need to run a line of code, or unlike containers, they don't need to spin up the entire language runtime every time they need to run a line of code. They can actually just bring in the application code or the code that's been written by an agent and execute it on demand. 
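The back-of-the-envelope CPU math in this passage can be reproduced in a few lines. The worker counts, agents per CPU, and production range are the figures quoted in the talk. The global figure of 10 agents per person is an assumption: the talk does not state it, but it is the value that turns a billion knowledge workers into the quoted billion CPUs.

```python
# Figures as quoted in the talk.
US_KNOWLEDGE_WORKERS = 100_000_000
GLOBAL_KNOWLEDGE_WORKERS = 1_000_000_000
AGENTS_PER_CPU = 10
ANNUAL_SERVER_CPU_PRODUCTION = (35_000_000, 45_000_000)  # (low, high) estimate

# Assumption, not stated in the talk: agents per person in the global case.
ASSUMED_GLOBAL_AGENTS_PER_PERSON = 10

def cpus_needed(workers: int, agents_per_person: int,
                agents_per_cpu: int = AGENTS_PER_CPU) -> int:
    """CPUs required to host every agent, rounded up to whole CPUs."""
    total_agents = workers * agents_per_person
    return -(-total_agents // agents_per_cpu)  # ceiling division

def production_multiple(cpus: int) -> tuple[float, float]:
    """Demand as a multiple of annual production: (vs. high, vs. low estimate)."""
    low_output, high_output = ANNUAL_SERVER_CPU_PRODUCTION
    return cpus / high_output, cpus / low_output

us_cpus = cpus_needed(US_KNOWLEDGE_WORKERS, agents_per_person=1)
global_cpus = cpus_needed(GLOBAL_KNOWLEDGE_WORKERS, ASSUMED_GLOBAL_AGENTS_PER_PERSON)

us_low, us_high = production_multiple(us_cpus)
global_low, global_high = production_multiple(global_cpus)

print(f"US: {us_cpus:,} CPUs, {us_low:.0%} to {us_high:.0%} of annual production")
print(f"Global: {global_cpus:,} CPUs, {global_low:.0f}x to {global_high:.0f}x annual production")
```

Under these inputs the US case comes to about 22% to 29% of annual production, in line with the 20 to 30% quoted, and the global case comes to roughly 22 to 29 times annual production, which the talk rounds to 20 times.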
This means that we can be a hundred times more efficient than any other type of compute out there, being able to import the code that the agent just generated and move on to the next one. And it's not just about our efficiency. This results in meaningful cost savings for our customers, too. I'll start with a counterexample, actually. If you have a regular application that maybe only grows 10% year-over-year and you've really invested in optimizing your one server, then running on Cloudflare and spinning up a new Worker, renting it every time that you need to run something, might actually end up being about 15% more expensive. But when it comes to running agent-generated applications, well, if you need to run about 10,000 of them, that's going to be about 63% more cost effective on Cloudflare. If you need to run a million agents, that's going to be nearly 75% more cost effective on Cloudflare. And at the scale that our customers run at, this adds up to really significant cost savings for them. Now, it's not that the cloud got everything wrong. In fact, Andy Jassy was absolutely right about one thing, and it's that it is all about primitives. It's just that the primitives that got us here are not the primitives that we need for the agentic era. If the last era was defined by primitives like S3, EC2, SQS, the next generation of applications is going to need primitives that look a lot more like Durable Objects, Artifacts, Workers, and Workflows. And this is not just me saying this. Our customers are starting to think about this already too. Just recently I was talking to Mark Smith, who is the head of infrastructure at Discord, one of our customers, and he said, 'I know what hyperscalers are going to look like in 10 years. They're going to look exactly the same as they do now. I'm looking to Cloudflare to define what the next generation cloud is going to look like.' Because like us, our customers want to bet on the future, not on the past. 
And we're about to live through a major generational shift on the web. The last one of these that we saw was mobile. And while the cloud preceded mobile, it was actually mobile that made the cloud take off. Right? Before mobile, we could only access the web when we were tethered to our computers. But the smartphones made it possible for us to access the web anywhere we went. And these new devices were so much cheaper that they became accessible to a larger portion of the population, making internet traffic explode. We're about to see the exact same thing with agents. We're already on track to surpass human traffic with bot traffic, as Matthew was just saying before, earlier than we even expected. Stephanie is going to talk about how unlike humans, agents never tire. We get bored after clicking on five links, agents can look at thousands of links before they reach a conclusion. Traffic on the internet is about to take off and is going to need a different type of cloud to be able to power it. And if the iconic companies that were born out of the mobile native era are Airbnb and Uber and Netflix and they were all built on clouds like AWS, well the next generation of iconic applications, these agent native businesses, are going to be built on Cloudflare. And this is not a prediction. This is happening right now. Look at the past month of announcements alone. Figma, Lovable, Anthropic, OpenAI, the world's fastest growing agent native companies are already betting on Cloudflare. We're building the fastest, most secure, most cost-effective place to build, deploy, and scale agents and the code that they write. And I'll leave you with this one final thought. There are 8 billion people on this planet and we're starting to rely on agents not just for our jobs, but for our day-to-day lives. We're going to need multiple agents running per person. And these agents are going to generate infinite lines of code. And we can't just build more servers to scale our way out of this. 
We need a cloud with a fundamentally different approach. And Cloudflare is the only cloud that's going to be prepared for this next agentic era. Thank you.