Fidelma Russo 1:22
Good morning, everyone, and welcome. It's such a pleasure to be here in Las Vegas with all of you. The lights are bright but the ideas are brighter, and the pace of change in technology somehow still manages to keep all of us on our toes. We're in yet another technology transition. Now, I've been fortunate to live through many of these, though not all of them, in my career. And this one is a little different. Every wave that we have been through has changed how we build, operate, and deliver technology. But as I said, this one really feels a little different because the pace of change is extraordinary. The distance between breakthrough and expectation has almost disappeared. New capabilities are emerging overnight, and they very quickly become business priorities that we all have to execute on. So the question is no longer whether AI will transform the enterprise. The question is how we make that transformation secure, governed, scalable, and operational. So for decades, enterprises operated around static workflows. Information moved through the process. Humans made all the decisions. Systems supported the work. But today's enterprise operates across fragmented silos. We have distributed data. We have distributed applications. We have distributed intelligence. And we're moving from systems that support decisions (in fact, we used to call them decision support systems) to systems that help execute decisions. So we're moving from static workflows to systems that can observe, reason, and act. Because AI isn't just another application. It's becoming part of how we get our work done. It's showing up everywhere: across our applications, across our workflows, across our teams, across our infrastructure. And what starts as a copilot quickly becomes a network of intelligent systems and agents working together. And when intelligence is distributed, the enterprise changes too.
And we are now in an era of distributed agentic enterprises. For years we have talked about distributed infrastructure, and with AI, intelligence itself is becoming distributed. Every decision, every recommendation, and every action generates inference, and that creates a whole new source of demand across the enterprise, because creating intelligence is becoming easier. Coordinating intelligence is where everything gets really interesting. And this means that the real challenge becomes: how do you govern intelligence when it's distributed? How do you coordinate it in different places? And how do you operate it across an environment that's becoming more dynamic, more distributed, and increasingly autonomous? The answer is closed-loop operations. We need systems that continuously observe what's happening, reason across all of the signals that they're getting, take action, and validate outcomes. Not through static workflows, not through manual handoffs, but secured, governed, and continuously reasoning in real time. But before these systems can act, they need the right data, the right infrastructure, and the right operating model. And they all need to be built on a platform foundation so they can coordinate and act together. So let's start with the data. Now, we've said for a long time that every AI strategy needs a data strategy. And as AI moves from prompts to agents, and this movement is critically important, data becomes more than just an input. It becomes a critical part of the operational system. Unlike traditional AI, which retrieves information and generates a response, agentic AI coordinates, reasons, takes action, and acts as part of the operational process. And every single one of those interactions depends on data. So data is no longer accessed once; it is accessed continuously throughout the life cycle of a task. Now, before AI can act on that data, the data must be discoverable. It must be governed. It must be secured.
And it must be accessible wherever those agents operate. HPE Data Fabric creates that trusted data layer for AI. It makes governed enterprise data available wherever intelligence operates: across your data centers, across your clouds, or across your distributed enterprise. It has automated controls and workflows, it has identity management, and it has governance built directly into the fabric. So you can shorten the time from deciding to deploy AI to getting AI into production. Today we are really pleased to announce HPE Data Fabric 8.2, which helps agents discover, understand, and govern enterprise data more effectively. With new agent-aware capabilities, an enhanced and simplified global data catalog, and availability as an appliance that simplifies your experience and decreases your time to value, HPE Data Fabric is one of the trusted foundations for your data layer. So once we have deployed the data, what is next? One of the emerging top-of-mind topics for AI is the economics associated with it: token economics. Once agents have continuous access to data, every interaction consumes tokens. That includes making every decision, validating the decision, and taking the action. Unlike traditional AI, agents don't stop after one response. They continuously reason, they continuously coordinate, and they continuously interact with other systems. What that means is that inference is a continuous operational workload, not a one-time request. So this brings us back to economics, because every time an AI system reasons, validates, or takes action, it's consuming tokens. And that consumption adds up really, really quickly. What looks like a simple prompt on the surface can become thousands or even millions of model interactions. We're already seeing this in real-world environments.
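To illustrate how a simple prompt can compound into heavy token consumption, here is a minimal sketch of an agent loop in which every step resends the context accumulated so far. The function names and all of the numbers (prompt size, step count, tokens per step) are hypothetical and chosen only for illustration; they do not come from the talk or from any HPE system.

```python
from dataclasses import dataclass


@dataclass
class StepCost:
    """Tokens consumed by one reasoning step of a toy agent."""
    step: int
    input_tokens: int   # context resent to the model at this step
    output_tokens: int  # tokens the model generates at this step


def simulate_agent_task(prompt_tokens, steps, output_tokens_per_step,
                        tool_result_tokens):
    """Toy model: each step rereads the full context, then appends to it.

    Assumption: nothing is cached, so the whole context is billed as
    input on every step.
    """
    context = prompt_tokens
    costs = []
    for step in range(1, steps + 1):
        costs.append(StepCost(step, context, output_tokens_per_step))
        # The model's output and the tool result both join the context.
        context += output_tokens_per_step + tool_result_tokens
    return costs


def total_tokens(costs):
    """Sum input and output tokens over all steps."""
    return sum(c.input_tokens + c.output_tokens for c in costs)


# Hypothetical workload: a 200-token prompt that triggers 30 agent steps.
costs = simulate_agent_task(prompt_tokens=200, steps=30,
                            output_tokens_per_step=150,
                            tool_result_tokens=500)
single_response = 200 + 150  # one prompt, one answer, no agent loop
agent_total = total_tokens(costs)

print(f"single response: {single_response} tokens")
print(f"agent task:      {agent_total} tokens")
print(f"multiplier:      {agent_total / single_response:.0f}x")
```

Because the context grows at every step, total consumption in this toy model grows roughly with the square of the number of steps, which is why a short prompt on the surface can turn into a large bill underneath.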
Public information tells us that OpenClaw showed more than 600 billion tokens processed in a single month to support roughly 100 continuously operating coding agents. That works out to roughly $13,000 per agent per month. So suddenly AI economics looks a lot like infrastructure economics. It comes down to utilization, efficiency, and scale, and to how well we operate the full system and not just the model. We saw this firsthand at HPE. Every day our storage systems and our support systems process billions of operational signals coming in from our customers, and as those environments became more autonomous, our token consumption scaled with the volume of signals. So our engineers, really smart people, built an AI-first support platform codenamed Mindstone. We built it on-prem with GreenLake Intelligence and Private Cloud AI, and running AI on our own infrastructure gave us control over the economics. It allowed us to govern that really important customer data, it gave us better performance, and it helped us significantly reduce the token spend associated with operating AI at scale. We stopped being consumers of AI and became producers of intelligence. So let's look at the results. We had more than 30x lower cost. We saved nearly $100,000 per month. And that gave us the capacity to scale even further, faster. That is an example of how we took all of our infrastructure and all of our intellectual property and put it to work within HPE for the good of our teams and to respond to customers faster. Another great lever in the cost equation is memory. Every agent, every workflow, every inference depends on context. And if the system has to rebuild context every time, it wastes compute, it burns tokens, and it slows everything down. So in AI, memory is no longer a technical detail or a supply chain challenge. It is a strategic resource. And that's where KV cache changes the economics.
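As a rough illustration of the idea, the sketch below counts how many tokens have to be processed when a second request shares a long context with an earlier one. It is a toy model under stated assumptions, not HPE's or any inference engine's implementation: the class name `ToyKVCache`, the block size, and the token values are all hypothetical, and a Python set of token prefixes stands in for the key/value tensors a real system would hold in GPU memory or offload to external storage.

```python
class ToyKVCache:
    """Toy prefix cache: remembers which token prefixes were already processed.

    Assumption: prefixes are cached at fixed block boundaries, and a set of
    tuples stands in for the real key/value tensors.
    """

    def __init__(self, block_size=16):
        self.block_size = block_size
        self._prefixes = set()
        self.tokens_computed = 0  # running total of tokens actually processed

    def prefill(self, tokens):
        """Return (reused, computed) token counts for one request."""
        tokens = tuple(tokens)
        # Length of the request rounded down to a whole number of blocks.
        full = len(tokens) - len(tokens) % self.block_size
        reused = 0
        # Look for the longest block-aligned prefix that is already cached.
        for n in range(full, 0, -self.block_size):
            if tokens[:n] in self._prefixes:
                reused = n
                break
        # Only the part after the cached prefix needs fresh computation.
        computed = len(tokens) - reused
        self.tokens_computed += computed
        # Remember every block-aligned prefix of this request for later reuse.
        for n in range(self.block_size, full + 1, self.block_size):
            self._prefixes.add(tokens[:n])
        return reused, computed


# Hypothetical workload: two questions asked against the same 1,000-token context.
shared_context = list(range(1000))
cache = ToyKVCache(block_size=16)

first = cache.prefill(shared_context + [5001, 5002, 5003])
second = cache.prefill(shared_context + [6001, 6002])

print("first request  (reused, computed):", first)
print("second request (reused, computed):", second)
print("tokens processed with cache:   ", cache.tokens_computed)
print("tokens processed without cache:", 1003 + 1002)
```

In this sketch the second request recomputes only the tail that falls after the last cached block. The size of the cache determines how much context survives between requests, which is the part that extending the cache beyond GPU memory is meant to address.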
So instead of rebuilding context every time, the system can remember and reuse what it already knows. What this means is that storage becomes active memory for AI. And that is exactly what HPE Alletra Storage X10K was designed to address: not simply to store data, but to keep data, context, and intelligence available wherever AI needs it. So what does this do in reality? It reduces the time GPUs spend waiting for data. It increases the amount of useful work your infrastructure can perform. And the X10K turns storage into an active part of AI efficiency and is a key contributor to the economic value of AI. So did it really make a difference? We tested this with real AI software, real infrastructure, and real enterprise workloads. We compared the X10K with KV cache and RDMA against traditional AI infrastructure architectures without them. And the results were significant: 20 times faster time to first token, which means that the same GPU spends more time generating value and less time waiting on the data it needs. But what does that 20 times really mean in practice? Let's take a look. Seconds matter for fraud detection, for diagnosis, for quality control, for every interaction with an AI-powered business process. Dozens of AI agents are simultaneously evaluating risk, reviewing transactions, generating forecasts, validating compliance, and supporting critical business operations. Every request requires context, and as those agents become more sophisticated, the demand for memory grows dramatically. Eventually, GPU memory becomes the bottleneck. When that happens, agents spend more time rebuilding context and less time serving users. We often talk about 20 times faster time to first token. This is what that means. What you're looking at is a financial services agent factory.
On the left is a traditional AI infrastructure stack: the same financial services agents, the same models, the same NVIDIA H200 GPUs, but every request must continually rebuild context after it falls out of GPU memory. On the right, HPE Alletra Storage MP X10000 extends KV cache beyond the GPU using high-performance object storage and remote direct memory access. The context is already there. No rebuilding, no waiting. The traditional environment struggles to keep up as agents wait for context to be regenerated. The X10000 environment continues serving requests, keeping agents productive and GPUs fully utilized. This is KV cache offload in action. By extending memory beyond the GPU and eliminating unnecessary recomputation, HPE Alletra Storage MP X10000 fundamentally changes the economics of inference at scale. Validated by Kamiwaza, the X10000 delivers up to 20 times faster time to first token and 17 times higher throughput. Not by adding more GPUs, but by keeping the GPUs productive. This isn't just faster inference. It's the ability to serve more agents, process more requests, and deliver more business value with the infrastructure you already own. HPE doesn't build the GPUs. We help them spend more time working and less time waiting. So I think what you just saw was not a storage demo. It was a glimpse of AI economics in action. For all of us, AI success won't be measured by the number of models we deploy. It will be measured by the intelligence that we deliver. So we started with data, and data is the foundation of every AI initiative. But data alone doesn't create outcomes. Every prompt, every inference, every agent action ultimately depends on your infrastructure. And once AI becomes operational, the question is: is your infrastructure even fit to run it? Now, there's a misconception that AI shrinks traditional infrastructure demand. It doesn't.
Agents multiply that demand. They're reasoning and inferring mostly on GPUs, but sometimes on CPUs, and they trigger questions all around them. They trigger database queries. They trigger APIs. They generate new workflows. They create new automations. And so every agent that you deploy has a ripple effect. You need more GPUs and you need more CPUs, generating demand across your entire estate. AI doesn't shrink your infrastructure, it expands it. And that's why we are really proud that we have built the industry's most complete private cloud portfolio, letting you run HPE virtualization and containers, VMware, Red Hat, and AI workloads under a common operating model, deployable from remote offices to mission-critical environments, from traditional applications to AI. And we have renamed them to be simpler, because they are really now a family. We have the SimpliVity PC 1000, which delivers a simpler, more resilient hyperconverged infrastructure for remote and distributed locations. We have the PC 3000, which used to be PCBE, delivering a turnkey private cloud for modern workloads. And now, like the rest of the private cloud portfolio, it can be deployed in highly secure and air-gapped environments as well as cloud-connected ones. And the PC 7000 extends that model to mission-critical and regulated deployments, including environments that need DoD IL4 compliance. And HPE Private Cloud AI continues to scale, with support for up to 256 GPUs today and upcoming support for NVIDIA Vera CPUs. And across the portfolio, every hybrid and every private cloud solution is powered by HPE Morpheus. This gives you a common control plane, a common operating model, and a common governance framework from edge to core to AI. Now, two years ago, we introduced HPE Private Cloud AI, co-engineered with NVIDIA. The goal was simple: help enterprises move from AI experimentation to production. Today, that journey has evolved. As organizations, we're all deploying agents.
We're scaling inference, and our AI is moving deeper into our day-to-day operations. This is exactly why HPE Private Cloud AI matters now more than ever. It gives enterprises the foundation agents need to run in production. Now, agents, as we heard, depend on continuous access to data, context, and memory. And as we just saw with KV cache, remembering context changes the economics of AI. This is why we are bringing the X10K directly into HPE private cloud: so agents can run faster, so GPUs can stay productive, and so enterprises can get more value from every AI investment. Now, building an AI-native enterprise requires a lot more than just deploying agents. You need a way to govern them. You need to secure them. And really, you must be able to trust them. HPE AI Essentials, which is embedded in Private Cloud AI, provides a workbench for building and deploying those models. And now we support secure agent operations with NVIDIA OpenShell and NVIDIA NemoClaw, which provide the governance and security layer. Together, they help organizations build, deploy, and operate models with confidence. And now, let's take a look at how Private Cloud AI helps accelerate your AI journey.