Dan Guido 0:00
So, I wanted to give a little bit of a brief overview of the journey that Trail of Bits has been on over the last few months because I think it's a journey that many other companies are trying to go on themselves right now. And we've learned a lot of hard lessons along the way that I'm able to now share with you. A lot of firms, they want to unlock the benefits of AI, but they're finding a lot of difficulty achieving that. And I think the number one thing standing in the way tends to be people, not technology. So I want to explain how that played out inside of an elite consulting firm like Trail of Bits because I think the lessons are reproducible across many other firms.
So this article, it came out about two months ago. There were thousands of CEOs that were surveyed and the result was that AI had no measurable impact on employment or productivity. Economists at the time were calling it the new Solow paradox, which is a reference to this term by Robert Solow in 1987 who said that you can see the computer age everywhere except in the productivity statistics, meaning that it was really not having the impact that people thought it was. And I think the same thing is happening with AI right now. There's a few elite firms and groups and people that are able to unlock the benefits of AI and then there's everybody else. So obviously I don't think AI doesn't work. It does. I just think that most companies are doing it wrong. They usually give people tools without changing the system. Like everybody gets a ChatGPT license and then leadership waits for the productivity numbers to move, but they don't. So that's what I'd call the gap between AI assisted and AI native. On one hand, you've got a suite of tools and on the other hand, you've got an operating system that changes the way your company works. So, the rest of the talk is a little bit about what it took to build parts of that operating system for Trail of Bits. I won't say that I have all the answers, but I think I've discovered some.
So before I go on, I want to just define what we're even aiming for because AI native gets thrown around a lot and it means different things to different people. So the way I think about it (I'm Italian, I talk with my hands, got to keep control of my hands here) is there are three levels. And most companies are stuck at level one. That's where AI assisted lives. That's where you give people access to ChatGPT, they use it to draft emails and it's a productivity tool. But overall, the company doesn't change. Workflows don't change. You just kind of do the same things a little faster. AI augmented is where you start to have some fun. That's where you redesign workflows. You're not just using AI as a tool. You're putting agents in the loop. So you might have AI do the first pass on a code review and then the human does the second, but the process itself is still the same. AI native is a structural shift. This is where we design the organization from the ground up assuming that AI is a core participant. So it's not a tool you pick up; it's a teammate that's always there. You have knowledge management, your delivery model, your expertise all designed to be consumed and improved upon by agents. So at Trail of Bits, what we mean concretely is that our security expertise compounds as we do audits. Every engagement, the skills, the workflows, all the stuff that we work on makes the next engagement faster. So every engineer operates with an arsenal of specialized agents built from 14 years of audit knowledge. And that's not really using AI, that's AI being on the team.
So I had a lot of trouble getting there. And the trouble that I had was people related. It was not technology related. I knew what we had to build. I knew that the team was capable of it. We have some of the best researchers on the planet that work for Trail of Bits. And you guys don't have to take pictures of these. I will absolutely publish this and I think many of the slides are already out there. But this follows similar trends in technology. Sometimes it's the smartest, most capable people that resist changes in technology the most. People's unwillingness to accept that something else might be better than their intuition is a powerful obstacle to progress. So basically I pulled up these four psychological biases that are well studied in the literature that I think play out here the most.
The first is your self-enhancing bias. That is the tendency to perceive yourself in an overly favorable manner. That means you're trying to maintain your self-esteem. You take credit for your successes, but you blame external factors for your failures. And this gets worse with seniority. The more work and time that you have put in to gain some expertise, the more you believe that your intuition got you there. The more unique you think you are, the less you think your skills can be replicated by a machine. But you're wrong.
There's also the identity threat. In our field, I think a lot of us have a high degree of identity wrapped up in our security work. We identify as a security researcher. So there's lots of studies. There's a really good one (I forget where it's from), a great classic psychological study where a group of people are given a cooking machine, some kind of kitchen automation device, and they're split into two groups where one group is advertised the device as 'it does the cooking for you' and the other group is advertised the same device as 'it helps you cook better.' Same device, right? But people who identify as cooks or chefs, they reject the first framing, but they accept the second one. And it's just all in how you frame it.
Another way to look at this is the opposite. You could take a look at things that are more symbolic in nature like tattoos. They embody a human craft and you probably would not accept a tattoo from a robot because that destroys all the symbolism in it. But you would accept the removal of a tattoo from a robot because there's no symbolism there. It's a mechanical process. So security auditing and security work is deeply symbolic for most of the people practicing it. So you have to frame AI as something that makes you a more dangerous auditor or more capable auditor, not as something that does the audit for you. That's where people start to lose you.
We've also got opacity (let's see, skipping through these a little quicker). This is where our subjective understanding of human judgment feels high: we believe we understand how another person reaches a decision. Like when a doctor explains your complicated medical issue to you, you're like, 'Yes, yes, I get you. I totally understand all the words that you're saying.' But you don't. And if you took the exact same words that a doctor provided to you and put them into text and delivered them to you that way, you would not trust that system at all. People feel like they can understand how a doctor diagnoses, but they can't really explain it. And that feeling of understanding disappears when the human disappears from the process.
So what you typically do to fix this is you provide some kind of comparative reasoning when you get a mechanically delivered result. Like if you get a report that says you have skin cancer from some AI bot, you don't just give somebody the result that says you have skin cancer. You actually show a bunch of moles alongside the one that is found on your body and show comparatively why we believe this result to be true. Like why is this not benign? You give some kind of comparative reasoning and then the trust equalizes.
And then finally there's intolerance for imperfection. In studies, participants watch an algorithm outperform a human forecaster. So you've got two forecasters, one's an AI, the other one's a human. The AI is so good at its job, it always predicts the weather perfectly, but then it makes one mistake, and people refuse to trust it ever again. Humans are intolerant of failure when it comes from a machine, but they are tolerant of failure when it comes from a human. So the fix here is usually to allow people to slightly modify the algorithm. Even a single adjustable parameter is enough to overcome this intolerance for imperfection.
So how do we address these in the deployment model for Trail of Bits? For the self-enhancing bias, we created a maturity matrix. The maturity matrix was pretty controversial internally. Nobody likes to be told they're at level one. But that's sort of the point. You can't argue that you're already good enough if there's a visible ladder. So it makes the conversation concrete instead of 'I don't think AI is useful.' It shows you how much deeper you can go. And it's also a marker of progress for your peers. So if you see somebody reach level two or level three, you start to get a little bit of FOMO and say, 'Oh, they got it. They did it. How come I can't do that?' So that's where the passive collection of people in the company start to move.
You've also got the identity threat. So this was the hardest one that we spent the most time on. And the key insight is that people accept skill-allowing automation, but they reject skill-replacing automation. Same tool, different framing. So we never stopped asking people to be security experts. We just gave them a new way to express that identity. We created skill repositories that people in the company could contribute to. And when they contributed their expertise — like we had a senior engineer contribute a constant-time analysis cryptography skill — they're not being replaced for that skill. They're actually just becoming a permanent fixture within the company where everyone gives them credit for producing those innovations for anyone that's stuck on an audit.
And the maturity matrix reinforces that. So on the highest level of our maturity matrix, if you're at level three, it's not somebody that uses AI the most. It's somebody that invents a new way to use AI and builds tools that use AI. So the identity of the expert shifts from 'I don't need AI to do my job' to 'I'm the one who makes AI dangerous, not just for me but for the whole company.' We also did hackathons instead of mandates. We didn't force any adoption. We wanted to allow people to have a space to experiment on their own terms.
For the intolerance for imperfection, we created paved roads. So we invested heavily in reducing ways that AI could fail embarrassingly. I don't want to have stories on our Slack about, 'Oh, I tried to use Claude and it did this totally wrong thing.' And a lot of these tools when they're configured out of the box, they just don't work right. So, we provided known good defaults that adhere to the Trail of Bits way to use these agents.
For opacity, people didn't know what was safe to use. They didn't know what was approved. They didn't know what data they could put where. So, we made an AI handbook that made it concrete. And we didn't just describe what was approved. We also described why. We described like here's what's not, here are the exceptions, here's who to ask. And we didn't just tell people like 'hey trust us.' We explained the reasoning behind it. So once people got familiar with the risk model, they were more willing to use the tools on engagements.
Then finally, we made adoption visible and fast. So this really underpins everything. Deferred benefits kill adoption. And so if the setup takes an hour and then your first results are mediocre, you've confirmed everybody's priors. So what you want is something that out of the box works great and that the whole rest of the company can see. And the way that I did that is I led by doing it first. So I was out in front. I was using AI every day. I demonstrated how to use it successfully. And then people knew this wasn't a fad, that we were actually bought into it and it wasn't something I was going to roll back in some random decision a few weeks later.
Okay. So in sum, this is the operating system model. This is like an actual system that we built. There's five parts of it. Each part addresses the barriers we talked about. I want to walk through each of these in a little bit of detail. But again, this is every piece that we built. This is not like a theory. I open sourced most of the pieces of this and we iterated on it over time in production with a 150-person company doing real client work. So this is real.
So step one, boring and critical: we standardized on a tool set. This is just really basic enterprise tech stuff. But once we got everybody on Claude Code we can treat it like an enterprise tool. We have supported configurations. We have known good defaults. You have a clear path to 'this is how we do Claude Code here.' And if you skip this step, you can't do anything else. You end up with 40 different workflows and zero leverage because everyone's trying a different tool and there's no compounding benefits.
Then we wrote the AI handbook. So this wasn't necessarily just to teach people how to prompt, although it had some guidance like that, but it was to remove ambiguity. We want to have clear articulation of what tools are approved and what isn't, especially for sensitive data because that's what we live and breathe. So we talked about when a tool can't be used on client code, where meeting recorders are disallowed (we do a bunch of work under privilege with lawyers), and where we want people to use a specific alternative like maybe a local LLM. So this stops the endless Slack debates about 'is this okay? Can I use this thing instead? I read this tweet about this concern.' And it also gives people on our team a shared answer. So if we get questions from our clients about how we're using AI on their code, everyone knows the right answer. It's all in the document. It's all in the AI usage handbook.
So then I can push harder on adoption. This is where we built the AI maturity matrix. So we want to have AI as a first-class professional capability. This is like can you use Git? Do you know how to write tests? How do you use AI for your job? So this is a ladder with clear expectations, clear levels and a clear path up. And there are consequences for staying stuck. We have a level zero where people are actively fighting against the company's use of AI. And that is not a technical skill issue. That is like somebody who's not in line with the future of the company.
So this maturity matrix is how I think you avoid two failure modes. One is leadership wishing adoption into existence; now we have a path we can push people down. The other is the org splitting into the AI people and the not-AI people. You want to collect everybody and push them through together. So this just makes it a normal part of growth, of performance, resource allocation and the rest of it.
Then we get to the fun part. We do hackathons as a management system. These are short focused sprints, usually two to three days with a single objective. That's how we keep pace with an ecosystem that changes every week, because I can't write up a guide for how to use tools that got published three days ago. There's no documentation. There aren't YouTube videos. There's no course that you can sign up for. You just have to get hands-on keyboard time trying it out. So for all these things you'll see that we have a kickoff message like 'Claude Code Hackathon v2: Autonomous Agents' and the lines that matter are the objective and the twist. The objective: we want to ship impactful changes across our AI tool chain and public repositories. And the twist is that engineers have to work in bypass permissions mode.
And that's important. We're trying to shove people down a single pathway where we have learning objectives for the hackathon. It's not just make whatever you want. It's the company needs you to learn this. All of you are going to get time to experiment with it and you can develop what you want to play with within those constraints. So it forces everybody to learn those constraints. It forced everybody to learn how to do sandboxing, how to write guardrails for your agents and how to structure work so the agents actually do the things that you want them to.
And then we also have to think about how we operationalize it. We can't do this on a lot of client code. You know, that's the wrong place to experiment. So we did it on our own code. We did it on public code that's on our GitHub and we measured success by how many reps you put in. We want to have activity metrics. I don't care if you ship like an amazing feature or whatever. I just want you to put in the hours so you get time to experience what it's like to use these tools hands-on. But we also had everybody work in pairs. We want people to talk to each other about their use of AI and share what they learned, and that way we had some quality control built in.
So all this stuff, it creates a lot of motion, but we have to capture the motion and compound the knowledge somehow. And that's what we did with skills repositories. Now if anyone's been following Trail of Bits over the last few months you may have seen this but the key thing here is we turned what we learned into artifacts that scale. A skill is a reusable structured workflow, ideally with examples, constraints, and a way to verify the output for specific tasks that people work on. So we have three of them in fact. We have the external one that all of you can see. We have an internal one with company specific workflows and we have a curated one that's kind of like an enterprise app store for when people want to use skills that we didn't write.
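The definition of a skill above (a structured workflow with examples, constraints, and a way to verify the output) can be pictured as a small data structure. This is a minimal sketch under assumptions: the `Skill` class, its field names, and the sample entry are hypothetical and are not the format used in the Trail of Bits repositories.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Skill:
    """Hypothetical in-memory model of a skill, for illustration only."""
    name: str
    workflow: List[str]                       # ordered steps the agent follows
    verify: Callable[[str], bool]             # checks the agent's output
    examples: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A skill is only reusable if it has steps, examples, and constraints.
        return bool(self.workflow and self.examples and self.constraints)


# Made-up entry loosely inspired by the constant-time analysis skill mentioned
# earlier in the talk; the steps and wording are invented.
constant_time = Skill(
    name="constant-time-analysis",
    workflow=[
        "List functions that handle secret data",
        "Flag branches and memory indexing that depend on secrets",
        "Write a report ending with a VERDICT line",
    ],
    verify=lambda report: "VERDICT:" in report,
    examples=["Comparison loop that exits early on the first mismatched byte"],
    constraints=["Never modify the code under review"],
)

print(constant_time.is_complete())
```

The `verify` field is the part that matters most in this sketch: without a check on the output, the workflow is just a prompt.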
This really helps us. I think the external validation matters a lot to a company like Trail of Bits. Keeps us honest. It validates that our skills actually work. It makes sure that they're adaptable. They're not so constrained by specifically how our company works. And it lets me know that hey, if a client tries to use it, they can actually benefit. So we like to share things that other people can use. It's a key part of who Trail of Bits is.
Obviously, and I kind of forecasted this — once you tell people to go use skills and plugins they start installing random stuff. So we created a curated marketplace that's like a known good place where you can get third party skills. Basic enterprise thinking applied to agent tooling. If you want adoption you need to have a safe supply chain.
Another key part of a safe supply chain: agents, when you ask them to write code, they pull in random dependencies from all over the place. So we constrained our supply chain using a couple of innovative techniques. We have all the standard risk mitigation settings turned on for all of our package managers. But the key security control that most people forget is a cool-down policy. We have ignore scripts turned on on npm. We have a bunch of things that tell you what mirrors to download code from. But the main thing is that these Trojan packages don't last for that long in public. And if you're committed to only using packages that were published at minimum seven days ago, you avoid the vast majority of malicious code that's been pushed out to npm or PyPI or crates and whatnot. So from an IT perspective, we went through and configured every single package manager on every developer machine to have a really strict cool-down policy. And that has saved our butts just as much as it has deploying EDR in a really strict mode. But it's much simpler and doesn't cost anything.
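The cool-down rule described above boils down to a date comparison. The following is a minimal sketch of that logic only, assuming publish timestamps are already known. The function name `newest_eligible` and the release history are hypothetical, and this is not how any particular package manager implements or configures the policy.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# The seven-day window comes from the talk; everything else is illustrative.
COOLDOWN = timedelta(days=7)


def newest_eligible(
    published: Dict[str, datetime],
    now: datetime,
    cooldown: timedelta = COOLDOWN,
) -> Optional[str]:
    """Return the most recently published version that is at least
    `cooldown` old, or None if every version is still too fresh."""
    eligible = {v: ts for v, ts in published.items() if now - ts >= cooldown}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


# Hypothetical release history for a made-up package.
now = datetime(2025, 1, 15, tzinfo=timezone.utc)
releases = {
    "1.0.0": datetime(2025, 1, 1, tzinfo=timezone.utc),
    "1.0.1": datetime(2025, 1, 7, tzinfo=timezone.utc),
    "1.0.2": datetime(2025, 1, 14, tzinfo=timezone.utc),  # only one day old
}
print(newest_eligible(releases, now))  # 1.0.1
```

In this example the newest release is skipped because it is one day old, so the resolver falls back to the previous version, which has had eight days of public exposure.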
Okay, now let's go back. So then we had to make the default simple. So we built a repository that centralizes all the configurations that we learned from those hackathons. We don't want to make onboarding into the company tribal knowledge. When you join Trail of Bits, your onboarding into the company is you install this repository and read it to see how we do Claude Code at Trail of Bits and then you're one of the team. You don't have to discover it. We can't wait for three months or six months for people to mature their skills in using these agents in order for them to be useful for this new workflow that we've built.
So we put known good settings, recommended patterns for your personal CLAUDE.md or AGENTS.md and everything else we want to standardize so people don't reinvent it. And this collects scar tissue over time. You know, as the tools change, we change the guidance. As we learn things, we add more.
Sandboxing. So if you want autonomous agents, you need to sandbox them. So here we didn't pick a winner. We just gave people multiple safe lanes. We built some of them ourselves. We have a dev container option. We have a native macOS sandboxing option that's built into most of these tools. Codex and Claude Code both have them. And we also built something that lets us hold the agents at arm's length away inside of DigitalOcean called Dropkit. So the point isn't that everybody uses the same sandbox. The point is that everybody has a safe sandbox and that it's easy to adopt.
And finally, once you have this whole package, you can start thinking about how you connect agents to real tools inside your company. One example that we've published that's pretty simple to follow is that we have an MCP server for one of our security scanners called Slither. You don't have to really care what this scanner does. It's an Ethereum smart contract scanner. The thing is MCPs turn your internal tools into something that agents can reliably call and that your org can govern. So we can track how agents use the tool. We can add permissions, authentication, authorization and hook up new data sources as time goes on. So the Stripe Minions paper has a really great example of how they're doing this where they have one mega MCP server with like 400 different functions and all the authorization for different groups within the company to use specific ones or not use specific ones.
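What "something your org can govern" means can be shown without any MCP library: a wrapper that checks permissions and records every call before a tool runs. This is a plain-Python sketch under assumptions. `ToolRegistry`, the group names, and the stub scanner are hypothetical, and none of this is the MCP protocol or the Slither server's actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class ToolRegistry:
    """Hypothetical governance layer: permission check plus audit log."""
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)
    allowed: Dict[str, Set[str]] = field(default_factory=dict)
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def register(self, name: str, fn: Callable[..., object], groups: Set[str]) -> None:
        self.tools[name] = fn
        self.allowed[name] = set(groups)

    def call(self, caller_group: str, name: str, **kwargs: object) -> object:
        permitted = caller_group in self.allowed.get(name, set())
        # Record every attempt, including denied ones, so usage can be tracked.
        self.audit_log.append((caller_group, name, permitted))
        if not permitted:
            raise PermissionError(f"{caller_group} may not call {name}")
        return self.tools[name](**kwargs)


def run_scanner(path: str) -> str:
    # Stub standing in for a real scanner invocation.
    return f"scanned {path}"


registry = ToolRegistry()
registry.register("run_scanner", run_scanner, groups={"auditors"})
print(registry.call("auditors", "run_scanner", path="contracts/Token.sol"))
```

Logging denied calls as well as permitted ones is a deliberate choice here, since the denied attempts are often the more interesting signal about how agents are trying to use a tool.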
So the results — what do we have from this? These numbers are a little bit dated but across our internal and public skills repositories, Trail of Bits now uses 94 plugins comprised of 184 specialized agents and 400 reference files encoding our domain expertise on all of our engagements. That's the compounding effect. Every engagement, every auditor, every experiment adds to that arsenal.
We also have breadth of coverage. So we have skills for writing sales proposals, for onboarding new hires, for prepping conference blog posts, for delivering government contract reports. We have a wide variety of skills.
And then on the delivery side, for certain engagements where the code base and the scope allows it — it's not for every single client, but for some of them — we've gone from finding an average of 15 bugs per week to finding 200 bugs per week. And that's not really a faster human. You're not just taking a single person and pushing the gas. That's a change in the way that we're delivering work. We have a human with a specialized fleet of agents behind them doing targeted analysis across an entire codebase in parallel and validating those results.
According to the best metrics that I have, about 20% of all the bugs across the entire company are initially discovered by AI. Now on the business side we also have our sales team being prolifically efficient. The standard for professional services is that an individual salesperson can usually pull about two to four million in revenue per year and our guys and women are pushing about seven to eight million, which is just a phenomenal improvement in their ability to perform their jobs based on using AI to do scoping, to write SOWs, to negotiate legal agreements and do every other part of their life cycle. So AI doesn't just help engineers, it helps everybody across the company. And this is just a few months, maybe about a year into building this. Seriously, the models are getting better. The skills repo grows every month. And I think a year from now, the numbers will look very different.
Okay, so I want to pause here and flip to what I think this means for the industry. And I'm just going to riff a little bit. I threw these together in the last couple days because this is really an evolving narrative where I think I know where things are going, but I could be wrong. So the main thing here is that discovery is cheap now. It is easy to find bugs. It's easy to find more bugs than anybody knows what to do with.
So I put a couple of stats here. Mozilla shipped 271 security fixes in a single Firefox release when they reviewed their code with Claude and Mythos, and 180 of them were highly severe. So Mozilla built that themselves, that wasn't a vendor product that they bought. They just used the model as it came out of the box.
Two hundred to 300 is the number of flaws that Daniel Stenberg found in curl. Curl is one of the most audited C code bases in the world. Daniel Stenberg is fanatical about his commitment to quality inside of the curl codebase and over the last eight to ten months he has found and fixed about 300 flaws using a combination of different tools. But there's actually a fun little caveat to this, which is that when he did finally run Mythos, it found five bugs, four of which he said were effectively false positives or dupes, and only one low severity bug came out of it.
So there is this layer of bugs that I think AI can now discover where they've made a huge quantity of bugs effectively transparent. All of your security debt is instantly discoverable with a button click. But there's a limit.
And then 100 plus — that's from this week. Pwn2Own has these competitions on a regular basis to solicit for full chains of exploits against popular products and they actually had to shut down their competition because they got too many submissions for the targets they picked. They got 100 plus submissions. They just closed the door on it. That's insane.
So our capacity to process attack capability hasn't kept up with our capacity to produce it. So this isn't just noise, right? These 271 fixes are CVEs. They are validated bugs. It's not just junk bugs. And the Pwn2Own submissions, they're also not junk bugs. They are full working exploit chains that meet the criteria for the contest. So most people in the security industry, we know exactly what's coming next and I think that is a lot about the patch side of things.
So got two quotes here. Halvarflake, one of the most respected vulnerability researchers in the field, his framing is that this is a liquidity crunch, not a solvency problem. So AI didn't like create new technical debt. Your debt existed the whole time. It just made all the debt visible and it made you pay it back all at once.
And then Katie Moussouris has coined this term 'vulnerooma' and she's effectively saying that the disaster isn't the vulnerability tsunami, it's the patch tsunami that's going to follow. So most companies can't keep up with the volume of fixes, let alone the volume of findings.
So at Trail of Bits, we actually receive packages of hundreds of bugs a week from a frontier lab via one of their cyber models. They send us only what their internal pipeline classified as high or critical. So it's all pre-reviewed. And I'll tell you guys that less than half survive our triage process, even when they've been pre-reviewed by people at the frontier lab. We get about 10% that aren't reproducible at all, 10% that are false positives, 20% that have already been patched upstream, 15% that get downgraded from critical to medium.
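The triage percentages above add up to the "less than half survive" claim. A quick check, assuming the four categories do not overlap and that a downgrade to medium counts as not surviving as a high or critical finding (both are assumptions, as is the batch size):

```python
# Percentages quoted in the talk; treating them as disjoint is an assumption.
rejected_pct = {
    "not reproducible": 10,
    "false positive": 10,
    "already patched upstream": 20,
    "downgraded from critical to medium": 15,
}

dropped_pct = sum(rejected_pct.values())
surviving_pct = 100 - dropped_pct

batch_size = 300  # hypothetical weekly batch, not a figure from the talk
surviving_bugs = batch_size * surviving_pct // 100

print(dropped_pct, surviving_pct, surviving_bugs)  # 55 45 135
```

Under those assumptions, 55% of each pre-reviewed batch falls out during triage and 45% remains.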
And this really shows up in that little anecdote about curl where Mythos found five bugs in curl, all of which were high or critical, and then Daniel Stenberg eliminates four and downgrades the last one to low.
As we've been dealing with this, we built a mechanical checker. We built an automated false positive checker that does the first pass when they dump hundreds of bugs on our shoulders. It pulls HEAD, it reproduces the proof of concept, it checks for upstream patches, it dedupes. It costs us about five dollars a bug. And that's a cheap layer of judgment that we've productized. But that's not really where the time goes. That's not the hard part.
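The first-pass checks described above (reproduce the proof of concept, check for upstream patches, dedupe) can be sketched as a simple filter. This is not the actual Trail of Bits checker. `Finding`, `fingerprint`, and the two callback parameters are hypothetical stand-ins, and running the dedupe step first is a choice made here so that the expensive reproduction step runs once per unique bug.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set

COST_PER_BUG_USD = 5  # rough per-bug cost quoted in the talk


@dataclass(frozen=True)
class Finding:
    id: str
    fingerprint: str  # hypothetical dedupe key, e.g. file + function + bug class


def first_pass(
    findings: Iterable[Finding],
    reproduces: Callable[[Finding], bool],
    patched_upstream: Callable[[Finding], bool],
) -> List[Finding]:
    """Keep findings that are unique, reproducible, and still unpatched."""
    seen: Set[str] = set()
    kept: List[Finding] = []
    for finding in findings:
        if finding.fingerprint in seen:
            continue  # duplicate of a finding already processed
        seen.add(finding.fingerprint)
        if not reproduces(finding):
            continue  # proof of concept fails against current HEAD
        if patched_upstream(finding):
            continue  # already fixed upstream
        kept.append(finding)
    return kept


# Made-up batch with stubbed checks standing in for the real work.
batch = [
    Finding("A", "fp1"),
    Finding("B", "fp1"),  # duplicate of A
    Finding("C", "fp2"),  # does not reproduce
    Finding("D", "fp3"),  # patched upstream
    Finding("E", "fp4"),
]
survivors = first_pass(
    batch,
    reproduces=lambda f: f.id != "C",
    patched_upstream=lambda f: f.id == "D",
)
print([f.id for f in survivors], len(batch) * COST_PER_BUG_USD)  # ['A', 'E'] 25
```

Everything that comes out of this filter still needs the human exploitability judgment discussed next; the sketch only covers the mechanical layer.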
The hard part is really: is this bug exploitable in the client's deployment? How does this relate to the threat model of the code? Is there a chain with other findings that we can use to produce an exploit that's a real attack path? Is the severity rating something that the CISO wants to act on? You know, even when we take the latest and greatest models like Opus 4 and ask it to review some of these findings for us, we still get inflated severity. Like these models are really bad at exploitability testing. They cannot understand the actual utility of the bugs they find.
So that's like a weird gap. As the discovery volume goes up and we're keeping pace with mechanical verification via things like our false positive checker, the value of the human judgment needed to truly figure out exploitability rises dramatically, because that's still a human process and it doesn't scale with the number of bugs that we're finding. So most of these frontier labs in 2025, they were finding 100 bugs at a time. You have to imagine that they probably have 10,000 bugs they're sitting on right now and they have no ability to figure out how to report them, whether they're useful, whether these are truly exploitable, how they fit into an attack chain.
So that's really where I think we see Trail of Bits providing a ton of value in the near future. That's the Jevons paradox for this particular issue.
So now, how is this changing professional services? I think the thesis is what Ryan Daniels at Crosby Legal published back in March. He described that partnerships designed to lock in human labor are headed for extinction. And the new model is something he calls a neo-firm, which is an equal split between practitioners and engineers. It is no longer enough to assemble a bunch of smart people and set them out at a task unless they have specialized tools and research backing them up.
So professional services firms are going to need to innovate as a core practice and probably be 50/50 balanced with people that go out there that are forward-deployed, that deeply understand the domain that clients are working in, and have tools and specialized agents that back them up.
The deployment model looks a lot like what Aaron Levie has been describing. So the agents change the process themselves. And the issue here is like the number of people that are writing code now is dramatically magnified. Like there used to be 500 or maybe 5,000 firms that had a deep bench of engineers that were working on problems that needed security review, but now you have like 500,000. You have a lot more people writing code. A lot of it is highly domain specific and it needs somebody who can sit there and understand specifically how that problem is being solved in order to understand the security ramifications of it. So this concept of forward-deployed engineers has now come back and I do think that this is what's required in order to fully embed and bake in security knowledge that's required into all these new workflows.
You've also got the capital. So you can follow the money, right? And if you follow the money you see that OpenAI and Anthropic, they both have forward-deployed engineers but they have a handful, they might have 100, they might have 200. And what they've done is they've gone to market with collections of 20 or more firms they've put together into groups like the OpenAI deployment company and Anthropic's joint venture with BlackRock and a dozen other firms. Because they just need humans in order to run this process. Like we haven't reached AGI. And the reason why I know that is because there's a huge demand for forward-deployed engineers.
And then the nature of the work itself is changing. Agents aren't tools anymore. They're team members. We've got background agents, and they're part of the org chart now. They live and collaborate with engineers as they deliver work. This is how we're hitting 200 bugs a week on the right engagements. So the line from Ryan Daniels at Crosby Legal, that elite partners will be even more valuable because they're leveraging armies of agents, rings true here, I think.
And then finally, the other part about the team is that the team has to show up with receipts. There's only a small number of companies on earth right now that have access to these trusted cyber models, that have access to Mythos or access to GPT-5 Cyber. I'm lucky that Trail of Bits is one of them. So these big frontier labs are now sort of picking winners, where I'm able to do work, find bugs, and instruct the model to do tasks that many other firms are not able to. It's not something that's easy to replicate.
A prediction that I made in an earlier version of this talk was that private inference was going to be a big deal. And I think this has come to pass. Everybody's concerned about privacy. And I think that everybody that is doing anything is working on their own version of private inference. You saw Meta launch this earlier this week. They have Incognito Chat. It uses AMD SEV-SNP confidential virtual machines in order to produce a system where they can't even read the prompts and the answers that you're sending to their servers. It's all end-to-end encrypted through a trusted execution environment.
These are extremely hard to build. They're extremely fragile, because if you think about it, what they're actually saying is that an adversary with Meta's own resources, with the servers sitting in their own office, can't break into that system. That is just a crazy statement to make, and it requires everything going right. So we've been reviewing many of these for people out in industry. We worked with WhatsApp to pre-review this before launch and found a lot of issues, because these are deeply sensitive systems that depend on systems-level details that aren't superficially testable. So these things can be secure, but you have to put the work in to make sure they are. And I really think it's only a matter of time before the vast majority of services we use are actually performing private inference.
So let me close. I described the essential starter path, the minimum viable AI native work design. Those are the items listed out here from one to six. That's what we've done so far. It's changed the way that we can ship and how quickly we can adapt. And if you want to copy this, it's not enough to copy one specific thing in the deck. You have to copy the whole system.
There's a bunch of open questions. I don't know how to solve these yet. I do think about, and I'm planning for, a future where AI is buying services from Trail of Bits. That has really thrown my marketing team for a loop. We're also trying to figure out how to isolate our deployments so that we're less exposed, or not exposed, to things like prompt injection when we use these high-powered agent tools on sensitive engagements. Right now we have crude solutions to this, but we have a prototype deployment of, for instance, NoShell inside Trail of Bits, where instead of the agent using bash, we have something that's highly containerized and controlled via system call restrictions, and we think it adds a lot of value on this problem.
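As a rough illustration of taking bash away from an agent, here is a minimal sketch of a much simpler technique than the containerized, system-call-restricted deployment described above: an allowlist gate that runs agent-requested commands as argument lists, without a shell. This is not how NoShell works. The names `ALLOWED_TOOLS`, `ToolDenied`, `check_command`, and `run_tool` are hypothetical, and the `PATH` value assumes a typical POSIX layout.

```python
import subprocess

# Hypothetical allowlist of programs the agent may invoke (illustrative only).
ALLOWED_TOOLS = frozenset({"ls", "cat", "grep"})


class ToolDenied(Exception):
    """Raised when the agent requests a command outside the allowlist."""


def check_command(argv):
    """Validate an agent-requested command given as a list of strings."""
    if not argv or not all(isinstance(a, str) for a in argv):
        raise ToolDenied("command must be a non-empty list of strings")
    program = argv[0]
    # Require a bare program name so paths like ./ls or /tmp/grep are rejected.
    if "/" in program or program not in ALLOWED_TOOLS:
        raise ToolDenied(f"program not allowed: {program!r}")
    return list(argv)


def run_tool(argv, timeout=10):
    """Run an allowed command without a shell, so pipes, redirects, and
    command substitution in arguments are treated as literal text."""
    checked = check_command(argv)
    return subprocess.run(
        checked,
        shell=False,                    # never hand a string to /bin/sh
        capture_output=True,
        text=True,
        timeout=timeout,
        env={"PATH": "/usr/bin:/bin"},  # minimal environment (assumed layout)
    )
```

A gate like this only limits which programs start. It does nothing about what an allowed program can read or write (`cat` will still open any file the process can reach), which is why kernel-level controls such as containers and system call filters are the stronger layer.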
And then policy enforcement and continuous learning at scale. If all of your employees are using these systems, we have ways to control them to make sure they're safe, like with those developer tooling enhancements we did for cool-down policies. But can we learn things from the way that our employees use AI every day and then synthesize those across the business at agent speed, regularly, every single day? And I think that's a big open question as well that I'm still working on.
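This section doesn't define what a cool-down policy is. Assuming it means refusing to adopt a dependency release until it has been public for some minimum period, a minimal sketch of the check could look like the following. The seven-day window and the names `COOLDOWN`, `passes_cooldown`, and `filter_releases` are hypothetical, not values or tooling from the talk.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cool-down window; the talk does not give a number.
COOLDOWN = timedelta(days=7)


def passes_cooldown(published_at, now=None, cooldown=COOLDOWN):
    """Return True if a release has been public for at least `cooldown`.

    `published_at` and `now` are expected to be timezone-aware datetimes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - published_at >= cooldown


def filter_releases(releases, now=None, cooldown=COOLDOWN):
    """Keep only (version, published_at) pairs that have aged past the window."""
    return [(v, ts) for v, ts in releases if passes_cooldown(ts, now, cooldown)]
```

The idea under this assumption is that a freshly published, possibly compromised release has time to be noticed and pulled before anyone in the company installs it.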
So, references. These are the tools that we have published that all of you can benefit from right now. Our list of skills grows every single day. I do big drops about once a month, with 10 to 20 new skills at a time. There's some fun ones in there, like Let Fate Decide, if anybody likes tarot cards. They're not all security skills. But there are some really fantastic security skills in there, including some techniques that I don't think people realize are beneficial avenues of research and experimentation, like our Dimensional Analysis skill, which I think you guys should check out if you have any interest in that. My Twitter is up there. If anybody wants to get in touch, email me, tweet at me, and I'll be happy to answer questions. I think I have six minutes left, so thank you for having me.