Data Products That Matter (Florian Douetteau @Dataiku , Ep 77)

🎥 Jun 03, 2026 📺 It's About Data ⏱ 18m

We interview Florian Douetteau, CEO and cofounder of Dataiku, about why data is the fuel for AI and how automation can solve ...

About Florian Douetteau

During appearances in April and June 2026, Florian Douetteau discussed Dataiku's focus on "semi-deterministic" agents for enterprise use cases such as procurement optimization, supply chain management, and marketing budget allocation. He described this category as requiring "some form of determinism" rather than open-ended exploration. Douetteau stated that Dataiku launched three new products, including an agent management tool designed to "observe and manage agent at scale," and said that many customers already run "dozens to hundreds of agents" and need to assess whether those agents are delivering business value. Douetteau argued that "creating stuff is no longer a limit" for enterprises and emphasized the importance of governance, saying "if you give power to more people, you also need to raise the stake in terms of control." He asserted that "the risk for many organization is to be left behind" if they do not transform core workflows with AI. Regarding CIOs, Douetteau described them as "stuck in between board that want AI to happen but at the same time are worried about AI risk," and said that this represents "a make or break moment" where CIOs can either lead AI transformation or be replaced by "AI first CIOs."

Source: AI-verified profile updated from Florian Douetteau's recent appearances.

Transcript (23 segments)

✨ AI-enhanced transcript with speaker attribution

Host0:00

So the issue of creating value with data or agentic and so forth starts with this. How can you actually create data products that are correct?

Welcome to It's About Data episode 77. This is a conversation with Florian Douetteau, CEO of Dataiku. Well, hello everybody. We are actually at a really special occasion today. We're in San Francisco for Snowflake Summit 2026. But technically we're actually offsite, about a block and a half away at Dataiku HQ. Dataiku of course was named once again Snowflake Partner of the Year this year. Congratulations. And we're joined by our special guest, Dataiku CEO Florian Douetteau. And I hope I pronounced it correctly all this time.

Florian Douetteau0:56

You did. And thanks for having me, and welcome to the nest.

Host1:03

So Matt, you said that you have a stumper to start out with because you're the data scientist.

I wouldn't call it a stumper. Actually, it's from the press release for Snowflake's Product Partner of the Year 2026. It talks about how Dataiku is very good at helping companies move from AI pilots to governed, production-scale deployments. And frankly, governed and production scale are two things that we talk a lot about on this show, but they're hard, right? So what's Dataiku's perspective on this? How do you make AI governed and production scale? What are the steps to make that happen?

Florian Douetteau1:40

I think in the enterprise market as we are today, we have to acknowledge that creating stuff is no longer a limit. That's my mindset at least. You can in theory code everything into existence over the weekends, you can probably code anything you want and have lots of fun. But imagine you're back on Monday in your company. How does it work in practice? And that's where governance kicks in. Meaning any data you produce needs to be validated. You need to be able to explain it and so forth. And so the issue of creating value with data or agentic and so forth starts with this: how can you actually create data products that are correct? Starting point. And correct meaning that they are tested, you've got multiple people in the organization that actually understand it, because if it's only one person it doesn't work. And how do you put some form of governance and testing around it, repeatability and so forth? What we are building with Dataiku on top of Snowflake, and notably one product we launched called Co-Built for Snowflake, provides this ability to create and vibe code things into existence, but vibe code data pipelines or agentic pipelines that are visual, tested, and governed, meaning they are within a framework where it's easy for people in the business to understand what's happening. Because the issue we are already having, all of us, with agentic technology is not about imagining things and turning something we imagine into something that looks real. It's about testing whether the thing is actually the thing we expect it to be. So that's what I focus on, and I think this need of control and governance is now what's blocking people from creating value with data.

Host3:31

Right. Let's step back here and go to some building blocks we have to work with. And you know, we're a little biased here. Our podcast is It's About Data, so I'm going to start with data. You take a look at the landscape in most organizations today, data is scattered, inconsistently governed if it's governed at all. But now obviously with AI, it really raises the stakes. So the question I want to ask you, Florian, is at this point, what does it take to make the data AI-ready?

Florian Douetteau4:07

I won't go into "let's all build semantic models and ontologies and data governance," because you've heard that before and it's not new. So I won't repeat that, even if I believe it. But my particular take from the last couple of weeks is that even if we do all of that, in practice the data that matters, the data products that are hidden, those shadow data products, are for instance in Excel. Do you remember Excel? Those Excels with 30 tabs and maybe even some VBA in the middle? Those shadow data products are the ones used by people in the business. And the next step of data readiness for agentic is about migrating those shadow data products into governed areas. Move them into Snowflake or whatever. And it doesn't mean just move the Excel spreadsheet into Snowflake as a table, that would be too easy. What you really need to do is translate the business logic that is within an Excel into pipelines so that you can repeat them and turn them into a process. And what we've seen more and more among our customers is indeed using our platform to do this type of automated translation of Excel, for instance. Because that's the issue of data readiness: you still have too many things, especially the things that are valuable, stuck on your laptop, scattered. It's not just scattered across various data marts and data warehouses. To some extent that part is manageable. The next layer is everything that is at the bottom, on the laptops, and moving that up. And not moving that up by just copying the thing, because that has no value. It's moving the actual business logic. And that's I think the next step for data readiness, which is probably not understood enough in terms of how important it is in order to actually create value with agentic down the line.

Host5:59

That's huge. So let me... This really resonates, this Excel conversation. This is exactly what I've seen over and over again doing data engineering consulting projects. In fact, quite often if you talk to the company's CFO, they're building out the quarterly report for the SEC and Wall Street in an Excel spreadsheet with 30 tabs, just like you say, with VBA. How can agents contribute to this process, but in a way that's trustworthy, production-worthy, and governed properly?

Florian Douetteau6:30

Now you can actually in our platform translate this type of Excel spreadsheet into an actual pipeline, meaning something deterministic. And this translation can be largely automated. We've automated this translation. It's no longer a big data engineering project to rebuild the business logic of the CFO into a few weeks consulting gig. That's something you can do in a few minutes to a few hours depending on the complexity, including all of the application on top so that the CFO can actually look at the data and create a summary of these hypotheses with a summary of the reports automatically. So those things now take minutes. And that's what's different. You can actually translate any type of Excel into a pipeline with its business logic. And indeed before, we all heard about those projects where you had this complex financial modeling that needed to be translated and data engineering, let's build the data model, da-da-da, and a few weeks later it's not really happening, it's complex and doesn't reconcile, and so forth. And then you just dump the data engineering projects to the bin, which is why everything stayed in Excel. Now we've seen it and people are doing it with our product, it's no longer the case. And so you could actually go beyond that now. And it's important because having this reconciliation as a data pipeline is the first step for the CFO to then be able to add some agentic process on top that will inspect automatically some things and so forth. Because if you don't do that first, you can't really apply the agentic layer automatically and reliably.
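To make the idea of "translating Excel into a pipeline" concrete, here is a minimal sketch of spreadsheet-style business logic rewritten as named, deterministic steps. It is an illustration only and not Dataiku's implementation: the ledger rows, step names, and the `run` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical ledger rows, standing in for one tab of a spreadsheet.
LEDGER = [
    {"account": "revenue", "amount": 1200.0},
    {"account": "revenue", "amount": 800.0},
    {"account": "cogs", "amount": -500.0},
]


@dataclass
class Step:
    """One named, deterministic transformation in the pipeline."""
    name: str
    fn: Callable[[dict], dict]


def sum_by_account(state: dict) -> dict:
    # Equivalent of a SUMIF per account in the spreadsheet.
    totals: dict = {}
    for row in state["ledger"]:
        totals[row["account"]] = totals.get(row["account"], 0.0) + row["amount"]
    return {**state, "totals": totals}


def gross_margin(state: dict) -> dict:
    # Equivalent of a cell formula such as =Revenue + COGS.
    totals = state["totals"]
    return {**state, "gross_margin": totals["revenue"] + totals["cogs"]}


def reconcile(state: dict) -> dict:
    # Fail loudly if the aggregates no longer match the raw rows.
    raw = sum(row["amount"] for row in state["ledger"])
    aggregated = sum(state["totals"].values())
    if abs(raw - aggregated) > 1e-9:
        raise ValueError("totals do not reconcile with ledger rows")
    return {**state, "reconciled": True}


PIPELINE = [
    Step("sum_by_account", sum_by_account),
    Step("gross_margin", gross_margin),
    Step("reconcile", reconcile),
]


def run(pipeline: list, state: dict) -> tuple:
    """Run each step in order and record which steps were executed."""
    trace = []
    for step in pipeline:
        state = step.fn(state)
        trace.append(step.name)
    return state, trace


result, trace = run(PIPELINE, {"ledger": LEDGER})
```

Because every step is an ordinary function with a name, the same inputs always give the same outputs, and the trace shows which steps ran. That is the property an agentic layer on top would rely on.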

Host8:23

Right. And it's a very interesting point that agents can really help us do this work of taming data, of governing data. But let's look at it from the other side. From your standpoint and your work with your customers, the whole industry is talking about agents right now. Agents are the future. Where are customers at with regard to... obviously within your solution you have agents that can perform a lot of these functions. But for customers who are trying to work with agents, where are they at right now? Where's their comfort level? Where's their understanding?

Florian Douetteau8:59

I think it's still baby steps and early. Meaning you've got customers having from a couple of agents to some having thousands of agents. But if you go one step further in analyzing what's happening, when it's a few thousand, typically a lot of those are more glorified assistants, as in glorified RAG, as opposed to true agents. And I think we are at the beginning of it, but we start seeing customers getting more and more sophisticated in terms of their agentic use cases. What I find interesting as a subset of agents, and that we focus on in particular at Dataiku as I think it's interesting and worth building, is the category of agents that are semi-deterministic. Kind of unclassified today in all those agentic categories. But in particular it's things such as procurement optimization or supply chain or building your marketing budget. This type of thing where you need some form of determinism. It's not like an OpenClaw doing project management that will run over the weekend hoping that you will get smarter and smarter every weekend so that you can end up planning your vacation. We're not talking about this type of agent. We're talking about an actual thing that can optimize your procurement, for which lots of practical rules need to be in place. And so it actually looks more like something that is agentic at some edges and some steps, and we can adjust the data and be smart on some steps, but also have lots of deterministic steps because you want to be sure about what it's doing, and maybe even at the middle it's a pretty good old decision tree that is actually taking the real decision you want to take. So this category of agent is super interesting because I think it's very important in particular for back office tasks, which is where a lot of the leverage is for agentic today.
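The "semi-deterministic" shape described here, agentic at the edges with rules and a plain decision tree in the middle, could be sketched roughly as follows. This is a hypothetical illustration, not a Dataiku API: the supplier list, thresholds, and the `normalize_supplier_stub` function (a stand-in for a model call that cleans free text) are all assumptions.

```python
from typing import Callable

# Hypothetical allow-list maintained by procurement.
APPROVED_SUPPLIERS = {"acme", "globex"}


def normalize_supplier_stub(raw_name: str) -> str:
    """Stand-in for the agentic edge step.

    In a real system this might be a model call that cleans up free-text
    supplier names; here it is a deterministic stub so the sketch runs.
    """
    return raw_name.strip().lower()


def decide(order: dict, normalize: Callable[[str], str] = normalize_supplier_stub) -> str:
    # Edge step: the only place where a non-deterministic component plugs in.
    supplier = normalize(order["supplier"])

    # Deterministic rule: hard business constraint, never delegated to a model.
    if supplier not in APPROVED_SUPPLIERS:
        return "reject: unapproved supplier"

    # Plain decision tree in the middle that takes the actual decision.
    total = order["quantity"] * order["unit_price"]
    if total <= 1000:
        return "auto-approve"
    if order["in_budget"]:
        return "approve with manager sign-off"
    return "escalate to procurement"


small_order = {"supplier": "  ACME ", "quantity": 10, "unit_price": 50.0, "in_budget": True}
large_order = {"supplier": "Globex", "quantity": 100, "unit_price": 50.0, "in_budget": False}
unknown_order = {"supplier": "Initech", "quantity": 1, "unit_price": 5.0, "in_budget": True}

decisions = [decide(o) for o in (small_order, large_order, unknown_order)]
```

The design choice is that the model-backed step can only influence the input to the rules, while the decision itself stays in code that can be read, tested, and audited.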

Host10:49

It's interesting when we talk about determinism and agents that it's not necessarily throwing out all the old with the new. There's some valuable things we have from rules and from determinism.

Florian Douetteau11:02

Yeah, because it's so editable, and also from a latency and cost perspective it's a very efficient way to do it. It doesn't mean that you don't use agents in order to create all the agents that are deterministic in the middle. So it's a bit meta, but you are agentic in the way you build, and then the agents themselves are agents and combination of agents where you combine rules and agents. How many times have I said agents in the last minute?

Host11:29

We're giving the award at the end of this month because this is Snowflake Summit month. Who mentions agents the most?

Florian Douetteau11:39

I'm very grateful that agent is only five letters because otherwise...

Host11:45

At least they ask you a question, don't use the word agent. Okay, I'll try. I'm going to ask a kind of higher level question here. Dataiku really emphasizes human in the loop. That's what I've gotten from reading your materials. Your website talks about people, orchestration, governance, people being first. You've talked about agents as an enabler for CFOs as opposed to replacing a CFO. How do we train people to use AI technologies like the agents, the replacement for Excel, but also fight complacency, which has been in the news a lot, where people start trusting things they shouldn't trust, or maybe the CFO doesn't fully analyze the report and make sure they understand the results before submitting them for the Q4 report? How do we find that balance? How do we train people to not trust AI too much while making it a useful tool?

Florian Douetteau12:36

Yeah, I think one line of thought could be that you can't train them. They need to make their own mistakes first and foremost. And then look a bit stupid and be like, oh, maybe I should think twice before just vibe coding or pushing all of my Excel to Claude asking for the answer and hoping that the answer is correct. How else do we learn? Mistakes are part of the training.

I think what's missing in the enterprise is that when you think about it, the way you are building applications now is very different. It used to be data services, UI, whatever, this type of stack. Now it's more like you've got lots of tokens, they're expensive, and it's more like you've got a harness, some protective layers, in which you are building your agents, and they just run whatever they do. Indeed, for certain types of use cases or users, you need a certain type of harness to make sure they don't make crazy mistakes. If you are building your marketing message, maybe you can just use Claude connected to SharePoint. That's fine. Ultimately, it's very hard to create something that will ruin your company unless you are very stupid when using Claude building your marketing messages. But if you're trying to actually reconcile your financial data, that might not be optimal or it's kind of risky, especially if it's the actual report at the end. You probably want another system in place.

So for this type of setup, you need another type of harness to build the agents. You need a harness where the agents will be built in a certain way, where the reconciliation would be a natural data pipeline that you can see step by step. I take this data and so on. So the thing we have to all learn as practitioners of agentic and creators of agents is that certain types of users for certain use cases will need a certain type of protection layer, a certain type of harness in order to host the agents they're building.

So Dataiku, the way we are positioning ourselves in a sophisticated manner in terms of explaining what we do, is indeed as a harness for people, especially business users that need agents that have a lot of decision-making stakes, so that they can build reliable agents. If you're a data engineer or a data scientist, maybe you can code everything into existence and be very good about it. But if you want everyone in the business to be able to leverage the value of agentic, you can't expect everyone to be able to read Python and understand what they're doing when they write code with code. So that's where you need a different paradigm in order to derive value from agentic.

Host15:41

And I think for you, for Dataiku, this challenge is especially pertinent because you've actually distinguished yourself by appealing not just to data scientists but also bringing business analysts or power developers into machine learning. I'll ask you a final question: what's your push to bring together this harness to keep people properly focused and properly aligned?

Florian Douetteau16:14

Yeah, ultimately the formula for success in AI is always this trade-off between people and governance. If you give power to more people, you also need to raise the stake in terms of control and governance. Because if you just give lots of orchestration powers to lots of people, you're just creating chaos. So the trade-off is you give power to people in terms of building lots of data pipelines by themselves, but in a way that is centralized and controlled, where IT can come after the fact and say 'You shouldn't be using this column to do this type of projects. What have you been doing? You're stupid. You will get fired. Please stop doing that.' And now we automate this type of controls. That's the type of thing required in the enterprise to actually leverage the value of data. Because if you don't do that, the other way to implement it for IT is to say 'Nobody will have access to data, you have to ask us to get anything done,' which is unfortunately what lots of organizations are still doing. That's why they don't derive value from data, because everything is blocked at the starting point. The CEO says 'You should do more data,' and the practitioners say 'We are just waiting for access, so we are not actually doing anything.' Which is fine, because now no one is losing their job because we don't derive value from AI.
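The automated control mentioned here, flagging a column that should not be used for a given type of project, can be pictured with a small sketch. It is purely illustrative and assumes a hypothetical policy table (`RESTRICTED_USES`) and a hypothetical pipeline description; it does not reflect how Dataiku implements governance.

```python
# Hypothetical policy: column name -> purposes for which it must not be used.
RESTRICTED_USES = {
    "salary": {"marketing"},
    "health_status": {"marketing", "pricing"},
}


def violations(pipeline: dict) -> list:
    """Return the columns this pipeline uses against policy, sorted by name."""
    purpose = pipeline["purpose"]
    return sorted(
        column
        for column in pipeline["columns"]
        if purpose in RESTRICTED_USES.get(column, set())
    )


# Hypothetical pipelines built by business users.
marketing_pipeline = {
    "purpose": "marketing",
    "columns": ["email", "salary", "health_status"],
}
finance_pipeline = {
    "purpose": "finance",
    "columns": ["salary", "cost_center"],
}

marketing_issues = violations(marketing_pipeline)
finance_issues = violations(finance_pipeline)
```

A check like this lets people keep building pipelines on their own while the policy is applied centrally, instead of blocking access to the data up front.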

Host17:58

Well, I guess our goal here is to start a conversation we can never finish, but we hope to have you back where we can follow this up. We've hardly scratched the surface. Florian, thanks very much for your time.

Florian Douetteau18:11

Thanks to you for having me.

Host18:12

Thank you.