TechLeader Voices #2: Lessons in RAG, Risk, and Real ROI: GenAI Blueprint for Enterprises
Denis Rothman, Maria Parysz, and Clint Bodungen break down why RAG, fine-tuning, and human oversight outperform scale. Plus: global trends, practical tools, and real-world AI deployment strategies.
Goldman Sachs just rolled out its generative AI assistant, a natural-language copilot, to all 46,000 employees across divisions, from developers to investment bankers and wealth managers, with specialized capabilities for each function.
Why don't most AI deployments succeed like Goldman’s? What needs to change?
My colleague, Manish, recently sat down with three enterprise AI experts: Maria Parysz (CEO, Elephant AI), Denis Rothman (bestselling AI author and Transformer/RAG authority), and Clint Bodungen (30-year cybersecurity veteran and author).
They told us about a striking pattern emerging across industries.
Teams burn months fine-tuning a massive LLM, only to find it doesn't perform (or cost) the way they expected. That's when the hard truth surfaces: a smaller model plus RAG, fine-tuning, and human oversight could’ve done the job cleaner, faster, safer.
Their consensus for tech leaders?
Leaders ought to think about overall system performance holistically, factoring in:
Model
Processes
Data flows
Human reviewers
Guardrails
...rather than just model accuracy on a benchmark.
This issue dives deep into their thoughts on making GenAI actually work for enterprises.
We’ll also go over our take on what's making headlines in the AI world. Let’s get into it.
-Devaang Jain
Editor, TechLeader Voices
In Today’s Issue:
Expert insights: Maria Parysz, Denis Rothman, and Clint Bodungen prioritize system architecture and fit-for-purpose design (RAG + fine-tuning + human oversight) over model hype.
Around the World: 85% of execs cite poor data quality as the top GenAI challenge, Gartner predicts 30% of GenAI projects will stall in 2025, Deloitte recommends targeting narrow use cases for faster ROI, and Forbes highlights that 37% of enterprises mix open-source and proprietary models to stay flexible.
AI Toolbox: Transform your research and documentation processes with summarization and audio overviews using NotebookLM, Google's AI-driven note-taking tool.
Exclusive Invite: We interviewed senior leaders who've led successful AI implementations and distilled strategic insights into our new ECHO Reports. Buy any report you like at half-price. Use the code TL50 at checkout.
Expert Insights
From Model Hype to Data Reality in Generative AI
We interviewed Maria Parysz (CEO, Elephant AI), Denis Rothman (bestselling AI author and Transformer/RAG authority), and Clint Bodungen (30-year cybersecurity veteran and author) to get their thoughts on separating enterprise GenAI hype from reality.
If you're an executive leader, you'll remember that aha moment when GenAI delivered something eerily useful, like when a model rewrote a customer email in seconds, summarized a dense strategy deck, or replaced an hour of developer effort with one clean line of code.
That moment is seductive. And it’s exactly what drives budgets fast and expectations even faster.
But when everything feels possible, it becomes dangerously easy to chase the wrong outcomes.
The Allure of the Ferrari
“People tend to just dump everything into LLMs now… too many people think it can do everything”
—Maria Parysz
Parysz has watched organizations pour millions into models that were never built for the problems they’re being asked to solve.
That's what happens when teams go all-in on massive LLMs, burning time and capital, only to discover that a leaner architecture (a smaller model with the right retrieval system) would have fared better.
The metaphor she uses is dead-on: trying to use a Ferrari for a job that a Jeep, or even a bicycle, could do.
Watch Parysz drive the point home in this interview excerpt:
Parysz takes this further by envisioning an “octopus” architecture for AI systems. In her view, the future lies in agents, not the sci-fi autonomous agents of lore, but pragmatic software agents that connect an LLM’s brain to various specialized limbs.
In such an architecture, a large generative model might sit at the center orchestrating tasks, but it delegates to smaller, expert systems for things like real-time data queries, recommendations, or vision tasks. This modular approach counters the weaknesses of an LLM-alone. Each component is purpose-built and transparent, making the whole less of a black box. Parysz believes this agentic orchestration is a natural evolution.
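To make the pattern concrete, here is a minimal Python sketch of that orchestration idea: a central router delegates to small, purpose-built components and only calls a generative model for open-ended language work. The handlers and routing rules are hypothetical illustrations, not something Parysz prescribed.

```python
# Hypothetical "octopus" sketch: a central router delegates to small,
# purpose-built components and only calls a generative model for the
# open-ended language work.

def query_inventory_db(request: str) -> str:
    """Specialized limb: real-time data lookup (placeholder)."""
    return f"Inventory results for: {request}"

def recommend_products(request: str) -> str:
    """Specialized limb: recommendation engine (placeholder)."""
    return f"Recommendations related to: {request}"

def draft_reply_with_llm(request: str, context: str) -> str:
    """The generative 'brain', reserved for messy language tasks (placeholder)."""
    return f"[LLM drafts a reply to {request!r} using context: {context}]"

def route(request: str) -> str:
    """Central orchestrator: cheap, transparent rules decide which limb acts."""
    text = request.lower()
    if "in stock" in text:
        return query_inventory_db(request)
    if "recommend" in text:
        return recommend_products(request)
    # Anything open-ended gets grounded context first, then the LLM drafts.
    return draft_reply_with_llm(request, context=query_inventory_db(request))

print(route("Is the X200 in stock?"))
print(route("Why was my last order delayed?"))
```

Each limb stays auditable on its own, which is what makes the whole less of a black box.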
Denis Rothman adds that there’s a misconception of treating generative AI itself as the whole solution, whereas in reality “user input is just a tiny thing in [a larger] ecosystem” of an AI application.
Most failures don’t happen because the model underperforms. They happen because leaders treated the model as the system, when it’s just one piece of a much larger machine.
Clint Bodungen sees the same problem but from a security angle. “We tend to be more cautious than the average user, just due to the nature of our job as cybersecurity professionals,” he said.
He’s not exaggerating the risk.
“When you use the ChatGPT UI, you’re risking sending your code and data into the cloud”
—Clint Bodungen
Take a look at what Bodungen had to say in our interview:
And if that weren’t sobering enough, generative models have a well-known habit of hallucinating, i.e. confidently delivering wrong or fabricated information. “Can we trust it to give us accurate threat intel? Can we trust it to help our SOC analysts?” Bodungen asks, skeptical.
Parysz highlighted an “awful situation” where an airline’s customer service chatbot hallucinated a false promise to customers, who took the company to court and won. As Bodungen and Parysz say, you can never be too cautious with (hallucinating) AI.
The Biggest Models Won’t Curb Hallucinations.
RAG and Fine-tuning Will.
The natural instinct, when chasing innovation, is to reach for the biggest, most advanced tool available. But in enterprise AI, size is rarely the problem, and it’s almost never the solution.
Our panel agreed on a fundamental shift in mindset: stop fixating on the model, and start designing the system.
Rothman drove this point home with a simple mantra.
“RAG (Retrieval-Augmented Generation) is not a model, RAG is a process.”
—Denis Rothman
Check out Rothman's thoughts on choosing the right model for enterprise AI:
Bodungen and Rothman both champion retrieval-augmented generation (RAG) as a method to combat hallucinations. RAG works by injecting the model with verified information on demand to keep the AI’s answers tethered to reality. For example, instead of trusting a raw model to answer a compliance question from its general training (and risk a wrong answer), you can have it retrieve the answer from your curated policy documents first.
Fine-tuning is another complementary tactic: narrow the model’s brain by training it on domain-specific data. In essence, you’re teaching the AI the truth according to your context, so it’s less tempted to make something up.
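As a rough illustration of the RAG side of those two tactics, here is a minimal sketch: retrieve a verified policy passage first, then build a prompt that confines the model to it. The keyword-overlap retriever and the policy snippets are placeholders; a production system would use a proper vector index.

```python
# Toy RAG sketch: fetch a verified policy passage, then build a prompt that
# confines the model to it. The retriever is a keyword-overlap ranker and the
# policies are placeholders; real systems would use a vector index.

POLICY_DOCS = {
    "refunds": "Refunds are issued within 14 days of purchase with proof of receipt.",
    "data_retention": "Customer records are retained for 7 years, then securely deleted.",
}

def retrieve(question: str, docs: dict[str, str]) -> str:
    """Return the passage sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(docs.values(), key=lambda text: len(q_words & set(text.lower().split())))

def build_grounded_prompt(question: str) -> str:
    context = retrieve(question, POLICY_DOCS)
    return (
        "Answer using ONLY the policy below. If it does not cover the question, say so.\n"
        f"Policy: {context}\n"
        f"Question: {question}"
    )

# This grounded prompt is what gets sent to whichever model you choose.
print(build_grounded_prompt("How long do we keep customer records?"))
```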
Leaders should choose whichever model fits each part of that process. In other words, success comes from the supporting architecture around the model: your data pipelines, retrieval systems, prompt routers, and feedback loops.
The large language model itself is just one piece of the machine. Parysz agreed, reflecting that most of her recent project work has been “creating the data pipeline and processing the data first,” before even touching an LLM. If that part isn’t designed right, no model will save the business.
That flips the GenAI conversation on its head. Do you really need a 175-billion-parameter behemoth for every task? Probably not. Instead of asking “What’s the most powerful model we can afford?” we should be asking “What’s the most effective architecture we can control?”
“Is it what you want or what you need? These are two different questions because, of course, I would like a Ferrari. To be honest. I would like the fastest...I want the best model for me.
However, remember that this will cost you, this will slow down your system. You will need to invest more in security and generally the implementation and retraining.”
—Maria Parysz
Often a well-tuned 13B model with a great knowledge retrieval setup will do the job of a 175B model at a fraction of the cost. Or maybe no LLM at all; Denis chuckled that sometimes “all you need is a laptop with scikit-learn… in 2 months, you’ll have what you need,” for certain narrow problems.
The verdict's clear: architect for outcomes, and you might find a far simpler AI (or non-AI) solution delivers the goods.
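To show how far the “laptop with scikit-learn” route can go for a narrow, well-defined problem, here is a hedged sketch of a classic ticket-triage classifier. The tickets and labels are invented purely for illustration.

```python
# Sketch of the point above: for a narrow, well-defined problem, a classic
# scikit-learn pipeline on a laptop may be all you need.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tickets = [
    "I was charged twice for my subscription",
    "Please reset my password",
    "Invoice shows the wrong amount",
    "I can't log in to my account",
]
labels = ["billing", "access", "billing", "access"]

# TF-IDF features + logistic regression: fast to train, cheap to run, easy to audit.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(tickets, labels)

print(model.predict(["My invoice is wrong and I was double charged"]))  # e.g. ['billing']
```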
Bodungen outlines a clear framework for choosing the right enterprise AI model:
Data-centric AI First, Model Second
Begin by defining the problem tightly and then attempt to solve it stepwise. Don’t start by picking a shiny model; start instead by picking the right approach.
“Are there better, faster, more efficient ways to process the data and do things before you send it to the LLM? You have to realize what the LLM… is for (and only then use it in a targeted way)”
—Maria Parysz
In practice, this means an almost inverted approach to typical AI projects: get your data and workflow right first, then add generative AI where it adds specific value.
Let’s take customer support automation as an example. Don’t feed the entire queue into an LLM. Start by separating out the routine queries that a decision tree or knowledge base can handle. Then apply generative AI to the truly messy, human-language cases: summarizing long complaints, rephrasing complex policies, drafting context-aware replies.
That’s the “divide and design” approach that turns GenAI into a tool of precision. Remember: the goal is not to use AI for everything, it’s to use AI for the right things in the right way.
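A minimal sketch of that split, assuming a plain FAQ lookup handles the routine queries and only the leftovers reach a generative step. The FAQ entries and the drafting placeholder are hypothetical.

```python
# "Divide and design" sketch: a plain FAQ lookup answers the routine queries,
# and only the messy remainder reaches a generative step.

FAQ = {
    "reset password": "Use the 'Forgot password' link on the sign-in page.",
    "opening hours": "Support is available 9am-6pm CET, Monday to Friday.",
}

def draft_with_llm(query: str) -> str:
    """Placeholder for the generative step (summarizing, rephrasing, drafting)."""
    return f"[LLM drafts a context-aware reply to: {query!r}]"

def answer(query: str) -> str:
    text = query.lower()
    for key, canned_reply in FAQ.items():
        if key in text:
            return canned_reply       # routine case: no LLM involved
    return draft_with_llm(query)      # messy case: hand off to generative AI

print(answer("How do I reset password?"))
print(answer("I've waited three weeks and nobody has answered my complaint about billing."))
```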
Parysz reminds us that data-centric AI means accepting you’ll retrain your models “over and over again” as new data arrives. You never relax because your data and environment never stop changing. It means you’re not beholden to any one model. You can start with a decent model and make it great by systematically improving the data and feedback around it.
Bodungen’s lesson from the trenches of cybersecurity is equally important: don’t skip the guardrails. Even after you’ve slimmed down your approach and chosen a model wisely, you must anticipate failure modes.
Even with strong architecture and careful model selection, we must exercise oversight. Sensitive data must be redacted or encrypted. Guardrails must be active. And most importantly, humans must stay in the loop.
Rothman points out that reinforcement learning from human feedback (RLHF) is a powerful mechanism to refine AI outputs over time. Each time a human corrects the AI or marks an output as unsatisfactory, that feedback can be looped back to train the model towards what “right” looks like. Over hundreds or thousands of such micro-improvements, the system gets appreciably better at aligning with user needs and values.
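Full RLHF pipelines are involved, but the capture side of that loop can start simply. Here is a hedged sketch of logging each human correction as a training example for later fine-tuning or preference tuning; the storage format is an assumption, not Rothman’s prescription.

```python
# Sketch of the feedback-capture side of the loop: every human correction
# becomes a training example for later fine-tuning or preference tuning.
# The storage format below is a hypothetical choice.
import json
from datetime import datetime, timezone

FEEDBACK_LOG = "feedback.jsonl"

def record_feedback(prompt: str, model_output: str, human_correction: str, approved: bool) -> None:
    """Append one reviewed example; accumulated records drive periodic retraining."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "model_output": model_output,
        "human_correction": human_correction,
        "approved": approved,
    }
    with open(FEEDBACK_LOG, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

record_feedback(
    prompt="Summarize our refund policy",
    model_output="Refunds any time, no questions asked.",
    human_correction="Refunds within 14 days with proof of purchase.",
    approved=False,
)
```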
“There are ways to combat the hallucinations… by using fine-tuning, by using retrieval augmented generation, and using local models”
—Clint Bodungen
Each of these adds structure and truth to the system. Fine-tuning trains on verified data, RAG keeps responses grounded in actual content, and local models reduce exposure risk.
If you’re using a closed API model, redact or encrypt sensitive data before it ever touches the model.
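As a small illustration of that pre-flight step, here is a sketch of redacting obvious PII before a prompt ever reaches a third-party API. The regex patterns are illustrative only, nowhere near a compliance-grade filter.

```python
# Minimal redaction sketch: scrub obvious PII before the prompt leaves your
# environment. Patterns are illustrative, not an exhaustive filter.
import re

PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "CARD": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

prompt = "Customer jane.doe@example.com paid with card 4111 1111 1111 1111 and wants a refund."
print(redact(prompt))
# -> "Customer [EMAIL REDACTED] paid with card [CARD REDACTED] and wants a refund."
```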
“Ladder of Trust” Framework for Enterprise GenAI
One of the most practical outcomes of our discussion was a step-by-step mental model for rolling out generative AI in the enterprise. Consider it a ladder that takes you from sandbox to full production, with safety nets at each rung:
Step 1: Sandbox with Purpose – Identify one high-impact, low-risk use case and build a proof of concept in a controlled environment. Limit scope and data exposure.
Step 2: Data Prep & Guardrails – Before involving an LLM, get your data house in order. Mask or omit sensitive info, and set up access controls. Decide upfront what success and failure look like, e.g. acceptable accuracy and response time (a minimal gating sketch follows this list).
Step 3: Minimum Viable Model – Solve the core problem with the simplest approach. Use small models or even non-AI if possible. Remember: you’re testing the idea, not trying to win an AI contest.
Step 4: Augment & Enhance – If needed, enrich the system: add a vector database + RAG for knowledge, or an agent to handle multi-step tasks. Swap in a bigger model only if you must. (Remember: more parameters are the last resort, not the first.)
Step 5: Human-in-the-Loop – Integrate human review and feedback at key points. For every answer the AI gives, have a mechanism for a human to correct or approve it during the pilot. Feed those corrections back to improve the model or rules.
Step 6: Gradual Scale-Out – Roll out to more users or cases in phases. Monitor like a hawk. Have alerting for anomalies (spikes in error rates, user complaints). Keep an easy path to roll back or hand off to humans.
Step 7: Continuous Oversight – Even at scale, don’t “set and forget.” Establish ongoing evaluation – weekly quality reports, monthly model recalibrations, quarterly strategy reviews. Keep refining prompts, retraining with new data, and tightening security as needed.
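To make Steps 2 and 6 concrete, here is a hedged sketch of a scale-out gate: thresholds agreed before the pilot, checked against live metrics before (or during) each rollout phase. The numbers are examples, not recommendations.

```python
# Sketch of a scale-out gate: success and failure thresholds are written down
# before the pilot, and rollout is held the moment live metrics drift outside
# them. Threshold values are illustrative examples.
from dataclasses import dataclass

@dataclass
class Guardrails:
    min_accuracy: float = 0.90      # agreed in Step 2, before any LLM work
    max_error_rate: float = 0.05
    max_p95_latency_s: float = 2.0

def ready_to_scale(accuracy: float, error_rate: float, p95_latency_s: float,
                   g: Guardrails = Guardrails()) -> bool:
    """Return True only if every pre-agreed threshold is met; otherwise hold or roll back."""
    checks = {
        "accuracy": accuracy >= g.min_accuracy,
        "error_rate": error_rate <= g.max_error_rate,
        "latency": p95_latency_s <= g.max_p95_latency_s,
    }
    for name, ok in checks.items():
        if not ok:
            print(f"HOLD: {name} outside agreed bounds")   # wire this into real alerting
    return all(checks.values())

print(ready_to_scale(accuracy=0.93, error_rate=0.07, p95_latency_s=1.4))  # False: error rate spiked
```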
This framework is essentially a ladder of trust. Each step earns you the right to move to the next by proving the AI system’s worth and integrity at a small scale. By the time you’re at full deployment, you’ve minimized surprises.
It’s the opposite of the big-bang approach where you drop a giant model into production and cross your fingers.
TL Question of the Week
Does your AI strategy pass the “fit-for-purpose” test?
Gather your team and ask, for each AI project in your pipeline:
Have we clearly defined why we need this solution?
What does success look like?
What is the fallback if it fails?
Are we starting small enough to learn cheaply?
Around the World
KPMG Survey – GenAI Investment vs. Reality: Generative AI adoption is gaining momentum, but 85% of leaders say data quality is the top challenge. As our panelists said, big budgets alone won’t guarantee ROI if your data and foundations are shaky.
World Economic Forum – Poor Data Management: Gartner predicts 30% of enterprise GenAI projects will stall out in 2025 due to poor data and risk management. The era of AI experimentation without oversight is ending – design for sustainability or face project failure.
Kong Inc. Study — Open-Source on the Rise: A recent survey of IT leaders finds 51% believe open-source LLMs will outmatch proprietary models, and 37% are pursuing a hybrid approach. The future of enterprise AI will be flexible and multi-model, and not one-size-fits-all.
Deloitte – Focus for Quick Wins: Deloitte research suggests targeting a few high-impact use cases (and layering AI carefully) accelerates ROI in enterprise AI deployments. It echoes our panel: narrow focus + layered tech = faster value.
Forbes – CIOs Go Hybrid: Forbes reports that 37% of enterprises are adopting hybrid LLM strategies (combining proprietary and open-source models) to stay agile. The smartest orgs are hedging their bets, mixing tools to get the best of both worlds.
AI Toolbox
Google's NotebookLM offers an AI-driven solution for business leaders to streamline information synthesis. This tool can summarize documents, generate explanations, and provide answers based on uploaded content, including PDFs, Google Docs, websites, and Google Slides.
Notably, its "Audio Overviews" feature delivers podcast-style summaries, facilitating easier digestion of complex materials. Integrate NotebookLM into your workflow to enhance productivity and decision-making efficiency.
From Framework to Real-World Impact
One of TechLeader’s recent ECHO case studies perfectly illustrates the Gen AI principles we discussed in action. We interview key industry leaders who've led successful real-world AI implementations and distill their insights into our reports. As we highlight in our ECHO report “Generative AI for 80% Customer Support Automation,” a mid-size tech company achieved 80% automation of support tickets while improving customer satisfaction by 95%, and they did it through a layered AI system built for reliability and scale.
The key was a thoughtful architecture: they combined ChatGPT with task-specific tools and built layered reasoning and cross-referencing to reduce hallucinations and improve context. They selected models based on task, not hype (sometimes a fine-tuned smaller model handled simpler queries) and kept humans in the loop for complex cases.
The result was a robust system the company could trust in front of real customers, not just automation for its own sake. Check out the full ECHO report for a deep dive into the tech stack and deployment journey.
Do you have an AI implementation you’d like to see featured – either from your own work or from a peer? Or would you like to share your insights in an exclusive feature with us? Get in touch by replying to this issue directly, or by writing to team@techleader.ai.
Forward-worthy Insight
Remember, the best GenAI strategy isn’t the one with the biggest model; it’s the one with the sharpest use cases.
How can you adjust one project this quarter to be more focused, accountable, and insight-driven?
That’s where the ROI lives.
Until Next Time
We hope this issue helped you see through the lens of an industry insider.
Hit reply to let us know how you found it, and what you’d like to see more of in the future.
We'll be back next week with full signal, no noise. See you in the next issue.