2025 Year-In-Review

Ilya Sutskever, co-founder of OpenAI, made a striking observation in a recent podcast: "The models seem smarter than their economic impact would imply. This is one of the very confusing things about the models right now." It's a sentiment that perfectly captures the dissonance we've encountered repeatedly this year working with mid-market companies.

The capability of the latest AI is undeniable. Today's frontier models can write nuanced prose, generate working code and analyze complex topics. And yet, translating that capability into measurable business value remains stubbornly elusive. BCG's 2024 analysis of 1,000 C-suite executives found that only 4% of companies have developed AI capabilities that consistently generate substantial value. MIT's NANDA Initiative reported a similarly stark finding: 95% of generative AI pilots fail to deliver measurable returns.

Norwegian Cruise Line's impressive Nora AI, despite significant investment and a strong team, has yet to fully unseat the core booking experience. Marriott's innovation team managed to produce a successful AI copilot to augment their loyalty team, but it took over a year to implement due to data integration challenges. These are real, hard-fought wins. But do they add up to five points of margin like many CEOs and boards expect? Not yet. And these outcomes came from companies with dedicated AI teams - a luxury most mid-market companies don't have.

Our entire team at Eskridge has been focused on nothing but this problem for over two years, across multiple clients. While we've had our own share of wins, there's clearly significant room for improvement. What follows are the key learnings we've accumulated in 2025 in our pursuit of unlocking AI value for mid-market companies.

Learning 1: Efficiency Isn’t the Right Metric.

The most instructive case study of 2025 wasn't a success story - it was Klarna's dramatic reversal. In February 2024, CEO Sebastian Siemiatkowski announced their AI chatbot was "doing the equivalent work of 700 full-time agents," handling 2.3 million conversations monthly. The company announced a hiring freeze. It seemed like the definitive proof point for AI-driven efficiency. By May 2025, Klarna had reversed course entirely. Siemiatkowski told Bloomberg: "As cost unfortunately seems to have been a too predominant evaluation factor when organizing this, what you end up having is lower quality... really investing in the quality of human support is the way of the future for us." 

The lesson here extends well beyond customer service. We've seen this play out firsthand in 2025. We built an agentic AI tool for a client that reduced a particular task from an hour to under 10 minutes. Leadership was elated at the prospect of bolstering the bottom line. Yet when we dug further, the time savings were spread so thinly across the team and the work week that headcount reduction wasn't a realistic option. We delivered real time savings, but hard cost savings proved elusive. It turns out individual task automation rarely translates into headcount reduction. And ultimately, very few companies ever saved their way to success.

So if efficiency isn't the right metric, what is? We're seeing more sophisticated companies measure AI success through capability expansion and improved customer outcomes rather than headcount reduction. For example, Gong's product focuses on improving win rates, and their research suggests that sales reps using AI preparation tools increase their win rates by 26%.

The shift is from "how many people can we replace?" to "what can our people now accomplish that wasn't previously possible?" This includes deeper research to prepare for every prospective customer call, exploring creative concepts at higher fidelity before committing resources, and accelerating time-to-value for customers. These are harder to measure than headcount, but that’s where the real value lies.

Learning 2: Tech Changes Fast. People Change Slow.

One of the major blockers of AI value creation is internal adoption. BCG's analysis confirms what we've observed anecdotally: 70% of AI challenges stem from people and process-related issues. Only 20% trace to technology and just 10% to algorithms. Yet most companies invert their resource allocation, focusing on technology over transformation.

The resistance is real and documented. 45% of CEOs report that employees are resistant or openly hostile to AI adoption. This stems from more than mere skepticism about AI’s efficacy - we’ve also observed existential concerns that AI might render certain careers obsolete. One recent blog post titled "I Don't Care How Well Your AI Works" captured this sentiment perfectly: the author wasn't questioning whether the technology functioned, but whether it belonged in their workflow at all.

Beyond adoption, simply getting buy-in from various stakeholders can be surprisingly time consuming, as the number of stakeholders that AI requires can be both far-reaching and non-obvious. For example, we built an AI-driven creative production workflow for one of our clients fairly quickly. Scaling usage required alignment not just within the client's production team, but with their sales team revising collateral, their legal team reviewing contract language and indemnification clauses, and ultimately their clients as well as their clients’ governance and legal teams approving AI usage.

This is an obvious use case for AI, yet getting buy-in from all of these stakeholders has taken five times as long as building the workflow itself. Part of this is because every single one of those stakeholders effectively holds veto power. And AI is still new enough that it’s often easier for someone to say “no” than to say “yes”.

In response to this particular challenge, we focused the latter half of 2025 on expanding our change management capabilities. It’s quickly become a critical complement to the technical work we do when deploying AI.

Learning 3: Serve the Front Line.

We've learned the hard way not to over-index on what AI programs leadership wants. We built a collection of AI agents for a client that sounded great on paper, but the boots-on-the-ground individual contributors didn’t see a need for them, and many chose not to use them. The irony is that our team has spent much of our collective careers building user-centered digital experiences. In our excitement to understand AI, we failed to apply the time-tested strategy of deeply understanding your users before building.

IBM Watson for Oncology is a public example of this type of oversight. After $4 billion in investment, the AI-powered clinical decision support system failed in large part because oncologists found the interface disruptive to their workflow. Limited engagement with end-users during development contributed to usability issues, and hospitals quietly abandoned the system.

Palantir famously pioneered the forward-deployed engineer model, embedding engineers directly with the teams doing the day-to-day work to ensure their solutions solved real problems. We’ve since worked with former Palantir colleagues to incorporate the most critical parts of their process into our own services. For example, we've begun new engagements by conducting research with internal users. Understanding their pain points is critical to getting buy-in and ultimately driving adoption. It takes some effort upfront, but dramatically reduces the risk of building something nobody wants to use.

Learning 4: Stay Focused.

Transforming large, legacy organizations all at once is extraordinarily difficult. Mid-market companies face a distinctive set of constraints and opportunities that make this challenge even more acute: you simply don’t have the resources for dedicated AI teams, which means you need to be smarter about where you place your bets. But you can also move with far greater speed than larger enterprises, especially if you focus your efforts narrowly.

The good news is that focus improves the efficacy of your AI programs, no matter your company’s size. BCG research shows successful organizations focus on depth over breadth, prioritizing an average of 3.5 use cases compared with 6.1 for less successful companies. The companies with the strategic insight and organizational discipline to concentrate their efforts on fewer, high-priority use cases enjoyed, on average, 2.1x more ROI from their AI investments.

An approach that we've seen work is to set up a small autonomous team, give them space and permission to iterate, prove out a new model, and then push what works back into the broader organization. These work best as greenfield teams, since neither the team itself nor senior stakeholders hold preconceived notions about how it should operate. They’re able to move quickly and more easily adopt new AI-powered workflows. If possible, staff the team with internal AI enthusiasts, as they’re already the most knowledgeable about and excited by the technology.

Learning 5: Keep It Simple.

Your team will be tempted to build elaborate AI systems with the latest tools and developer frameworks. Make sure they resist this. Anthropic's official guidance to developers is telling, and mirrors our own experience: "Consistently, the most successful implementations weren't using complex frameworks or specialized libraries. Instead, they were building with simple, composable patterns." We couldn’t agree more with their recommendation to start with the basics, optimize relentlessly, and only add complexity when simpler solutions fall short.

Over time, we’ve evolved the incremental approach we use with clients, and have made our MVPs that much more minimal. Many use cases can be addressed entirely with off-the-shelf AI tools that require no coding at all. We prefer to start with a well-crafted prompt in a consumer-facing tool like ChatGPT, Claude, or Gemini. Only when you need integration across multiple systems should you consider custom development. And even then, the lightest touch is usually the right one. 
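
To make "the lightest touch" concrete, here is a minimal sketch of what that first step beyond a consumer tool might look like: a single well-crafted prompt sent directly to a model's API, with no frameworks or agent libraries involved. The model name, prompt wording, and prep_call_summary helper below are illustrative placeholders rather than a prescription.

    # A minimal "lightest touch" integration: one well-crafted prompt sent
    # directly to the model API, with no frameworks or agent libraries.
    # The model name and prompt wording are illustrative placeholders.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    SYSTEM_PROMPT = (
        "You are an analyst supporting a mid-market sales team. Summarize the "
        "account notes below into three bullet points a rep can review before "
        "a call, and flag any open risks."
    )

    def prep_call_summary(account_notes: str) -> str:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder; use whichever current model fits
            max_tokens=500,
            system=SYSTEM_PROMPT,
            messages=[{"role": "user", "content": account_notes}],
        )
        return response.content[0].text

The entire integration is one function. When the underlying models improve, you swap a single string and keep everything else.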

This restraint pays dividends as the technology matures and the foundation models continue to improve at a remarkable pace. Many techniques that seemed essential a year ago - chain-of-thought prompting, certain agentic workflows - have become obsolete as reasoning capabilities have been built directly into the base models. Claude 4 now features extended thinking with tool use. GPT-4.1 has made lengthy reasoning more affordable with an 83% cost reduction.

We're increasingly convinced that elaborate agentic workflows will be superseded by raw LLM capability. Ask what the simplest possible solution looks like. Ask whether it could be done with existing tools before writing custom code. The right approach often requires thoughtful configuration rather than engineering. Context management - ensuring the AI has access to the right organizational data - will likely become the true differentiating layer for most mid-market companies. This requires getting your data in order, but also being clear about what data you actually need to enable your priority use cases.
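
To illustrate, here is a minimal sketch of what lightweight context management might look like, under the same assumptions as the example above; fetch_recent_orders and fetch_support_tickets are hypothetical stand-ins for the systems of record a company already has. The pattern, not the specific code, is the point: retrieve only the data a given use case actually needs and place it directly in the prompt.

    # A minimal sketch of lightweight context management: pull only the
    # organizational data this use case needs and place it in the prompt.
    # The two fetch_* helpers are hypothetical stand-ins for existing systems.
    import anthropic

    client = anthropic.Anthropic()

    def fetch_recent_orders(account_id: str) -> str:
        # Hypothetical: a read-only query against your ERP or order system.
        return "2025-11-02  PO-1842  $12,400  shipped"

    def fetch_support_tickets(account_id: str) -> str:
        # Hypothetical: an export from your helpdesk tool.
        return "TKT-301  'Late delivery on PO-1842'  open"

    def answer_account_question(account_id: str, question: str) -> str:
        # Assemble only the context this question needs, nothing more.
        context = (
            f"Recent orders:\n{fetch_recent_orders(account_id)}\n\n"
            f"Open support tickets:\n{fetch_support_tickets(account_id)}"
        )
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model name
            max_tokens=700,
            system="Answer questions about this account using only the context provided.",
            messages=[{"role": "user", "content": f"{context}\n\nQuestion: {question}"}],
        )
        return response.content[0].text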

Sutskever's observation isn't a mystery once you've lived through enough implementations. The models are incredibly smart and capable. The economic impact is lagging because we've been measuring the wrong things, building for the wrong people, and over-engineering our solutions. The companies closing that gap aren't the ones with the greatest AI budgets - they're the ones that have learned to get out of their own way.
