AI Strategy Reality Check: Are You Prioritizing Platform Over Performance?

At UpLevel Ops, we take a vendor-agnostic approach, adapting to the tools our clients already use rather than steering them into rigid ecosystems. This flexibility allows us to develop scalable, effective AI solutions across various platforms, including Google Workspace and Microsoft 365, with a focus on what works best in real-world workflows.

Six months ago, I built a simple Planner integration using Microsoft's CoPilot bot-building features. The experience was rough but functional. That early proof of concept suggested the platform might evolve into something more powerful over time.

Recently, our engineering team revisited CoPilot to test that assumption. We wanted to see if it could match the performance of custom GPT-based tools we’ve deployed successfully in legal and enterprise contexts. But what we found was not an improved or matured experience. It was, in many ways, a regression.

This raised a bigger question: Is the broader enterprise push to standardize on tools like CoPilot, often for the sake of streamlining IT and licensing, actually stifling innovation and trust in AI? Could this default-first strategy be the very thing driving user disappointment and slowing meaningful adoption? In this piece, we examine what happens when platform loyalty takes precedence over performance, and why that trade-off may be backfiring.

Revisiting CoPilot: Then vs. Now
Our goal in revisiting Microsoft’s CoPilot platform was simple: evaluate whether it could now support the kind of real-world use cases our clients rely on, starting with basic automation, document handling, and knowledge retrieval. These weren’t edge-case demands. We focused on foundational capabilities, such as summarizing a document, triggering a workflow, or retrieving structured information. Even modest functionality would have sufficed.

To clarify, our testing was conducted under a CoPilot developer license, not within a fully provisioned enterprise environment. While this limited our ability to test specific tenant-wide capabilities, it also surfaced some critical concerns that may be even more relevant in an enterprise context.

We began with the new CoPilot agent builder, expecting to see progress from earlier iterations. Instead, we found a stripped-down, underpowered interface. Agents lacked action-taking abilities, couldn’t integrate with systems, and offered no support for file uploads. Even something as basic as accessing and summarizing a public link failed, despite that being standard fare in generative AI.

Most alarming, however, was what happened next. During one test, the bot unexpectedly resurfaced task data from a months-old project: outdated Planner entries and references to previous client tabs. How that information persisted across builds and sessions remains unclear. But whether it was due to poor data isolation or residual memory, the result was unsettling. For a tool so tightly integrated into Microsoft's ecosystem, this kind of unpredictable data exposure poses significant risks, particularly in environments that handle sensitive or regulated information.

Still hoping the new interface was just immature, we tried the legacy CoPilot builder for comparison. It offered more backend control but was clunky and equally unreliable. We uploaded a document and ran a summarization task. The bot claimed the file didn’t exist. We repeated the query with the exact filename: It still failed.

This wasn’t a matter of minor bugs or missing features. It was a consistent inability to perform basic tasks. When compared with the speed, accuracy, and reliability of our custom GPTs and GEMs, deployed successfully across legal, operations, and enterprise workflows, CoPilot didn’t just fall short. It lacked the baseline readiness to be considered a viable AI tool for real work.

Critical Gaps in Core Functionality
Our evaluation of Microsoft CoPilot, in which we used its developer tools to replicate functionality common to our custom GPT agents, revealed several foundational deficiencies. These limitations weren't tied to bleeding-edge expectations. We were testing the kind of bread-and-butter capabilities any enterprise AI tool should handle reliably. Instead, we found architectural constraints so limiting that they raise serious questions about whether CoPilot can meaningfully support enterprise workflows.

This isn’t just a matter of missing features; it’s a signal that the tool may not be engineered with real enterprise demands in mind. To help illustrate the scope of the challenge, we’ve grouped our observations into three core categories:

Limitations in the New CoPilot Agent Interface:

  • No access to tools or action-taking capabilities. Agents are completely passive; they can’t trigger actions, launch workflows, or integrate with existing business systems.
  • No support for file uploads. Content must be linked via public URLs, making it impossible to use any private, proprietary, or internal documents.
  • Strict and unreliable link handling. Many valid public URLs were rejected due to query strings or length. Even accepted links frequently failed to load.
  • Inability to summarize or parse documents. Even when links were embedded directly into the chat, the agent often claimed it couldn’t access them.
  • Unprompted resurfacing of old data. Most concerning, the agent began referencing Planner tasks and client tab names from months-old test sessions, indicating potential data persistence issues that violate expectations around sandboxing and information boundaries.

Additional Issues in the Legacy CoPilot Builder:

  • Cumbersome configuration. The setup process for legacy agents is outdated and requires manual effort with little guidance or built-in automation.
  • File upload failures. Even when documents appeared to upload successfully, the agent was unable to find them by name or access them at all.
  • Non-functional auto-build tools. The platform’s “auto-build” option failed to generate usable bots, even for simple summarization use cases.

Limitations Shared by Both Interfaces:

  • Lack of contextual memory. Bots failed to maintain coherence across a thread, routinely “forgetting” previous instructions or user inputs.
  • Failures on basic tasks. Document summarization via either file upload (legacy) or link (new) was unreliable and frequently non-functional.

Collectively, these issues extend far beyond first-release quirks. They point to deeper systemic design problems that undermine enterprise readiness. The platform lacks the consistency, interpretability, and clear data boundaries that legal, operations, and security teams require. Worse, because CoPilot is marketed as a deeply integrated part of the Microsoft environment, its failure modes may carry outsized risks, especially when tied to sensitive business information.

When compared to more open, modular models like custom GPTs or Google’s GEMs, both of which we’ve deployed with success across legal and operational workflows, CoPilot doesn’t just trail. It fails to meet the basic threshold for enterprise-grade AI reliability. Until these structural issues are resolved, relying on CoPilot as the foundation for enterprise AI efforts could do more harm than good.

Lessons in Trust and Strategy
What is becoming increasingly clear in the enterprise AI space is that performance failures not only slow adoption but also undercut trust, credibility, and long-term momentum. When IT teams default to platforms like Microsoft CoPilot simply for convenience or ecosystem alignment, but those tools underdeliver, the consequences ripple far beyond user frustration.

According to a recent Accenture industry report, 28% of C-suite leaders cite limitations with data or technology infrastructure as the biggest hurdle to implementing and scaling generative AI. However, many of these limitations appear to be self-inflicted, stemming not from a lack of available technology but from rigid platform choices and access restrictions. In fact, 68% of employees report their employers don’t provide full, unrestricted access to AI-based tools, despite high demand and reported use.

This environment, one in which the “official” tools aren’t capable and the capable tools aren’t officially supported, creates a disconnect that slows progress and seeds doubt. AI becomes a compliance liability instead of an innovation driver.

We’ve seen the contrast firsthand. At UpLevel, clients who have adopted flexible, well-matched tools, such as custom GPTs or GEMs, and implemented them thoughtfully have consistently seen acceleration, not hesitation. Adoption rises. Teams engage. Use cases grow organically. Importantly, enthusiasm within pilot programs tends to increase over time, as users see what’s possible and build on early successes. We’ve also observed that adjacent teams and even external partners frequently ask to be included, not because of top-down mandates, but because they see the tools are effective. Success invites participation.

By contrast, when teams are forced into using underperforming tools like CoPilot for the sake of standardization, the strategy often backfires. AI isn’t judged in a vacuum; it’s judged by results. If the first experience with enterprise AI is frustrating or unreliable, it becomes that much harder to re-earn confidence later.

Trust remains the cornerstone of AI success. It’s earned through transparency, consistent performance, and responsive implementation. IT leaders hoping to scale generative AI effectively must be willing to ask a hard question: Are our tool choices building trust, or quietly eroding it?

The Continued Value of Custom GPTs and GEMs
If there’s a silver lining to the limitations we encountered with CoPilot, it’s the reaffirmation of what’s already working: pairing Microsoft’s strong automation backbone with intelligent, adaptable AI models such as OpenAI’s custom GPTs and Google’s GEMs. This hybrid approach continues to outperform more rigid, closed-loop tools, delivering reliable, scalable results in real-world workflows.

Unlike CoPilot agents, which remain locked down and hard to extend, GPTs and GEMs offer the flexibility that today's enterprises actually need. These models can be tailored to specific roles, fed curated document sets, grounded in private knowledge bases, and updated quickly as the business evolves. And they don't just respond; they adapt. This makes them far better suited for environments where nuance, accuracy, and transparency matter.
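
For a concrete sense of what that tailoring and grounding involve, here is a minimal sketch of the pattern in Python. It is illustrative only: custom GPTs and GEMs are typically configured through their vendors' builder interfaces rather than in code, and the file paths, model name, and naive keyword retrieval below are placeholder assumptions of ours, not a production design.

```python
# Minimal sketch of a role-tailored, document-grounded assistant.
# Illustrative only: the naive keyword retrieval below stands in for a
# real knowledge base, and the paths and model name are placeholders.
from pathlib import Path

from openai import OpenAI  # assumes the official openai SDK and an API key

client = OpenAI()

# Hypothetical curated document set (plain-text files in ./curated_docs).
CORPUS = {p.name: p.read_text() for p in Path("curated_docs").glob("*.txt")}

ROLE_PROMPT = (
    "You are a legal-operations assistant. Answer only from the provided "
    "excerpts; if the answer is not there, say it is not in the knowledge base."
)


def retrieve(question: str, k: int = 2) -> str:
    """Rank documents by terms shared with the question (naive retrieval)."""
    terms = set(question.lower().split())
    ranked = sorted(
        CORPUS.items(),
        key=lambda item: len(terms & set(item[1].lower().split())),
        reverse=True,
    )
    return "\n\n".join(f"[{name}]\n{text[:2000]}" for name, text in ranked[:k])


def ask(question: str) -> str:
    # Ground the model's answer in the retrieved excerpts.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": ROLE_PROMPT},
            {
                "role": "user",
                "content": f"Excerpts:\n{retrieve(question)}\n\nQuestion: {question}",
            },
        ],
    )
    return response.choices[0].message.content


print(ask("What is our document retention policy?"))
```

The point of the sketch is the shape, not the specifics: a role definition, a curated corpus, and a retrieval step the business can update without retraining anything.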

In our deployments, this architecture has consistently proven its value. Through direct integration with Microsoft Power Automate, we've built chatbots that can schedule meetings, check email, assist with project management, and support task workflows, all tailored to the specific needs of legal and operational teams. Ironically, it's significantly easier to achieve this orchestration using external LLMs, such as GPTs or GEMs, than it is with Microsoft's own CoPilot agents. The experience is more configurable, less brittle, and far more responsive to real-world demands.
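
To give a sense of what that orchestration looks like, the sketch below shows the general pattern: an external agent handing a structured task to a Power Automate flow through its "When an HTTP request is received" trigger. The flow URL and payload fields are hypothetical placeholders, not our production configuration.

```python
# Sketch: an external agent handing work to a Power Automate flow.
# Flows built on the "When an HTTP request is received" trigger expose
# a URL; POSTing JSON to it starts the flow. The URL and payload schema
# here are hypothetical placeholders, not a production configuration.
import requests

FLOW_URL = "https://prod-00.westus.logic.azure.com/workflows/<flow-id>/triggers/manual/paths/invoke"


def schedule_meeting(subject: str, attendees: list[str], start_iso: str) -> None:
    payload = {
        "action": "schedule_meeting",  # routed by a Switch step inside the flow
        "subject": subject,
        "attendees": attendees,
        "start": start_iso,
    }
    resp = requests.post(FLOW_URL, json=payload, timeout=30)
    resp.raise_for_status()  # calendar creation happens downstream in the flow


schedule_meeting(
    "Contract review sync",
    ["ops@example.com", "legal@example.com"],
    "2025-07-01T15:00:00Z",
)
```

The same HTTP endpoint can be registered as an action on a custom GPT, so the model itself decides when to invoke the flow; that division of labor, model for reasoning and flow for execution, is what keeps the setup configurable rather than brittle.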

These external agents operate within clearly defined data boundaries, respect privacy constraints, and deliver dependable performance without requiring constant oversight or elaborate workarounds. They just work.

Microsoft’s workflow tools still play a crucial role. They offer a solid foundation for integration and orchestration. But they’re only part of the solution. The intelligence layer on top must be smarter, more adaptable, and more secure than what CoPilot currently provides. That’s where GPTs and GEMs continue to shine and why we continue to rely on them.

Ultimately, this isn’t about chasing features. It’s about choosing tools that align with how people actually work, tools that scale as trust builds, and tools that invite adoption rather than resist it. Until CoPilot can meet that bar, it will remain a cautionary tale: a reminder that in AI, outcomes, not vendor alignment, should drive strategy.

Final Thoughts: A Call to IT Leaders

Standardizing around a single vendor can seem like the obvious choice. It offers the illusion of simplicity: fewer systems to manage, consistent interfaces, and centralized security. Microsoft’s narrative around CoPilot taps directly into that appeal. But alignment only makes sense if the platform actually delivers. In its current form, CoPilot often creates more problems than it solves. Relying on it exclusively means accepting trade-offs that slowly erode trust and stall momentum. Users lose patience. Leadership starts asking harder questions. The promise of AI feels more like a marketing story than a working solution.

Think about it this way: When was the last time a business adopted a breakthrough technology and intentionally chose the slower, clunkier version, even when better tools were available? That’s what’s happening in a lot of AI rollouts right now. And the cost isn’t just measured in dollars. It shows up in missed opportunities, lowered confidence, and slower progress across the board.

At UpLevel, we’re not walking away from CoPilot. Many of our clients are being guided toward it by internal IT policies and licensing decisions. So we’re continuing to invest time in understanding what it can and can’t do. We’re taking a closer look at the Microsoft enterprise version to see whether a more fully provisioned environment delivers a better experience. We’re also testing whether the use cases we’ve already built with custom GPTs and other tools can be recreated inside that ecosystem. And yes, we’ll keep checking back to see if Microsoft resolves the issues currently limiting the platform.

Our goal isn’t to replace CoPilot outright. It’s to build intelligently around its shortcomings and fill in the gaps with smarter, more adaptable tools. CoPilot may be part of the stack, but it can’t be the whole strategy. That’s why we’re asking IT and legal leaders to make room for more capable solutions: chat agents that aren’t locked into a single ecosystem and meet the demands of real work. Flexibility isn’t a luxury in this space. It’s a requirement. Until CoPilot evolves, organizations need to allow for integration with more advanced, configurable chat agents that deliver better performance, a more seamless experience, and faster results. Because at the end of the day, our clients don’t just need AI. They need AI that actually works.


Brandi Pack, Director of Innovation at UpLevel Ops, has a diverse background that spans the legal, hospitality, education, and technology industries. Over the course of her career, she has excelled in various strategic business operations roles at Hewlett Packard Company, Constellation Brands, and Goodwill Industries. Brandi has a successful track record in project management, training, business development, legal operations, and IT services. She is a thought leader in the emerging space of AI in the workplace, particularly as it impacts the legal landscape.
