Blog Post · 5 minutes

Why Enterprises Can’t Take General-Purpose AI Tools at Face Value

Published: March 3rd, 2026

Generative AI is entering its next phase in the enterprise. What began as experimentation and productivity gains is now becoming embedded into core workflows, customer interactions, and decision-making.

The moment AI-generated insights start driving decisions, context becomes mission-critical, and leaders increasingly recognize that success requires more than technical capability alone.

This shift reframes the conversation: it is no longer about whether general-purpose AI tools can deliver value, but about how that value is delivered across the enterprise, responsibly and at scale.

The trust problem: AI getting it wrong

Enterprise expectations of AI are rising faster than the technology’s ability to consistently meet them. While generative models excel at synthesizing information, independent studies show that general-purpose AI tools can still produce flawed responses.

For example, a BBC analysis found that 45% of AI answers to news questions contained at least one significant issue across popular assistants like ChatGPT, Microsoft Copilot, Google Gemini, and Perplexity. Gemini performed worst, producing unsound answers in 76% of responses, largely due to poor sourcing.

So, why does this happen? Large language models operate probabilistically. They predict the most statistically likely answer to a query based on patterns learned from massive datasets, without verifying facts. Since most questions require nuance and contextual understanding, AI can confidently present plausible but incorrect answers or hallucinate entirely.
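
To make that mechanism concrete, here is a minimal sketch of next-token sampling. The prompt, vocabulary, and probabilities are invented for illustration and do not come from any real model:

```python
import random

# Toy next-token distribution a language model might assign after the
# prompt "The capital of Australia is". The probabilities are invented
# for illustration: they reflect patterns in training text, not facts.
next_token_probs = {
    "Sydney": 0.46,    # common in training data, but wrong
    "Canberra": 0.41,  # correct, yet not guaranteed to be chosen
    "Melbourne": 0.13,
}

def sample_next_token(probs: dict[str, float], temperature: float = 1.0) -> str:
    """Sample a token in proportion to its temperature-scaled probability."""
    tokens = list(probs)
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(tokens, weights=weights, k=1)[0]

# The model emits whichever completion is statistically plausible;
# nothing in this step checks the answer against a source of truth.
print(sample_next_token(next_token_probs))
```

Nothing in the sampling step verifies the output, which is why a fluent, confident answer can still be wrong.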

For consumers and enterprises at large, the risks are tangible:

  • Misleading AI outputs can damage brand reputation.
  • Critical workflows may be compromised in customer support, legal, or healthcare settings.
  • Compliance breaches can occur if AI misinterprets regulatory or privacy-sensitive information.

A recent warning: Google’s AI health summaries

The implications of these limitations are not theoretical. Earlier this month, Google removed some AI-generated health overviews after a Guardian investigation revealed users were being put at risk by false and misleading information.

The AI outputs had apparently ignored essential factors such as:

  • Age and sex
  • Pre-existing health conditions
  • Contextual relevance of sources

In some cases, the AI cited references without verifying whether the advice was medically appropriate, meaning the very content meant to help users could cause harm.

While this was a consumer-facing implementation, it illustrates how general-purpose AI systems can fall short in high-risk domains without domain-specific controls. For enterprise leaders, the lesson is that use-case sensitivity, validation, and governance are essential when deploying AI at scale.

The human factor

The Google health summaries example highlights that even with increasingly capable AI systems, outcomes are ultimately shaped by how people interpret and act on AI-generated outputs. The trust challenge, therefore, is not purely technical but also human.

Gartner predicts that through 2026, atrophy of critical-thinking skills caused by increased Gen AI use will lead 50% of global organizations to introduce “AI-free” skills assessments. This raises a fundamental question for enterprise leaders: as AI becomes more embedded in daily workflows, are employees becoming overly reliant on AI-generated answers?

When systems present confident, well-structured responses, the risk is that users stop questioning outputs, even when those outputs are incomplete, misleading, or wrong. Human judgment remains a critical control layer in any enterprise AI deployment, particularly in complex or regulated environments.

Ensuring AI remains a trusted tool

When trust in AI-generated responses isn't a given, there are several strategies leaders can take to mitigate AI risk:

  1. Governance frameworks – define clear ownership for AI initiatives and establish decision-making authority to ensure accountability across the organization.
  2. Model validation and monitoring – implement continuous testing, error tracking, and automated quality checks (a sketch of such a check follows this list).
  3. Explainability tools – leverage technology that surfaces reasoning paths, sources, and confidence levels.
  4. Maintain human oversight – ensure that critical decisions involve expert review rather than blind trust.
  5. Training & literacy programs – equip employees to interpret, verify, and contextualize AI outputs effectively.
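
As referenced in point 2, here is a minimal sketch of an automated quality check. The `AIResponse` shape, its field names, and the thresholds are illustrative assumptions, not any particular vendor's API:

```python
from dataclasses import dataclass, field

@dataclass
class AIResponse:
    # Hypothetical shape; real platforms expose their own fields.
    text: str
    sources: list[str] = field(default_factory=list)
    confidence: float = 0.0  # platform-reported score in [0, 1]

def quality_check(resp: AIResponse, min_confidence: float = 0.75) -> list[str]:
    """Return the reasons a response should be held for human review."""
    issues = []
    if not resp.sources:
        issues.append("no cited sources")
    if resp.confidence < min_confidence:
        issues.append(f"confidence {resp.confidence:.2f} below {min_confidence}")
    return issues

resp = AIResponse(text="Restart the agent service to clear the stale cache.",
                  sources=[], confidence=0.62)
problems = quality_check(resp)
if problems:
    # In production this would log the failure, alert the owning team,
    # and route the draft to a reviewer instead of the end user.
    print("Held for review:", "; ".join(problems))
```

Checks like these do not make a model correct, but they turn silent failures into tracked, reviewable events.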

Beyond technical controls, organizations should embed human judgment into everyday workflows. This involves creating processes that encourage employees to question outputs, validate information, and take ownership of decisions. By integrating AI as a supportive tool rather than a replacement, enterprises can make decisions that are both reliable and accountable.
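
One way to operationalize that ownership is a simple routing gate, sketched below. The category names and the review queue are hypothetical; the point is that high-stakes output never flows straight to action:

```python
# Categories where AI output must be approved by a qualified human
# before anyone acts on it. The names here are illustrative.
HIGH_STAKES = {"legal", "healthcare", "compliance"}

def submit_for_expert_review(answer: str) -> str:
    # Placeholder: in practice this enqueues the draft in a review tool
    # and returns only after a qualified reviewer approves or edits it.
    print("Queued for expert review:", answer)
    return "PENDING_REVIEW"

def route_output(category: str, ai_answer: str) -> str:
    """Send high-stakes AI output to a human; let low-risk output pass."""
    if category in HIGH_STAKES:
        return submit_for_expert_review(ai_answer)
    return ai_answer

print(route_output("healthcare", "Increase the dosage to 400mg daily."))
```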

When these elements are in place, AI becomes not just powerful, but dependable. For CIOs and IT leaders, the path forward is clear: combine transparency, oversight, and human expertise to ensure AI outcomes meet enterprise-grade expectations.

Request a Nexthink Demo