With generative AI we’re making leaps and bounds in creating machines that can mimic human behavior. However, like a slick-talking used car salesman, these AI models often have their quirks and hallucinations that make us question their credibility. Let’s delve into the technical reasons behind these peculiar traits and explore emerging solutions like Microsoft’s Autogen multi-agent model.
1. The Gift of Gab: Verbose but Not Always Accurate
Much like a used car salesman who can talk circles around you, generative AI models can produce impressive, verbose content. However, the quality of that information can be questionable. Take, for example, the 3-Towel problem below:
“If it takes 3 towels 3 hours to dry on a clothesline how long will it take 9 towels to dry?”
Now if you’ve ever used a clothesline, the answer here should be self-evident and succinct: it doesn’t matter whether you have 3 or 9 towels; as long as there’s room on the line, they all dry at the same time, so the answer is still about 3 hours.
Let’s see what ChatGPT said.
Drying time for towels on a clothesline can depend on various factors like weather, humidity, and wind. However, if we assume similar conditions for all towels, you can use a simple proportion to estimate the time it would take for 9 towels to dry:
3 towels = 3 hours
9 towels = x hours (what we want to find)
You can set up the proportion like this:
(3 towels / 3 hours) = (9 towels / x hours)
Now, cross-multiply and solve for x:
3x = 27
Divide both sides by 3:
x = 9
So, it would take approximately 9 hours for 9 towels to dry under similar conditions. Keep in mind that this is a simplified estimation, and actual drying time may vary.
Wow, that answer might impress even those of us who aced math class. It’s detailed, well-explained, and even comes with a formula for good measure. But there’s a catch: it’s entirely incorrect. So, what’s the real story here?
Technical Insights: Language Models and Hallucinations
One of the most captivating yet misleading aspects of generative AI models is their semblance of understanding text. It’s easy to think these models truly “comprehend” what they are reading or generating, but the reality is far more rudimentary. In essence, these models are glorified statistical engines that use probability to predict the next word in a sequence. They are trained on massive datasets and employ complex algorithms, but their “understanding” is merely a byproduct of mathematical calculations. These models analyze the contextual relationship between words and their likely successors based on the data they’ve been trained on. They do not grasp meaning, context, or nuance (read: clothesline) the way a human does; they are simply playing a high-level game of “fill-in-the-blank,” guided by probabilities. In this case, the missing piece is an understanding of how a clothesline actually facilitates drying: towels on a line dry in parallel, not one after another. This lack of true comprehension is a critical factor in why generative AI can produce outputs that are verbose yet hollow, or even outright incorrect — much like the promises of a wily used car salesman.
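To make that “fill-in-the-blank” picture concrete, here is a deliberately toy sketch in Python. The vocabulary and probabilities below are invented purely for illustration; real models learn statistics over subword tokens with neural networks rather than a lookup table, but the core move is the same: pick the next word by probability, not by understanding.

```python
import random

# A toy "language model": hand-written next-word probabilities standing in for
# the statistics a real model learns from billions of sentences. The words and
# numbers here are invented for illustration only.
NEXT_WORD_PROBS = {
    "towels": {"dry": 0.7, "hang": 0.2, "fall": 0.1},
    "dry":    {"on": 0.6, "quickly": 0.3, "slowly": 0.1},
    "on":     {"the": 0.9, "a": 0.1},
    "the":    {"clothesline": 0.5, "line": 0.3, "rack": 0.2},
}


def generate(prompt: str, max_words: int = 5) -> str:
    """Extend a prompt by repeatedly sampling a likely next word."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = NEXT_WORD_PROBS.get(words[-1])
        if not candidates:
            break  # no statistics for this word; stop generating
        next_word = random.choices(
            list(candidates), weights=list(candidates.values())
        )[0]
        words.append(next_word)
    return " ".join(words)


print(generate("towels"))  # e.g. "towels dry on the clothesline"
```

The sketch never checks whether its sentence is true; it only checks what usually follows what. That is exactly the gap the 3-Towel answer falls into.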
2. Master of Illusion: Convincing but Misleading
A used car salesman knows how to make a lemon look like a treasure. Similarly, AI models can produce results that seem convincing but are ultimately misleading. In a striking example that underscores the potential hazards of generative AI in professional settings, consider the case of Roberto Mata and his legal battle with Avianca Airlines. Mata’s lawyers, led by Steven Schwartz, used ChatGPT to conduct legal research for their court filings. ChatGPT assured them of the existence of legal precedents that, as it turned out, were entirely fabricated. The lawyers cited at least six cases to show precedent, including Varghese v. China Southern Airlines and Shaboon v. Egypt Air — but the court found that the cases didn’t exist and were “bogus judicial decisions with bogus quotes and bogus internal citations.” These fabricated decisions led to a potentially career-damaging scenario, inviting federal sanctions and raising questions about the ethics and reliability of using AI in legal practice.
Technical Insights: The Black Box Problem
The complex neural network architectures behind these models make it difficult to understand why a particular output was generated, a phenomenon often referred to as the “black box” problem. While these models can perform tasks with remarkable accuracy, understanding how they arrive at specific decisions or outputs is a labyrinthine endeavor. This lack of transparency poses significant ethical and practical concerns, especially in high-stakes domains like healthcare, finance, and law. In this case, ChatGPT had absorbed not the substance of case law but the form of a legal brief: how it should be structured, complete with reference citations and formatting. As a result, it had no trouble creating faux briefs that appeared legitimate, even to a seasoned attorney — until someone actually tried to look up those cases in a legal database. This raises a challenge: when errors like these occur, it becomes difficult both to improve the model and to hold it accountable. The black box nature of AI not only hampers our ability to fully trust these systems but also stands as a roadblock to regulatory compliance and the broader adoption of AI in sensitive areas.
3. Tailored but Impersonal
Both used car salesmen and AI models aim to provide a personalized experience, but it often feels artificial and devoid of genuine human touch. Advanced chatbots and customer service AIs do their best to mimic cognitive empathy. They are programmed to recognize keywords or phrases that suggest the user’s emotional state, and they generate responses accordingly. For instance, if a user types ‘I’m frustrated,’ the AI might reply, ‘I’m sorry to hear you’re feeling this way. How can I assist you?’ While this gives the impression of understanding, it’s a far cry from true emotional connection.
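To see how thin that layer of “empathy” can be, here is a simplified, hypothetical sketch of the keyword-matching pattern described above. The rules and canned replies are invented for illustration; production systems typically use sentiment classifiers rather than a handful of regular expressions, but the scripted flavor of the output is much the same.

```python
import re

# Hypothetical keyword-to-response rules; each maps an "emotional" keyword
# pattern to a canned reply.
EMPATHY_RULES = {
    r"\b(frustrated|annoyed|angry)\b": "I'm sorry to hear you're feeling this way. How can I assist you?",
    r"\b(confused|lost)\b": "I understand this can be confusing. Let me walk you through it.",
}

DEFAULT_REPLY = "Thanks for reaching out. How can I help you today?"


def scripted_reply(user_message: str) -> str:
    """Return a canned 'empathetic' response based on keyword matches."""
    lowered = user_message.lower()
    for pattern, reply in EMPATHY_RULES.items():
        if re.search(pattern, lowered):
            return reply
    return DEFAULT_REPLY


print(scripted_reply("I'm frustrated with my order"))
# -> "I'm sorry to hear you're feeling this way. How can I assist you?"
```

The reply sounds caring, but nothing in the code registers what the user is actually going through; it only matched a word.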
Technical Insights: Why AI Falls Short of Clinical Empathy
Generative AI models often have pre-defined personalities, making their interactions feel scripted and less than genuine. While research is improving this aspect, AI is still far from achieving the deep emotional understanding that comes with clinical empathy. True clinical empathy often requires a level of shared experience, something AI can’t simulate. Consider support groups like Alcoholics Anonymous, where members introduce themselves as alcoholics to establish common ground. This shared experience is the bedrock of genuine empathy and support, something far beyond the reach of even the most advanced AI.
Solutions on the Horizon
The AI community is hard at work searching for solutions to these challenges. A promising development is Microsoft’s AutoGen multi-agent framework, which diverges from the single-model approach of traditional generative AI. AutoGen uses specialized agents, each with a distinct role — such as a writer, an editor, a developer, or a QA tester. These agents collaborate, creating a ‘checks and balances’ system that aims for more accurate and reliable outputs. For instance, a ‘writer’ agent could generate content while an ‘editor’ agent checks its accuracy, thereby reducing the likelihood of hallucinations.
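As a rough illustration, here is a minimal sketch of that writer/editor pattern using the pyautogen package (AutoGen’s 0.2-style API). The agent names, system messages, round limit, and placeholder model settings are assumptions chosen for this example, not a prescribed setup.

```python
import autogen

# Placeholder LLM settings; substitute a real model name and API key.
llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]}

# A 'writer' agent drafts content.
writer = autogen.AssistantAgent(
    name="writer",
    system_message="You draft short, factual answers to the user's question.",
    llm_config=llm_config,
)

# An 'editor' agent reviews the draft and pushes back on unsupported claims.
editor = autogen.AssistantAgent(
    name="editor",
    system_message=(
        "You review the writer's draft for factual accuracy and clear reasoning. "
        "Point out anything unsupported and ask for a revision, or approve it."
    ),
    llm_config=llm_config,
)

# A proxy agent that starts the conversation on the user's behalf, with no
# human input and no code execution.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
)

# A group chat manager routes messages between the agents for a few rounds.
group_chat = autogen.GroupChat(agents=[user_proxy, writer, editor], messages=[], max_round=6)
manager = autogen.GroupChatManager(groupchat=group_chat, llm_config=llm_config)

user_proxy.initiate_chat(
    manager,
    message="If 3 towels take 3 hours to dry on a clothesline, how long do 9 towels take?",
)
```

The point is less the specific API than the division of labor: the editor’s whole job is to catch the kind of fabricated citation or bad arithmetic a single model would happily hand back.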
As we hurtle into an era where AI is becoming increasingly sophisticated, we must brace ourselves for the quirks and hallucinations that are part and parcel of this technology. While advancements like AutoGen offer a glimpse of a more reliable future, the journey is far from over. So don’t be too surprised if you find yourself feeling like you’re haggling with an AI version of a used car salesman.