ChatGPT Update Claims Reasoning Capabilities; Industry Reacts

OpenAI’s new o1 model is designed to complete more complex tasks and solve harder problems

Berenice Baker, Editor

September 20, 2024

8 Min Read

OpenAI recently released a preview of its new AI model, o1, to ChatGPT users. The model is designed to complete more complex tasks and solve harder problems in science, coding and math, and the company has made bold claims about its ability to think and reason.

Here’s what industry representatives had to say about whether o1, known in development as Strawberry, really does represent the next leap forward in generative AI.

Matt Hasan, CEO, aiRESULTS

The o1 model combines deep reinforcement learning, chain of thought (CoT) and a “tree of thoughts” approach, making it a significant step forward in AI reasoning. While it isn’t quite “thinking” like a human, its structured problem-solving capabilities are impressive.

During testing, o1 exhibited significantly fewer hallucinations than earlier models, especially on tasks requiring factual accuracy. This enhances o1's reliability for tasks where precise information is crucial.

The o1 model’s capacity to reason with data provides a more adaptable and intelligent approach to validation, particularly in sensitive areas such as healthcare.

Eli Itzhaki, CEO and founder, Keyzoo

Phrases like “thinking” and “reasoning” sound impressive, but they don’t truly reflect what’s happening under the hood. AI models like ChatGPT operate through complex algorithms that predict responses based on patterns in the data they’ve been trained on. They simulate human-like conversation without actually understanding or experiencing anything. It’s more about advanced pattern recognition than real cognition.


For businesses, this means AI can be excellent for specific tasks, like answering common customer questions or generating content ideas, but it’s not capable of making nuanced decisions that require genuine comprehension or empathy. The update might improve the model’s ability to sound more convincing, but we shouldn’t confuse that with actual human reasoning.

The question of whether it is the most "dangerous" AI to date is a matter of context. The real risk lies in overestimating what the technology can do and applying it in scenarios where a human touch is still necessary.

Ed Charbeneau, principal developer advocate, Progress

While the new o1 model is impressive and demonstrates characteristics of deep thought, the model itself has no independent will, consciousness or memory. The model cannot “think” outside of the context of the prompt it was given. The reasoning abilities of the model are impressive and appear humanlike, but it still falls in the category of chatbot or limited reasoner.


The new o1 series of AI models was built for reasoning-heavy, multi-operation tasks such as math, science and coding. Previous models would respond to a prompt with the first answer they generated, resulting in hallucinations and sometimes simple logical errors.

The o1 series models are designed to clean up logical errors by breaking down difficult tasks into smaller steps, an approach known as “chain of thought.” A summary of that chain of thought can be displayed, letting users follow the problem-solving process the model used to reach its final answer. Because the model spends more time and computation working through a problem, it becomes more accurate and hallucinates less.
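To make the idea concrete for developers, the snippet below is a minimal sketch of chain-of-thought-style prompting using the openai Python client. It is illustrative only: the model name, system prompt and question are placeholders, and it does not reflect how o1 implements its reasoning internally.

```python
# Minimal sketch of chain-of-thought-style prompting (illustrative only).
# Assumes the official `openai` package and an OPENAI_API_KEY in the
# environment; the model name below is a placeholder, not a claim about
# how o1 works internally.
from openai import OpenAI

client = OpenAI()

question = "A train travels 120 km in 1.5 hours. What is its average speed in km/h?"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": (
                "Break the problem into numbered steps, check each step, "
                "then state the final answer on its own line."
            ),
        },
        {"role": "user", "content": question},
    ],
)

# The reply contains the intermediate steps followed by the final answer:
# the user-visible analogue of the chain-of-thought summary described above.
print(response.choices[0].message.content)
```

The trade-off the sketch surfaces is the one Charbeneau describes: asking for intermediate steps costs more tokens and time in exchange for answers that are easier to check.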

Steve Wilson, CPO at Exabeam

The biggest takeaway from OpenAI’s o1 is its ability to explain its reasoning. The new o1 model uses step-by-step reasoning, rather than relying solely on “next token” logic. For example, I posed a riddle to o1, asking it: “What has 18 legs and catches flies?” It responded: “A baseball team. A baseball team has nine players on the field, totaling 18 legs and they catch ‘flies’ – which are fly balls hit by the opposing team.” But what’s fascinating is there’s now an icon you can click to see how o1 thinks!

This new feature shows the user how it arrived at its conclusion. In this case, it stated its thought process of “addressing the inquiry,” “decoding the riddle” and “weighing the baseball team.” The concept of explainability has always been a huge topic and a major challenge for applications based on machine learning. This feels like a huge step forward.

This is an exciting release. What’s exciting about my initial testing isn’t so much that it’s going to score better on benchmarks but that it offers a level of explainability that has never been present in production AI/LLM models. Hallucinations have been the major limitation in the adoption of these models for many use cases. This shows a way forward.

People who say OpenAI has plateaued are fixated on the performance of next-token-predictor models, but the company is investing in quantum leaps in areas such as reasoning and voice interaction – which will unlock many new use cases. When you start to combine these reasoning models with multi-modal vision models and voice interaction, we’re in for a radical shift in the next 12 months. Hold onto your hats. It’s going to be exciting!

Jean-Louis Quéguiner, founder and CEO at Gladia

The latest iteration of OpenAI's reasoning model, o1, has generated significant interest, but upon closer analysis it may not be as groundbreaking, or as "dangerous," as many expect.

o1 is not what we would typically consider a foundation model. Instead, it relies on a chain-of-thought approach, which has existed for some time. This method breaks down tasks into smaller components that can be addressed independently and later reassembled, a divide-and-conquer strategy familiar from engineering.

While this allows for faster processing and more efficient parallelism, it doesn’t signify a fundamental shift in AI architecture. It’s more of an engineering feat rather than a genuine leap in foundational AI modeling. To put it simply, the improvements are due to optimizations – such as making the model smaller and faster – but there isn’t a novel core model driving these advancements.

Deon Nicholas, CEO at Forethought

I think this model is very powerful, but primarily for theoretical applications. It represents a new paradigm that will be useful for things like scientific research, and it is a little stronger at multi-agent tasks, but for most practical applications I think people will likely opt for faster models like GPT-4o and then leverage separate RAG systems or agentic frameworks like LangChain to handle these more complex reasoning tasks. This is because o1 can't perform searches or operate in a multi-modal fashion yet. But it's still a powerful leap forward in cognition.
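For context on the pattern Nicholas describes, the sketch below pairs a faster chat model with a separate, stand-alone retrieval step. It is a rough illustration under stated assumptions: the document list, keyword-overlap retriever and model name are hypothetical stand-ins, and a production system would more likely use a vector store or an agent framework such as LangChain.

```python
# Rough sketch of the "fast model + separate retrieval" pattern mentioned above.
from openai import OpenAI

client = OpenAI()

# Hypothetical knowledge base standing in for a real document store.
documents = [
    "Invoice disputes must be filed within 30 days of the billing date.",
    "Refunds for annual plans are prorated by unused full months.",
]

def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Naive keyword-overlap retrieval, a stand-in for vector search."""
    query_terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(query_terms & set(d.lower().split())))
    return ranked[:top_k]

question = "How long do customers have to dispute an invoice?"
context = "\n".join(retrieve(question, documents))

# A faster general-purpose model answers from the retrieved context,
# rather than a slower reasoning model working from scratch.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder for whichever faster model is in use
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```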

It's a little bit underwhelming in practice, I think. Given all the buzz around "Strawberry" and "Q*," I was imagining something that would have been a significant leap forward in practical applications. That being said, I don't think this is a sign of plateauing; I think it's the first use case of a very powerful underlying technology.

I think the technology they presumably used to build o1, i.e., deep reinforcement learning with LLMs, is a bigger, more foundational shift than the o1 model itself. We are inching closer to what may eventually be considered artificial general intelligence (AGI).

Scott Dylan, founder at NexaTech Ventures

OpenAI’s o1 model represents a significant leap forward in machine intelligence, but we need to be cautious when attributing human-like capabilities, such as “thinking” and “reasoning,” to any AI. While these terms capture public attention, they oversimplify what the model actually does. In reality, o1 excels at pattern recognition, advanced prediction and probabilistic analysis – qualities that can simulate human reasoning in specific contexts – but it is not thinking in the way we understand consciousness or abstract thought.

Its danger lies not in its intelligence, but in how we wield it. Without proper guardrails and oversight, models like o1 could be misused in ways that amplify misinformation, deepen bias or destabilize industries. The focus needs to be on developing robust frameworks to ensure that as AI models grow in capability, they remain aligned with human values and ethical use cases.

Tharindu Fernando, full-stack developer at Net Speed Canada

While OpenAI claims that o1 spends more time "thinking" before responding, it's crucial to understand that this is not genuine thought in the human sense. Instead, it's a sophisticated simulation of reasoning processes. The model's ability to break down complex problems and approach them step-by-step is impressive, as evidenced by its performance in mathematics and coding benchmarks. However, we must remember that this is still based on pattern recognition and statistical prediction, not true understanding.

Regarding the question of whether o1 is the most "dangerous" model to date, I believe this characterization is overly sensational. Yes, the model's enhanced capabilities could potentially be misused, but OpenAI has also implemented improved safety measures. The new safety training approach leverages the model's reasoning capabilities to better adhere to guidelines, showing improved resistance to jailbreaks and enhanced bias mitigation.

Sujan Abraham, senior software engineer at Labelbox 

The "thinking before responding" capability of OpenAI will change the way users interact with AI chatbots. Strawberry will provide advanced problem-solving capabilities and thoughtful accurate responses. Since the new AI model will take its time to analyze user intent and make informed decisions, it could lead to responses being more meaningful and contextually appropriate. While the longer response times might frustrate some users, it also represents an evolution in AI technology by enhancing its trust in various use cases.

Strawberry’s improvements include:

  • Enhanced problem-solving: OpenAI's Strawberry can handle more sophisticated queries with reasoning, which leads to more accurate and thoughtful responses.

  • Deeper understanding: Due to the thinking before responding approach, the responses are more contextually appropriate.

  • Broader applications: Reasoning skills can play a very crucial role in fields like healthcare and scientific research.

The drawbacks of Strawberry’s slower response times could include:

  • Frustration: Since most users are used to immediate answers, slow responses might frustrate them.

  • Hallucination concerns: As the discussions become deeper, it's very hard to ensure that the AI does not hallucinate.

Enhanced reasoning capabilities and a thinking-before-responding approach will set ChatGPT apart in handling complex queries. It also means the technology will earn more trust from enterprise-grade customers in fields like healthcare, finance and legal.

About the Author

Berenice Baker

Editor, Enter Quantum

Berenice is the editor of Enter Quantum, the companion website and exclusive content outlet for The Quantum Computing Summit. Enter Quantum informs quantum computing decision-makers and solutions creators with timely information, business applications and best practice to enable them to adopt the most effective quantum computing solution for their businesses. Berenice has a background in IT and 16 years’ experience as a technology journalist.

