OpenAI’s Code Interpreter Lets ChatGPT Play Data Scientist

New tool lets ChatGPT execute code, not just generate it


At a Glance

  • ChatGPT can now execute code, not just generate it.
  • This capability lets ChatGPT do detailed data analytics and visualization without users writing any code.
  • Wharton professor Ethan Mollick demonstrates how he used Code Interpreter to merge, clean and analyze two datasets.

ChatGPT has long been able to generate code and code-like text, but could not execute code or programs – until now. OpenAI has unveiled Code Interpreter, which gives ChatGPT the ability to execute code.

While on the surface this seems like a tool that would mainly interest developers, Code Interpreter is enabling an important use case: the ability to do detailed data analytics and visualization without users writing a snippet of code.

“You are essentially posing questions directly to the data,” wrote solution architect Dennis Layton on Medium. “Moreover, it is capable of handling much larger datasets.”

“Why is this capability so important?” he asked. “It’s because it provides an easy to use way to combine the use of Large Language Models (LLM) like GPT with more traditional programming capabilities without the need to write programming code or set up an environment to run such code.”

Added Wharton associate professor Ethan Mollick in a blog post: “Code Interpreter is an impressive data scientist. … It is operating at a very advanced level, automating a lot of the complexity of quantitative analysis, and capable of very sophisticated approaches to data.”

On Twitter, Coalition's Research Vice President Tiago Henriques said Code Interpreter is “insane for data analysis.” He said it could be used to analyze cybersecurity logs or create “quick” reports.


How it works

OpenAI is rolling out its in-house Code Interpreter tool as a plugin to subscribers of its premium offering, ChatGPT Plus.

Premium users can have ChatGPT write and execute Python code. They can also upload a file and ask ChatGPT to analyze data, create charts, edit files and perform math.

In one example, Code Interpreter on ChatGPT Plus wrote the code for a simple word-guessing game based on the lyrics of "Never Gonna Give You Up" by Rick Astley.

Files uploaded to ChatGPT for use with Code Interpreter remain available only for the duration of one session. Once a user exits the chat session, the file is removed. ChatGPT Plus accepts files up to 100MB in size.

ChatGPT isn’t equipped with an interactive terminal, so it cannot run code that requires direct user input. For such programs, developers have to take the generated code and run it in their local Python environment.
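
To make the limitation concrete, here is a minimal sketch of a word-guessing game. It is not the code ChatGPT produced. The word list, function names and miss limit are assumptions chosen for illustration. The game logic is kept apart from the single function that calls input(), so the logic can run anywhere while the interactive part needs a real terminal.

```python
# Hypothetical word list taken from the song title; the game in the
# article drew its words from the lyrics, which are not reproduced here.
WORDS = ["never", "gonna", "give", "you", "up"]


def mask(secret, guessed):
    """Show guessed letters and hide the rest with underscores."""
    return " ".join(ch if ch in guessed else "_" for ch in secret)


def play(secret, guesses, max_misses=6):
    """Run one game over an iterable of letter guesses; return True on a win."""
    guessed, misses = set(), 0
    for letter in guesses:
        guessed.add(letter)
        if letter not in secret:
            misses += 1
            if misses >= max_misses:
                return False
        if all(ch in guessed for ch in secret):
            return True
    return False


def interactive_guesses():
    """Yield letters typed by the player. This part needs a real terminal."""
    while True:
        letter = input("Guess a letter: ").strip().lower()[:1]
        if letter:
            yield letter


# Scripted guesses work without a terminal:
print(mask("never", {"e"}))
print(play("give", "give"))
# On a local machine, the interactive version would be:
# play("never", interactive_guesses())
```

Only the last, commented-out call depends on keyboard input, which is the piece that has to run in a local Python environment.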

Data analysis example

Mollick said the addition of Code Interpreter helps address a number of problems in prior versions of ChatGPT.

“Specifically, it gives the AI a general-purpose toolbox to solve problems (by writing code in Python), a large memory to work with … and integrates that toolbox into the AI in ways that play to the strengths of large language models.”

  • It lets ChatGPT do very complex math and do more accurate work with words, like counting words in a paragraph.

  • It lowers hallucination and confabulation rates. “When the AI directly works with Python code, the code helps keep it ‘honest’ since Python generates errors if the code is not correct. And as the code manipulates the data, rather than the LLM itself, there are no errors inserted into the data by the AI.” ChatGPT still hallucinates, but less so.
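
Word counting shows why running code helps. The sketch below is one assumed way such a count could be done in Python, not the code ChatGPT actually writes. The sample paragraph and the function name are made up for illustration.

```python
import re
from collections import Counter


def count_words(text):
    """Return the total word count and a per-word frequency table."""
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    return len(words), Counter(words)


# Hypothetical sample text.
paragraph = "The code runs the numbers, and the numbers do not drift."
total, freq = count_words(paragraph)
print(total)
print(freq.most_common(2))
```

The count comes from the Python interpreter rather than from the language model's estimate, so it is the same every time the code runs.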

As an example, Mollick uploaded two public domain datasets of superheroes and their powers. His prompt: “Here is some data on superhero powers. Look through it and tell me what you find.”

Next, Mollick asked ChatGPT to merge the data and clean it. ChatGPT corrects its own errors, but he said people should still check the results.
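
A merge-and-clean step of this kind might look like the sketch below. It is an illustration only: the rows, column names and helper functions are hypothetical and are not taken from Mollick's datasets or from the code ChatGPT generated. It uses plain Python so it runs without extra libraries.

```python
# Hypothetical rows standing in for the two uploaded files.
heroes = [
    {"name": "Hero A ", "height_cm": "180"},
    {"name": "hero b", "height_cm": ""},
    {"name": "Hero C", "height_cm": "175"},
]
powers = [
    {"name": "hero a", "flight": "True"},
    {"name": "Hero B", "flight": "False"},
]


def clean_name(name):
    """Normalize spacing and case so the same hero matches across files."""
    return " ".join(name.split()).lower()


def merge_on_name(left, right):
    """Inner-join two lists of dicts on the cleaned name.

    Returns the merged rows plus the names that found no match,
    so a person can review what was left out.
    """
    right_by_name = {clean_name(r["name"]): r for r in right}
    merged, unmatched = [], []
    for row in left:
        key = clean_name(row["name"])
        match = right_by_name.get(key)
        if match is None:
            unmatched.append(key)
            continue
        height = row["height_cm"].strip()
        merged.append({
            "name": key,
            "height_cm": float(height) if height else None,  # None marks missing
            "flight": match["flight"] == "True",
        })
    return merged, unmatched


merged, unmatched = merge_on_name(heroes, powers)
print(merged)
print(unmatched)
```

Returning the unmatched names alongside the merged rows is one simple way to support the kind of manual check Mollick recommends.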

Then, Mollick asked ChatGPT to analyze the data. His prompt: “I am interested in doing some predictive modelling, where we can predict what powers a hero might have based on other factors. How should we approach this?”

In response, ChatGPT built a Random Forest classifier. However, Mollick disagrees with its decision to impute missing data by using the means for numerical data. “I would have dropped the data instead, but I could ask the AI to change its approach, or discuss alternate options.”
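
The two ways of handling missing values that Mollick contrasts can be sketched in a few lines. The values and function names below are hypothetical, and the sketch covers only the missing-data step, not the Random Forest model.

```python
from statistics import mean

# Hypothetical measurements; None marks a missing value.
heights = [180.0, None, 175.0, None, 200.0]


def impute_mean(values):
    """Replace each missing value with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def drop_missing(values):
    """Discard missing values instead of filling them in."""
    return [v for v in values if v is not None]


imputed = impute_mean(heights)
dropped = drop_missing(heights)
print(imputed)
print(dropped)
```

Mean imputation keeps every row but pulls the filled-in values toward the average, which understates the spread of the data. Dropping keeps only observed values at the cost of a smaller sample.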


About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.

Deborah Yao

Editor

Deborah Yao runs the day-to-day operations of AI Business. She is a Stanford grad who has worked at Amazon, Wharton School and Associated Press.
