A Stanford study also reveals that AI plagiarism detectors are less accurate for non-native English writers
OpenAI has shut down its AI detector tool just six months after launch due to its “low rate of accuracy.”
The ChatGPT maker disclosed the shuttering of AI Text Classifier by quietly updating its blog post. Attempting to access the tool now directs users to an error page.
OpenAI said it is currently researching “more effective provenance techniques” for text. However, the company said it is committed to developing ways to determine if a piece of audio or visual content is AI-generated.
AI Text Classifier was a free tool designed to detect text generated by ChatGPT, as concerns arose among educators that the chatbot could be used to cheat on assignments.
Users would paste text into the tool, which would then estimate the likelihood that the content was created by AI. Inputs had to be at least 1,000 characters (roughly 150 to 250 words) to return reliable results.
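The classifier was a web page rather than a public API, so the sketch below only illustrates the stated 1,000-character minimum; the `ready_for_classification` helper and the sample text are hypothetical.

```python
# Minimal sketch of the input-length rule OpenAI described for AI Text Classifier.
# The tool itself was a web interface, not an API; this helper is hypothetical.

MIN_CHARS = 1000  # OpenAI's stated minimum, roughly 150-250 words of English text


def ready_for_classification(text: str) -> bool:
    """Return True if the text meets the minimum length for a reliable result."""
    return len(text) >= MIN_CHARS


sample = "Paste the essay or passage to be checked here..."
if ready_for_classification(sample):
    print("Long enough to submit for a likelihood estimate.")
else:
    print(f"Too short: {len(sample)} characters, need at least {MIN_CHARS}.")
```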
OpenAI’s AI Text Classifier could also be easily fooled if a human author edited the AI-generated text. The startup acknowledged the tool’s limitations upon its release in early February, saying it was “not fully reliable” but “significantly” better than predecessor tools such as the GPT-2 Output Detector, published in February 2021.
The tool also struggled with languages other than English – a wider problem for GPT detectors detailed in a recent Stanford paper, which found that such detectors routinely labeled English text written by non-native speakers as AI-generated.
Stanford scientists analyzed seven GPT detector tools – Originality.ai, Quill.org, Sapling, OpenAI’s AI Text Classifier, Crossplag, GPTZero and ZeroGPT. They were evaluated against 91 human-authored essays written in English by non-native speakers (sourced from a Chinese educational forum) and 88 U.S. eighth-grade essays sourced from the Hewlett Foundation’s Automated Student Assessment Prize.
The tools identified the eighth-grade essays as human-written nearly perfectly. However, they misclassified an average of 61% of the English essays from the Chinese writers as AI-generated. Breaking it down: all seven detectors unanimously flagged 20% of the Chinese essays as AI-authored, and 98% were flagged as AI-generated by at least one detector.
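To make those percentages concrete, here is a rough back-of-the-envelope conversion to essay counts over the 91 non-native essays (approximate, since the paper reports rounded figures):

```python
# Approximate essay counts implied by the reported percentages (rounded figures).
non_native_essays = 91

avg_misclassified = round(0.61 * non_native_essays)  # ~56 essays misclassified on average
unanimous_flags = round(0.20 * non_native_essays)    # ~18 essays flagged by all seven detectors
flagged_by_any = round(0.98 * non_native_essays)     # ~89 essays flagged by at least one detector

print(avg_misclassified, unanimous_flags, flagged_by_any)  # 56 18 89
```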
The researchers found that the Chinese essays flagged unanimously by all seven detectors had significantly lower perplexity than the rest. Perplexity measures how predictable a piece of text is to a language model, so the finding suggests that GPT detectors may penalize non-native writers whose more limited range of linguistic expressions makes their prose easier to predict.
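As a rough illustration of the metric (not the specific setup used by any of the detectors in the study), perplexity can be estimated with an open model such as GPT-2 via the Hugging Face transformers library; lower scores mean the text was easier for the model to predict:

```python
# Sketch: estimating the perplexity of a passage with GPT-2 (illustrative only;
# not the models or thresholds used by the detectors discussed above).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Using the input as its own labels yields the mean cross-entropy loss per token.
        out = model(enc.input_ids, labels=enc.input_ids)
    return torch.exp(out.loss).item()

# Text with common, predictable word choices tends to score lower (looking more
# "machine-like" to a detector) than text with varied, idiosyncratic phrasing.
print(perplexity("The cat sat on the mat and looked at the dog."))
print(perplexity("Quixotic zephyrs unsettled the lexicographer's marmalade hypotheses."))
```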
The Stanford researchers warned that overlooking the bias in GPT detector tools could lead to “the marginalization of non-native speakers in evaluative or educational settings.”
The paper is potentially the first to examine biases in so-called ChatGPT detectors. The team behind it said that further research is needed to address such biases and to “ensure a more equitable and secure digital landscape for all users.”
“Our results call for a broader conversation about the ethical implications of deploying ChatGPT content detectors and caution against their use in evaluative or educational settings, particularly when they may inadvertently penalize or exclude non-native English speakers from the global discourse,” according to the paper.