December 19, 2023
TikTok parent ByteDance is reportedly violating OpenAI’s terms of service by using its tech to develop rival large language models.
According to The Verge, ByteDance is using OpenAI’s API to gather data to build its own foundation model, under the working name Project Seed. The Chinese company has been working on generative AI for some time, with its researchers creating powerful 3D generation models.
OpenAI’s rules for using its tech explicitly state that output from models like GPT-4 cannot be used to develop rival models. However, ByteDance is reportedly purchasing access to OpenAI’s tech via Microsoft, which has similar rules in place, and has been regularly maxing out its API access.
ByteDance is alleged to have used the API for almost the entirety of Project Seed's development, including training and model evaluation.
The Verge obtained employee conversations on Lark, ByteDance’s internal messaging platform, discussing how to “whitewash” evidence that the company has been using OpenAI’s tech for illicit purposes.
ByteDance developers, largely based in China, are alleged to have obfuscated their use of OpenAI’s API via data desensitization, where data is masked to protect it. This technique is usually used to protect business-sensitive information or personal data.
OpenAI told The Verge that ByteDance's ChatGPT account has since been suspended with an investigation ongoing.
“We use GPT to power products and features in non-China markets, but use our self-developed model to power Doubao, which is only available in China,” ByteDance said in response.
Doubao is a conversational AI system built by ByteDance where users interact via images and text. According to ByteDance, a small group of its engineers used OpenAI’s API service for “an internal small experimental model which was never launched.”
The TikTok parent said that practice was “stopped immediately” in April, with a new internal requirement introduced stating that text produced by GPT models should not be added to the training datasets of the company's self-developed models.
ByteDance then said its team conducted examinations and took measures to ensure its engineers were compliant, including conducting batch sampling and comparing the similarity of its labeled data to OpenAI’s results to “prevent inappropriate use by data annotators.”
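For readers wondering what such a check might look like in practice, here is a minimal Python sketch of batch sampling plus similarity comparison. It is not ByteDance's actual tooling, which has not been made public. The function names, the 0.9 threshold, and the use of a character-level ratio from Python's standard `difflib` module are all assumptions chosen purely for illustration.

```python
import random
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 character-level similarity ratio between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def flag_suspicious(labeled, reference, sample_size=2, threshold=0.9, seed=0):
    """Sample a batch of prompt IDs and flag near-copies of the reference output.

    labeled:   hypothetical dict of prompt ID -> text written by an annotator
    reference: hypothetical dict of prompt ID -> text returned by an external model
    threshold: assumed cut-off; a real audit would need to tune this value
    """
    rng = random.Random(seed)  # seeded so the sampled batch is reproducible
    shared = sorted(set(labeled) & set(reference))
    batch = rng.sample(shared, min(sample_size, len(shared)))

    flagged = []
    for key in batch:
        score = similarity(labeled[key], reference[key])
        if score >= threshold:
            flagged.append((key, round(score, 2)))
    return sorted(flagged)
```

An annotation that closely matches the external model's answer for the same prompt would be flagged for human review, while one that differs substantially would pass. A production audit would likely use a more robust measure than character overlap, such as embedding similarity.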
“As of now, the engineering team uses the GPT APIs to a very limited extent during the evaluation/testing process, such as score benchmarking,” according to ByteDance.
Chinese tech giants like ByteDance, as well as Baidu and Alibaba, have rushed to build their own large language models in the wake of ChatGPT's popularity. Last week, a new Chinese supercomputer for training AI models was launched to support local efforts.
Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.