‘Promising’ architecture could render Transformers obsolete

Ben Wodecki, Jr. Editor

June 5, 2023

At a Glance

  • AI researchers at Meta have proposed a way to improve AI content generation from larger sequences.
  • The novel approach could replace tokenization, paving the way for improved generation speeds.

AI researchers from Facebook’s parent company Meta have proposed a novel way to speed up the generation of content for uses like natural language processing.

MegaByte, detailed in a recently released paper, is designed to improve the generation of longer content. Systems like OpenAI’s ChatGPT can easily handle short outputs, but the longer or more complex the sequence, the worse the model’s performance becomes.

The MegaByte approach uses a multi-scale decoder architecture capable of modeling sequences of over one million bytes with end-to-end differentiability, which could mean better generation performance at a lower running cost.

Meta’s researchers take issue with the Transformer architecture. Developed by researchers at Google in 2017, the Transformer has since seen wide adoption for NLP tasks, paving the way for models and systems like ChatGPT, GPT-4 and BERT.

However, Meta's team argues that Transformer-based systems working on long, complex inputs such as books or podcasts consume considerable amounts of compute. MegaByte instead divides inputs and outputs into "patches" rather than individual tokens. Each patch gets its own localized response, which the model then combines with those of the other patches to create the final output.
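To make the idea of patches concrete, here is a minimal Python sketch of grouping raw bytes into fixed-size patches. It is an illustration only, not Meta's implementation: the function name `to_patches`, the patch size of 8 and the zero padding value are all assumptions chosen for the example.

```python
def to_patches(text: str, patch_size: int = 8, pad: int = 0) -> list[list[int]]:
    """Encode text as UTF-8 bytes and group the bytes into fixed-size patches.

    patch_size and pad are illustrative assumptions, not values from the paper.
    """
    data = list(text.encode("utf-8"))
    # Pad the tail so that every patch has exactly patch_size bytes.
    remainder = len(data) % patch_size
    if remainder:
        data += [pad] * (patch_size - remainder)
    # Slice the byte sequence into consecutive, non-overlapping patches.
    return [data[i:i + patch_size] for i in range(0, len(data), patch_size)]


patches = to_patches("MegaByte reads raw bytes.", patch_size=8)
for index, patch in enumerate(patches):
    print(index, patch)
```

No tokenizer or vocabulary is involved here: the model's input is simply the byte values, and the patch is the unit that gets processed as a group.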

MegaByte’s ‘patches’ approach sidesteps the way self-attention costs grow as sequences get longer, because calculations are performed in parallel rather than sequentially, which the researchers argue leads to faster results.
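A rough back-of-the-envelope calculation shows why patching helps with long sequences. The Python sketch below simply counts pairwise attention interactions, assuming standard self-attention compares every position with every other position, and assuming a patch-based model attends across patches once and then within each patch separately. The function names and the patch size of 1,000 are hypothetical, and the count ignores constant factors, model width and feedforward layers.

```python
def flat_attention_pairs(seq_len: int) -> int:
    """Pairwise interactions when every byte attends to every other byte."""
    return seq_len * seq_len


def patched_attention_pairs(seq_len: int, patch_size: int) -> int:
    """Pairwise interactions under an assumed two-level, patch-based scheme."""
    num_patches = -(-seq_len // patch_size)  # ceiling division
    # Attention between patches, treating each patch as one position.
    global_pairs = num_patches * num_patches
    # Attention among the bytes inside each patch.
    local_pairs = num_patches * patch_size * patch_size
    return global_pairs + local_pairs


seq_len = 1_000_000   # one million bytes, the scale cited for MegaByte
patch_size = 1_000    # illustrative assumption

flat = flat_attention_pairs(seq_len)
patched = patched_attention_pairs(seq_len, patch_size)
print(f"flat:    {flat:,}")
print(f"patched: {patched:,}")
print(f"ratio:   {flat / patched:.1f}x")
```

Under these assumptions the patched count is roughly a thousand times smaller than the flat count for a million-byte sequence, which is the kind of saving the patch approach is aiming for.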

MegaByte “gives competitive language modeling results with subword models, which may allow byte-level models to replace tokenization,” the researchers suggest.

Meta’s newly proposed architecture received praise from none other than Andrej Karpathy, Tesla’s former AI director, who described it as “promising.”

“Everyone should hope that we can throw away tokenization in large language models,” Karpathy said via Twitter. “Doing so naively creates (byte-level) sequences that are too long, so the devil is in the details.”

However, it’s early days for MegaByte: Meta’s paper notes that the scale of the experiments conducted with it is “far below those of state-of-the-art language models.”

Future research into MegaByte should explore scaling the architecture to larger models and datasets, the researchers propose.

About the Author(s)

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
