Meta MegaByte Could Supercharge AI Generation
‘Promising’ architecture could render Transformers obsolete
At a Glance
- AI researchers at Meta have proposed a way to improve AI content generation for longer sequences.
- The novel approach could replace tokenization, paving the way for improved generation speeds.
AI researchers from Facebook’s parent company Meta have proposed a novel way to speed up the generation of content for uses like natural language processing.
MegaByte, detailed in a recently released paper, is designed to improve generation of lengthier content. Systems like OpenAI’s ChatGPT can easily handle short outputs, but the longer or more complex the sequence, the worse a model’s performance becomes.
The MegaByte approach uses a multi-scale decoder architecture capable of modeling sequences of over one million bytes with end-to-end differentiability — meaning potentially better generation performance at a reduced running cost.
Meta’s researchers take issue with the Transformer architecture. Developed by researchers at Google back in 2017, Transformer-based systems have since seen wide adoption for NLP tasks, paving the way for models and systems like ChatGPT, GPT-4 and BERT.
However, Meta's team argues that Transformer-based systems require considerable compute when working on long, complex inputs like books or podcasts. MegaByte instead divides inputs and outputs into "patches" rather than individual tokens. Each patch is handled by a small local model, and a global model combines the patch-level results to produce the final output.
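To make the patch idea concrete, here is a minimal sketch of how a byte sequence might be split into fixed-size patches. The patch size of 8 and the example string are illustrative assumptions, not values from Meta's paper; the global and local models themselves are only described in comments.

```python
import numpy as np

PATCH_SIZE = 8  # assumed patch length in bytes, chosen for illustration only


def split_into_patches(byte_seq: bytes, patch_size: int = PATCH_SIZE) -> np.ndarray:
    """Pad a byte sequence and reshape it into fixed-size patches."""
    arr = np.frombuffer(byte_seq, dtype=np.uint8)
    pad = (-len(arr)) % patch_size
    arr = np.pad(arr, (0, pad))           # pad so the length divides evenly
    return arr.reshape(-1, patch_size)    # shape: (num_patches, patch_size)


# In a MegaByte-style decoder, a global model would attend over one embedding
# per patch (num_patches positions), while small local models predict the bytes
# inside each patch.
patches = split_into_patches(b"MegaByte splits long byte sequences into patches.")
print(patches.shape)  # (7, 8): 7 patches of 8 bytes each
```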
MegaByte’s patch-based approach avoids the steep scaling of self-attention as sequences grow longer, and because patch computations can run in parallel rather than sequentially, the researchers argue it produces results faster.
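A rough back-of-envelope comparison shows why patching helps. The sequence length and patch size below are illustrative assumptions, not figures from the paper.

```python
# Rough count of pairwise attention interactions, assuming sequence length T
# and patch size P (both chosen for illustration only).
T = 1_000_000   # bytes in the sequence
P = 8           # assumed patch size

flat_attention = T ** 2                   # one Transformer attending over every byte
global_attention = (T // P) ** 2          # global model: one position per patch
local_attention = (T // P) * P ** 2       # local models: attention within each patch

print(f"flat byte-level model: {flat_attention:.2e}")                      # ~1.0e12
print(f"patch-based model:     {global_attention + local_attention:.2e}")  # ~1.6e10
```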
MegaByte “gives competitive language modeling results with subword models, which may allow byte-level models to replace tokenization,” the researchers suggest.
Meta’s newly proposed architecture received praise from none other than Andrej Karpathy, the former Tesla AI director, who described it as “promising.”
“Everyone should hope that we can throw away tokenization in large language models,” Karpathy said via Twitter. “Doing so naively creates (byte-level) sequences that are too long, so the devil is in the details.”
However, it’s early days for MegaByte: Meta’s paper notes that the experiments conducted so far are at scales “far below those of state-of-the-art language models.”
Future research into MegaByte should explore scaling the architecture to larger models and datasets, the researchers propose.