Stanford’s Spellburst: Editing Generated Images Through Code
GPT-4-powered tool tackles a common problem for artists: being unable to fine-tune a generated image
At a Glance
- Stanford's Spellburst is a GPT-4-powered tool that generates images from text prompts and lets users edit the underlying code to fine-tune them.
Researchers at Stanford University and Replit have unveiled Spellburst, a new tool built on a large language model whose generated images can be edited by revising lines of code.
Spellburst lets artists enter an initial natural language prompt and, once an image is generated, edit the output’s code to refine it, effectively giving users a way to fine-tune generated images.
For example, a Spellburst user could enter the prompt ‘a stained glass image of a beautiful, bright bouquet of roses’ and, if the flowers were too pink or the stained glass looked off-color, open a panel of dynamic sliders generated from that prompt to change any aspect of the image. Users could even merge different versions of outputs.
Spellburst was built with the large language model GPT-4 from OpenAI.
The team at Stanford built the tool after interviewing digital artists who expressed creative frustrations. The team said Spellburst can speed up the time-consuming and difficult process of coding art.
“A large language model can give you a good starting point,” said Hariharan Subramonyam, assistant professor at the Graduate School of Education and a faculty fellow at the Stanford Institute for Human-Centered AI, in a blog post.
“But when the artist wants to explore different textures, different colors or patterns, at that point they want finer control, which large language models can’t provide. Spellburst essentially helps artists seamlessly switch between the semantic space and the code.”
When building Spellburst, Stanford researchers interviewed creative coders on how they develop their concepts, creative workflows and challenges. Expert generative artists were later allowed to test Spellburst.
“The feedback was overall very positive,” Subramonyam said. “The large language model helps artists bridge from semantic space to code faster, but it also helps them explore many different variations and take larger creative leaps.”
Spellburst does have limitations: it does not always interpret prompts correctly, and in some instances merging versions caused issues. The team behind the tool also noted that the small sample of artists who provided feedback did not represent the full generative artist community.
When to access Spellburst
Spellburst is not accessible at the time of writing. The team behind it is studying it further.
Stanford is planning on launching the tool as open source “later this year.”
A detailed breakdown of Spellburst can be found in the paper “Spellburst: A Node-based Interface for Exploratory Creative Coding with Natural Language Prompts,” authored by Tyler Angert from Replit and Stanford's Miroslav Ivan Suzara, Jenny Han, Christopher Lawrence Pondoc and Subramonyam.