Scraping YouTube Videos to Train Smarter AI

A developer from Replicate creates a way to scrape YouTube to train Stable Diffusion XL

Ben Wodecki, Jr. Editor

November 9, 2023

[Image: Abstract representation of AI in the style of SpongeBob. Credit: AI Business via YouTube x Replicate]

At a Glance

  • A new method called YouTune leverages YouTube videos to fine-tune AI image generation models like Stable Diffusion.
  • The concept could enable businesses to customize models using rich public data – though there are potential legal headaches.

There is a new method for improving image generation models like Stable Diffusion – by using YouTube videos to fine-tune AI systems.

Enter YouTune, which was developed by Charlie Holtz, a hacker in residence at the open source model-making startup Replicate.

YouTune refines Stable Diffusion XL by training it with images from YouTube videos. Just input a video link, and the model tailors its image creation to that video's specific content.

Holtz was able to take a link to the SpongeBob SquarePants Movie trailer, paste it and generate images of fish in the style of SpongeBob.

The system downloads the video and takes a screenshot every 50 frames. The screenshots act as training data for the fine-tuning process.
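
To get a feel for how much training data this sampling yields, here is a minimal sketch of the arithmetic. Only the 50-frame interval comes from the description above; the clip length and frame rate are illustrative assumptions, not figures from Holtz's project, and the function name is hypothetical.

```python
def sampled_frame_indices(total_frames: int, every_n: int = 50) -> list[int]:
    """Return the indices of the frames kept when taking one frame in every `every_n`."""
    if total_frames < 0 or every_n <= 0:
        raise ValueError("total_frames must be >= 0 and every_n must be > 0")
    return list(range(0, total_frames, every_n))


# Assumed values for illustration only: a 150-second clip at 24 frames per second.
duration_seconds = 150
frames_per_second = 24
total_frames = duration_seconds * frames_per_second  # 3,600 frames

indices = sampled_frame_indices(total_frames)
print(len(indices))  # 72 screenshots
print(indices[:3])   # [0, 50, 100]
```

Under those assumptions, a trailer-length clip would produce a few dozen training images, which is in the range typically used for this kind of style fine-tuning.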

After selecting the shots, the output took 11 minutes to arrive, with the whole process costing 45 cents.

You can try the model out for yourself on Replicate – here’s a Krabby Patty (with cheese, Mr. Squidward) that we made earlier. Each image took just two minutes to generate.

Holtz has made some quirky AI projects in the past – including Zoo, a playground for text-to-image models, and Once Upon a Bot, which generates children’s stories.

To build YouTune, Holtz used OpenAI’s ChatGPT to write the script, prompting the GPT-4 version to create a Python script to take a screenshot of every 10 frames and save them as a jpg.
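
A script of the kind described might look roughly like the sketch below. This is not the code ChatGPT produced for Holtz; it assumes the third-party OpenCV package (opencv-python) is installed, and the function names, file-naming scheme and example paths are all hypothetical.

```python
from pathlib import Path


def frame_filename(index: int) -> str:
    """Build a zero-padded JPEG filename for a frame index (hypothetical naming scheme)."""
    return f"frame_{index:06d}.jpg"


def save_every_nth_frame(video_path: str, out_dir: str, every_n: int = 10) -> int:
    """Save every `every_n`-th frame of a video as a JPEG and return how many were saved."""
    # Assumption: OpenCV is available. Imported here so frame_filename works without it.
    import cv2

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    capture = cv2.VideoCapture(video_path)
    if not capture.isOpened():
        raise OSError(f"could not open video: {video_path}")

    saved = 0
    index = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of the video, or a read error
                break
            if index % every_n == 0:
                cv2.imwrite(str(out / frame_filename(index)), frame)
                saved += 1
            index += 1
    finally:
        capture.release()
    return saved
```

Calling `save_every_nth_frame("trailer.mp4", "frames")` on a local video file (the path here is a placeholder) would write `frame_000000.jpg`, `frame_000010.jpg` and so on into the `frames` folder.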

You can read the full ChatGPT conversation here: https://chat.openai.com/share/2e9003a8-8a87-439c-8189-d67beff7f980

Holtz said he made YouTune because he wanted to fine-tune using 'The Nightmare Before Christmas' film “but it was a bit of a pain putting the training images together.”

It doesn’t always work, however; the prompt ‘A Krabby Patty with cheese’ may have generated the Patty above, but the YouTune-enabled Stable Diffusion XL also created these monstrosities.

Business applications: Cost savings from public data

Fine-tuning AI models requires data. YouTube, the internet's largest video-sharing platform, represents a vast and diverse dataset. Businesses could use a system akin to YouTune to tailor AI models to recognize and interpret images that are specifically relevant to their industry or customer base.

There are also cost considerations – collecting large labeled image datasets can prove expensive and time-consuming. Using publicly available YouTube videos provides a free source of varied training data.

Fine-tuning on specialized datasets can boost a model's recognition accuracy. Using YouTube videos could allow an AI system to learn from a wide range of angles, lighting conditions and contexts.

Of course, there are potential legal headaches over using YouTube videos to train a model without permission from the owner.

For example, ChatGPT wouldn’t let Holtz scrape YouTube when building the script for YouTune – but the Replicate developer used a workaround to “trick” the system into thinking he was the creator of the video he wanted to scrape.

Innovative, yet seen elsewhere?

Bradley Shimmin, chief analyst for AI and data analytics at sister research firm Omdia, said YouTune is a “very cool” concept.

“It looks like the developers have automated what would normally be a manual process of extracting images from videos and using those to fine-tune Stable Diffusion’s image gen model.”

However, the Omdia analyst said the process seemed familiar.

“Sadly, it looks to me quite a bit like the work we're seeing from Google along the same lines using its first-party models, and conversely the same thing we're seeing from OpenAI to automate some of the more sticky processes that go into creating GenAI solutions,” he said. “For example, both Google and OpenAI are building such functionality into their no/low-code platforms and APIs that basically do the same thing these intrepid developers do, only better and as an integral component of their broader platform.”

“Still, such early innovation from the broader ecosystem should be applauded and supported, as it helps to mature the overall market while also giving developers a choice between the platform player's monolithic solution and a solution built with early efforts like this.”

How to implement YouTune

Here's a step-by-step guide straight from the YouTune GitHub page.

1. Clone the repo below, then set up and activate a virtualenv:

https://github.com/cbh123/youtune

python3 -m pip install virtualenv

python3 -m virtualenv venv

source venv/bin/activate

2. Install the dependencies:

pip install -r requirements.txt

3. Make a Replicate account and set your token:

export REPLICATE_API_TOKEN=<token>

4. Run it

python tune.py <youtube-url>
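
Before running step 4, it can help to confirm that the token from step 3 is set and that the link really points at YouTube. The sketch below is a hypothetical pre-flight check and is not part of the YouTune repo; the function names and the list of accepted hostnames are assumptions made for illustration.

```python
import os
from urllib.parse import urlparse

# Assumed set of hostnames to accept; extend it if you use other YouTube domains.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True if the URL uses http(s) and points at a known YouTube hostname."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.hostname in YOUTUBE_HOSTS


def preflight(url: str, env=None) -> list[str]:
    """Return a list of problems found; an empty list means it looks safe to proceed."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("REPLICATE_API_TOKEN"):
        problems.append("REPLICATE_API_TOKEN is not set")
    if not is_youtube_url(url):
        problems.append(f"not a YouTube URL: {url}")
    return problems


# Example with an explicit, empty environment so the result does not depend on your shell.
print(preflight("https://youtu.be/abc", env={}))
```

The example call prints a single problem about the missing token; with the token exported and a valid link, `preflight` returns an empty list.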

Holtz provides a video explainer in a post on X (Twitter). There’s also another explainer via Loom.


About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
