Falcon 180B: The Powerful Open Source AI Model … That Lacks Guardrails

Falcon 180B beats GPT-3.5 and Llama 2, and rivals Google’s PaLM 2

Ben Wodecki, Jr. Editor

September 8, 2023

Falcon 180B is 2.5 times larger than Llama 2 and was trained with 4x more compute. (Image made using RunwayML)

At a Glance

  • A new large version of the Falcon 40B model has dropped – it’s powerful but is susceptible to 'problematic' outputs.

The team behind the Falcon 40B open source model has released a souped-up version that is more than four times larger – but lacks alignment guardrails.

The Technology Innovation Institute (TII) published Falcon 180B on Hugging Face this week. It was trained on 3.5 trillion tokens from TII’s RefinedWeb dataset.

Falcon 180B achieves state-of-the-art results across natural language tasks – it topped the Hugging Face leaderboard for pre-trained open access models and rivals proprietary models like Google’s PaLM 2.

You can use Falcon 180B for commercial applications – but only under restrictive conditions set out in TII’s license, which is published alongside the model on Hugging Face.

TII released a base version and a version fine-tuned on chat and instruction data.

You can try out the model for yourself via the Falcon Chat Demo Space.
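If you would rather run the model yourself, it loads through Hugging Face’s transformers library like any other causal language model. The following is a minimal sketch, assuming the tiiuae/falcon-180B model id from the Hugging Face release and hardware with enough memory for the weights (hundreds of gigabytes at full precision):

    # Minimal sketch: loading Falcon 180B with Hugging Face transformers.
    # Assumes the "tiiuae/falcon-180B" model id and sufficient GPU memory;
    # device_map="auto" requires the accelerate package to shard weights.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "tiiuae/falcon-180B"  # chat variant: "tiiuae/falcon-180B-chat"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # half precision to reduce memory use
        device_map="auto",           # spread layers across available GPUs
    )

    inputs = tokenizer("The falcon is", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))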

One ‘slight’ issue with Falcon 180B

The TII team wants developers to further build upon the base Falcon 180B model and create “even better instruct/chat versions.”
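One plausible route is parameter-efficient fine-tuning, which avoids updating all 180 billion weights. The sketch below attaches LoRA adapters with the peft library; the hyperparameters and the query_key_value module name are illustrative assumptions rather than TII’s recipe, so check the model card before relying on them:

    # Illustrative sketch: LoRA adapters on the base model for instruction
    # tuning, via the peft library. The hyperparameters and the
    # "query_key_value" target module are assumptions, not TII's recipe.
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-180B", device_map="auto"
    )
    config = LoraConfig(
        r=16,                                # adapter rank
        lora_alpha=32,                       # adapter scaling factor
        lora_dropout=0.05,
        target_modules=["query_key_value"],  # assumed fused attention projection
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()       # only the small adapters train
    # ...then train on instruction/chat data with a standard Trainer loop.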

The model has one major flaw, however: it lacks alignment guardrails. Falcon 180B has not undergone advanced tuning or alignment, so it can produce what TII calls "problematic" outputs – especially if prompted to do so.

The base version also lacks a prompt format – meaning on its own, the base Falcon 180B will not generate conversational responses.
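The fine-tuned chat variant, by contrast, expects a dialogue-style prompt. The System/User/Falcon role labels in this sketch follow the format described in the Hugging Face release materials – treat them as an assumption and verify against the model card:

    # Hedged sketch: building a dialogue prompt for the chat variant.
    # The base model has no such format and will simply continue raw text.
    def build_chat_prompt(system_prompt, turns):
        """turns is a list of (user_message, falcon_reply or None) pairs."""
        prompt = f"System: {system_prompt}\n"
        for user_msg, falcon_msg in turns:
            prompt += f"User: {user_msg}\nFalcon:"
            if falcon_msg is not None:
                prompt += f" {falcon_msg}\n"
        return prompt

    prompt = build_chat_prompt(
        "You are a helpful assistant.",
        [("What is Falcon 180B?", None)],
    )
    # Feed the result to the tokenizer/generate call from the loading example.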


How does Falcon 180B perform?

Falcon 180B outperformed Meta’s Llama 2 and OpenAI’s GPT-3.5 on the MMLU benchmark.

The model was about on par with Google’s PaLM 2-Large on various tests, including HellaSwag, WebQuestions and Winogrande.

But it was on the Hugging Face Leaderboard that Falcon 180B shone – becoming the highest-scoring openly released pre-trained large language model with a score of 68.74. Meta's Llama 2 previously held the top spot with a score of 67.35.



About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
