Codestral Mamba can handle thousands of lines of code while running on local devices, and Mathstral can solve complex math problems
French AI startup Mistral has published two new specialist language models designed to improve code generation and math reasoning.
Codestral Mamba is a small model built for fast code generation. At 7 billion parameters, it can respond to code-related queries at speed, even when handling longer input texts.
Codestral Mamba can handle up to 256,000 tokens of context, equal to roughly 50,000 to 200,000 lines of code, though the exact figure depends on the programming language and coding style.
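As a rough sanity check on that conversion, the figures below assume an average of about 1.3 to 5 tokens per line of code. These per-line averages are illustrative assumptions, not numbers published by Mistral:

```python
# Back-of-the-envelope conversion from context window to lines of code.
# The tokens-per-line averages are illustrative assumptions, not Mistral's figures.
context_tokens = 256_000

for tokens_per_line in (5.0, 1.3):
    lines = context_tokens / tokens_per_line
    print(f"~{tokens_per_line} tokens/line -> ~{lines:,.0f} lines of code")
```

At around 5 tokens per line the window covers roughly 51,000 lines; at a denser 1.3 tokens per line it stretches to nearly 200,000, which matches the range Mistral quotes.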
“We expect it to be a great local code assistant,” Mistral said in its announcement. The model’s small size makes it well suited to local coding applications such as real-time code autocompletion, syntax error detection and personalized coding assistance.
In terms of performance, Codestral Mamba outperforms rival code-focused models like Google’s CodeGemma, and even models almost five times its size, such as Meta’s CodeLlama.
Credit: Mistral
It’s built on the Mamba architecture, which differs from the traditional Transformer architecture found in most language models.
Instead of using attention mechanisms, a Mamba-based model uses selective state space models (SSMs), which process sequences in linear time, meaning it can potentially handle much longer and larger inputs.
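To illustrate the idea, here is a minimal scalar state-space recurrence in plain Python. This is a simplified, non-selective sketch of how an SSM consumes a sequence, not Mistral's actual implementation, and the function name and parameters are invented for illustration:

```python
def ssm_scan(u, a, b, c):
    """Toy scalar state-space recurrence (illustrative only, not Mistral's code).

    h_t = a * h_{t-1} + b * u_t
    y_t = c * h_t

    Each input token takes a constant amount of work, so total cost grows
    linearly with sequence length -- unlike attention, which scores every
    token pair and therefore scales quadratically.
    """
    h = 0.0
    ys = []
    for u_t in u:
        h = a * h + b * u_t   # fold the current input into a fixed-size hidden state
        ys.append(c * h)      # read out an output for this position
    return ys


# A single impulse at position 0 decays through the state: [2.0, 1.0, 0.5]
print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))
```

Because the entire history is compressed into a fixed-size state, memory use does not grow with input length, which is what makes very long contexts practical.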
Codestral Mamba can be tested on Mistral’s la Plateforme alongside the larger Codestral 22B.
The code generation model is available under the Apache 2.0 license, so users can modify it, build proprietary software on top of it and offer that software to customers. It can be downloaded from Hugging Face.
Mistral launched another AI model this week: MathΣtral, or Mathstral, which handles advanced mathematical problems that require complex, multi-step logical reasoning.
The model, named in tribute to Archimedes, is designed to understand and solve complex math problems, making it a possible aid for academics and scientists.
Mathstral was developed in collaboration with Project Numina and achieves state-of-the-art reasoning across various benchmark tests, according to the company.
The model achieved scores of 56.6% on the MATH benchmark and 63.47% on the MMLU test. Mathstral’s scores increase further when it’s given more inference-time computation, the Microsoft-backed startup said.
Credit: Mistral
“Mathstral is another example of the excellent performance/speed tradeoffs achieved when building models for specific purposes – a development philosophy we actively promote in la Plateforme, particularly with its new fine-tuning capabilities,” according to a Mistral announcement.
The model can be fine-tuned to improve its performance for a specific area of math or science.
Mathstral’s model weights can be accessed on Hugging Face.