XGBoost 2.0: New Tool for Training Better AI Models on More Complex Data

The updated algorithm could help developers build better ranking and recommendation AI systems

Ben Wodecki, Jr. Editor

October 5, 2023

3 Min Read

At a Glance

  • Explore updates to the popular XGBoost tool, now with support for federated learning and memory improvements.

XGBoost, widely used for supervised machine learning on large datasets, has been given an overhaul in the new version 2.0.

The open source offering allows developers to fine-tune various model parameters to optimize performance, and it works across a variety of programming languages, including Python, C++ and Java.

The updates to XGBoost could enable businesses to train better models on larger, more complex data. The tool gives developers new features and flexibility to improve performance for systems used for recommendations and ranking, meaning it could be beneficial to developers building systems that suggest products to shoppers in e-commerce, for example.

The newest version has improved external memory support, a new device parameter and support for quantile regression.

There are also some major bug fixes, including one for GPU memory allocation issues with categorical splits, along with a new thread-safe cache that performs garbage collection on a dedicated thread.

What is XGBoost?

XGBoost (eXtreme Gradient Boosting) is a popular library for training machine learning models. It uses a gradient-boosting framework, combining the predictions of multiple weak models to produce a stronger one.

In simple terms, imagine walking down a hill: XGBoost considers not just the slope where you are standing but how the steepness will change as you take your next step – effectively, it plans ahead. The process is similar to the Newton-Raphson method in mathematics, a technique for finding the bottom of the hill faster and more accurately.
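To make the boosting idea concrete, here is a minimal sketch using XGBoost's native Python API. The synthetic dataset and parameter values are illustrative only, not drawn from the release notes:

```python
import numpy as np
import xgboost as xgb

# Illustrative synthetic regression data
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=1000)

dtrain = xgb.DMatrix(X, label=y)

# Each boosting round fits a new shallow tree to the errors of the
# ensemble so far; 100 weak trees combine into one strong model.
booster = xgb.train(
    {"objective": "reg:squarederror", "max_depth": 3, "eta": 0.1},
    dtrain,
    num_boost_round=100,
)
preds = booster.predict(dtrain)
```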


XGBoost can be used commercially – it is available under an Apache 2.0 license, so users can build proprietary software on top of it and offer the licensed code to customers.

What makes XGBoost so popular in machine learning development is that it can run on a single machine or on distributed processing frameworks, and it is integrated into several packages and data-flow frameworks, such as scikit-learn for Python and Apache Spark.
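Because XGBoost ships a scikit-learn-compatible estimator, it slots into standard scikit-learn workflows. A brief sketch, with an illustrative dataset and settings:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# XGBClassifier follows the scikit-learn estimator API, so it works
# with cross-validation, pipelines and grid search out of the box.
clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```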

It uses a variety of techniques to improve its accuracy and increase its speed, such as Newton boosting, parallel tree construction and sparsity-aware split finding.

XGBoost 2.0 updates and alterations

Here are the new updates:

Unified device parameter – The team behind the library has removed the older CPU- and GPU-specific parameters and simplified configuration: users now set a single, unified device parameter when running XGBoost 2.0.
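In practice, GPU-specific selectors such as the old "gpu_hist" tree method give way to one device setting. A minimal sketch with illustrative data:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(500, 8)
y = np.random.rand(500)
dtrain = xgb.DMatrix(X, label=y)

# XGBoost 2.0: pick the hardware with a single "device" parameter
# instead of GPU-specific tree methods like "gpu_hist".
params = {"tree_method": "hist", "device": "cuda"}  # or "cpu"
booster = xgb.train(params, dtrain, num_boost_round=50)
```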

Quantile regression – XGBoost now supports quantile regression, which involves minimizing the quantile loss (also known as 'pinball loss').
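A short sketch of what this looks like, based on the 2.0 quantile regression objective; the data and quantile choices here are illustrative:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 5)
y = np.random.rand(1000)
dtrain = xgb.DMatrix(X, label=y)

# Minimize the pinball loss at the 5th, 50th and 95th percentiles;
# the predictions then sketch an uncertainty band, not just a mean.
params = {
    "objective": "reg:quantileerror",
    "quantile_alpha": [0.05, 0.5, 0.95],
    "tree_method": "hist",
}
booster = xgb.train(params, dtrain, num_boost_round=100)
```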


Learning to rank – The XGBoost team created a new implementation for the learning-to-rank task, commonly used in search systems and news feed-style applications such as Facebook's.
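For developers, ranking looks like this in the scikit-learn-style API: rows are grouped into queries by a query ID, and the model learns to order the rows within each query. The data below is purely illustrative:

```python
import numpy as np
from xgboost import XGBRanker

# Illustrative data: 3 queries with 4 candidate documents each.
X = np.random.rand(12, 6)                # document features
y = np.random.randint(0, 5, size=12)     # relevance labels (0-4)
qid = np.repeat([0, 1, 2], 4)            # query ID per row (sorted)

# rank:ndcg optimizes the ordering of documents within each query.
ranker = XGBRanker(objective="rank:ndcg", n_estimators=100)
ranker.fit(X, y, qid=qid)
scores = ranker.predict(X)  # higher score = ranked higher
```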

GPU-based approx tree method – The approx tree method can now run on GPUs.
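This combines with the unified device parameter above; a minimal sketch on illustrative data:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(500, 8)
y = np.random.rand(500)
dtrain = xgb.DMatrix(X, label=y)

# Previously approx ran only on CPU; in 2.0 it honors device="cuda".
booster = xgb.train(
    {"tree_method": "approx", "device": "cuda"},
    dtrain,
    num_boost_round=50,
)
```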

Improved external memory support – Better performance and reduced memory usage for external memory (disk-based) training, lowering the CPU memory footprint.
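External-memory training follows the DataIter pattern from XGBoost's documentation: a user-defined iterator feeds data in batches, and XGBoost pages it to a disk cache. A sketch, where the random batch loader stands in for reading real files:

```python
import os
import numpy as np
import xgboost as xgb

class BatchIter(xgb.DataIter):
    """Feeds data in chunks so the full dataset never sits in memory."""

    def __init__(self, n_batches):
        self._n_batches = n_batches
        self._i = 0
        # cache_prefix tells XGBoost where to page data to disk
        super().__init__(cache_prefix=os.path.join(".", "cache"))

    def next(self, input_data):
        if self._i == self._n_batches:
            return 0  # no more batches
        X = np.random.rand(10_000, 8)  # stand-in for loading a file
        y = np.random.rand(10_000)
        input_data(data=X, label=y)
        self._i += 1
        return 1

    def reset(self):
        self._i = 0

it = BatchIter(n_batches=4)
dtrain = xgb.DMatrix(it)  # builds an external-memory DMatrix
booster = xgb.train({"tree_method": "hist"}, dtrain, num_boost_round=50)
```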

New features in the PySpark interface – Updates include GPU-based prediction capabilities, improved training logs and Python typing support.
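A hedged sketch of the PySpark estimator; the dataset path and column names are hypothetical, and passing the unified device parameter through to the Spark estimator is an assumption based on the 2.0 release notes:

```python
from pyspark.sql import SparkSession
from xgboost.spark import SparkXGBClassifier

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("train.parquet")  # hypothetical dataset

# device="cuda" asks each worker to train and predict on its GPU
# (assumption: the Spark estimator accepts the unified parameter).
clf = SparkXGBClassifier(
    features_col="features",
    label_col="label",
    num_workers=2,
    device="cuda",
)
model = clf.fit(df)
preds = model.transform(df)
```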

Federated learning support – Version 2.0 adds support for vertical federated learning, in which models are trained collaboratively without sharing data.

Export cut value – Users can now export the quantile values (cut values) used by the hist tree method via the Python or C packages.
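A sketch of how this might look in Python, based on a reading of the 2.0 release notes; treat the method name and return shape as assumptions rather than a confirmed API:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 4)
y = np.random.rand(1000)

# QuantileDMatrix pre-bins the data the way the hist method does.
Xy = xgb.QuantileDMatrix(X, label=y)

# Assumed 2.0 API: export the per-feature bin boundaries (cut values).
indptr, values = Xy.get_quantile_cut()
```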

For the full list, check out the updates on XGBoost’s GitHub page.

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.

