December 15, 2021
Artificial intelligence and machine learning are rapidly being adopted by organizations of all shapes and sizes.
It is not hard to see why; these technologies promise remarkable benefits and opportunities to businesses across all industries. Indeed, IDC forecasts global spending on AI systems will grow from $85.3 billion in 2021 to more than $204 billion in 2025.
Huge pools of unstructured data – often in video, image, text, and voice formats – lie behind AI and ML tools and are required for these technologies to operate.
With the advent of AI and ML workloads, applications demand faster and faster access to considerable volumes of data. And that data is found anywhere and everywhere: in the cloud, at the edge and on-premises.
To support the growing adoption of AI, ML and other emerging technology, organizations need to take a fresh look at their data storage and management infrastructure.
Low latency, the ability to support different types and sizes of payloads, and the ability to scale linearly are essential for demanding workloads such as AI and ML. A new approach to data delivery is needed: one that is application-centric rather than location- or technology-centric.
Not all workloads are equal
AI and ML workloads are not all the same. An organization might sometimes be dealing with just a few tens of terabytes, while other workloads involve multiple petabytes.
A data storage and management solution that can handle varying types of workloads – both small and large files – is critical.
Not all solutions are designed for vast files, just as not all can handle very small ones. The trick is finding one that handles both flexibly.
The scalability factor
AI and ML algorithms rely on enormous datasets to train their underlying models properly, which is what ensures accuracy and speed.
Organizations want to grow in terms of capacity and performance but are often hampered by traditional storage solutions which cannot scale linearly.
For AI/ML workloads to succeed, there needs to be a storage solution in place that can scale infinitely and without disruption as the datasets grow.
Object storage is the answer here. While legacy file and block storage solutions can only scale up to a few hundred terabytes, object storage was designed to overcome this limitation with the ability to scale limitlessly, elastically, and seamlessly based on demand.
Object storage is made up of objects, which include the data itself, associated metadata and a globally unique identifier (instead of a file name and a file path), arranged in a flat address space.
This approach removes the complexity and scalability challenges of a hierarchical file system based on complex file paths.
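To make the contrast concrete, here is a minimal, purely illustrative sketch of a flat object namespace (not a real object-store implementation; the `ObjectStore` class and its methods are invented for this example). Each object bundles its data, its metadata, and a globally unique identifier, and retrieval is a single flat lookup rather than a walk through a directory tree.

```python
import uuid

class ObjectStore:
    """Illustrative flat-namespace object store (not production code)."""

    def __init__(self):
        self._objects = {}  # flat address space: object ID -> object

    def put(self, data: bytes, metadata: dict) -> str:
        object_id = str(uuid.uuid4())  # globally unique identifier
        self._objects[object_id] = {"data": data, "metadata": metadata}
        return object_id

    def get(self, object_id: str) -> dict:
        # One flat lookup by ID -- no file path to resolve
        return self._objects[object_id]

store = ObjectStore()
oid = store.put(b"frame-0001", {"content-type": "image/png", "camera": "edge-3"})
obj = store.get(oid)
print(obj["metadata"]["content-type"])  # image/png
```

The key point is the shape of the address space: there is no hierarchy to traverse, so lookups do not slow down as the namespace grows the way deep directory trees can.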
What about performance?
With many traditional storage solutions, scaling capacity comes at the expense of performance. So, when an organization needs to scale linearly in terms of capacity, performance tends to plateau or even decline.
For AI and ML workloads, fast access to data is fundamental, so they require storage solutions that can scale linearly in performance as well as capacity.
Traditional storage organizes files into a hierarchy, with directories and sub-directories. While this architecture is effective for small volumes of data, performance suffers beyond a certain capacity due to system bottlenecks and limitations with file lookup tables.
However, object storage consists of an unlimited flat namespace that can scale to petabytes and beyond by simply adding additional nodes.
With this approach, both performance and capacity can scale seamlessly and independently.
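One common way to picture scaling by adding nodes is hash-based placement, sketched below as a simplified consistent-hashing ring. This is a hypothetical illustration, not how any particular object-storage product works; real systems add replication and far more sophisticated placement. The property it demonstrates is that adding a node extends capacity while remapping only a fraction of the existing objects.

```python
import hashlib
from bisect import bisect_right

def _hash(key: str) -> int:
    # Deterministic hash used to place both nodes and objects on the ring
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

class Ring:
    """Simplified consistent-hashing ring (illustrative only)."""

    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)

    def add_node(self, node):
        self._ring.append((_hash(node), node))
        self._ring.sort()

    def node_for(self, object_id: str) -> str:
        # An object belongs to the first node point at or after its hash,
        # wrapping around the ring
        h = _hash(object_id)
        points = [p for p, _ in self._ring]
        i = bisect_right(points, h) % len(self._ring)
        return self._ring[i][1]

ring = Ring(["node-1", "node-2", "node-3"])
before = {f"obj-{i}": ring.node_for(f"obj-{i}") for i in range(1000)}
ring.add_node("node-4")
after = {k: ring.node_for(k) for k in before}
moved = sum(before[k] != after[k] for k in before)
print(f"{moved} of 1000 objects remapped after adding a node")
```

Every object that moves lands on the new node; everything else stays put, which is why capacity can grow without a disruptive wholesale reshuffle of data.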
Ticking all the boxes
As AI and ML are increasingly adopted into the enterprise, IT teams must take a fresh look at their storage strategy and ensure they have a future-proof solution that enables them to support their businesses’ AI/ML projects and that can be easily set up, run and scaled.
Some of the enterprise-grade object storage software on the market today is purpose-built for the requirements of AI/ML training. Notably, organizations can begin their initiatives on a small scale, on a single server, and easily scale both capacity and performance, as needed.
As well as providing the performance required, fast object storage enables the flexibility to access and process data anywhere – whether at the edge, the cloud, or the core data center – and delivers complete data lifecycle management across multiple clouds.
For enterprise AI/ML projects, fast object storage ticks all the boxes: low latency, support for varying types and sizes of payloads and the ability to scale linearly in both capacity and performance.
Candida Valois is a field CTO for Scality. Valois is an IT specialist with more than 20 years of experience in architecture, software development, services, and sales for various industries at companies such as IBM and EMC.