Data Is The Lifeblood of AI - Storage Is Its Veins

Ciarán Daly

November 22, 2017

Much of the excitement surrounding AI undoubtedly comes from the analytical improvements that machine learning solutions deliver. The automation of knowledge work using AI, for example, is ranked second among the 12 most disruptive technology trends in a McKinsey report (Disruptive technologies: Advances that will transform life, business, and the global economy), which predicts an impact of between $5 trillion and $7 trillion by 2025.

Big, headline-grabbing innovations, from Sophia to Tesla's autonomous vehicles, are frequently at the centre of the AI conversation - with good reason, of course. However, this conversation rarely focuses on one of the most crucial elements powering these technologies: the hardware.

"While the conversation around AI and ML solutions tends to focus on analytics software, a new, comprehensive set of purpose-built infrastructure software and hardware support is essential to capitalize on the AI revolution," argues Par Botes, VP of Product for Pure Storage, a company offering an end-to-end, cloud-connected data platform that aims to drive business and IT transformation. "With processing power at both the GPU and storage layer a prerequisite of any AI advancement, it is imperative to recognize the framework on which nearly all AI functionality is conducted."

Having recently won the AIconics award for Best Innovation in AI Hardware for FlashBlade, a cloud-ready flash storage platform designed for modern data analytics, Pure Storage are at the forefront of AI hardware innovation. Data storage and management are, after all, incredibly important to the successful deployment of AI and machine learning technologies, argues Brian Schwarz, VP of Product Management for FlashBlade at Pure Storage. Schwarz is an early and integral member of the team that created FlashBlade. He brings a wealth of experience in data center infrastructure and is an expert on the intersection of AI and storage, with an emphasis on data-intensive and emerging workloads. Prior to Pure, he spent time at Cisco as well as Symantec, which he joined via the company's merger with Veritas Software.

[Image: Brian Schwarz of Pure Storage]

"AI is the poster child for unstructured data workloads," Schwarz argues. "It pushes the limits of data in ways never before seen. With ImageNet, for example, GPU systems required thousands of images to be processed per second. With the latest GPU architecture, the velocity of data jumps 2-3x higher. In addition, training data must be accessed often and at random, unpredictable times. It's an incredible strain on infrastructure."

"Data is the lifeblood of AI - and storage its veins"

So what makes AI different? "By wide consensus, AI is the confluence of three components that came together to fuel the revolution: a new algorithm known as deep learning, GPU processors, and big data. The third element of the revolution, big data, is arguably the most important because it holds the most value."

Storage technology, on the whole, lags behind the kinds of innovation needed to ensure AI is a success. "Storage systems available today are optimized for a design point that's different to what AI truly requires. They were built to store data, not deliver it at high velocity," Schwarz explains. "They are optimized for structured workloads - predictable, sequential access, not random patterns."

[Figure: Compute required for deep learning training, comparing Microsoft ResNet in 2015 to Google NMT in 2017. Compute delivered, comparing Tesla M40 peak FLOPS vs Tesla V100 peak FLOPS. Source: Why The AI Industry Needs To Rethink Storage]

As the graph above shows, companies are facing an impending storage crisis, with data volumes growing faster than ever before. These extraordinary data volumes are an invaluable resource, but they pose a challenge for storage. "The key distinguishing factor of deep learning is that accuracy continues to grow as the dataset grows. This means the more data, the more value," Schwarz says. "That data needs not only to be stored, but managed, consumed, and moved to where algorithms live - all at high speed. It is the lifeblood of AI, and storage its veins. It needs to be a modern, cutting-edge system, able to keep up."

"Machine learning code and framework is actually a very small fraction of the overall machine learning workflow. Much of the complexity is related to data and infrastructure. The challenge with any data infrastructure solution for AI is that it must be both simple and highly performant." Promising 10x greater speed and efficiency, flash could be the answer - and it's time that AI companies rethought their approach to storage as a result.

Pure Storage believe that, compared to their solutions, every other system on the market makes tradeoffs. "They are either easy to use but slow, or high performance but highly complex," argues Schwarz. "FlashBlade delivers high performance and cloud-like simplicity that customers have come to expect from infrastructure. The end value of FlashBlade to enterprise AIs is that the complexity of data infrastructure management goes away, while the computer systems are kept 100% busy with data."

Data and AI Strategy = As Important As Hardware

Hardware and infrastructure aren't the only obstacles to AI and data success, however. Equally important is how enterprises plan and leverage their long-term data strategy - and how they integrate that into their business strategy. It's this that companies need to prepare for if they're to stay ahead of the competition.

"AI is often discussed in the context of science fiction, but in practical applications, it will look a lot more democratic," Schwarz argues. "AI isn't just for high-tech, bleeding-edge use cases. It's not just poised to deliver futuristic, hypothetical technologies. It will be at the core of operations for every business across every industry - from retail to pro sports, space travel to healthcare, self-driving cars to streaming video."

Schwarz believes that the AI journey is a marathon - not a sprint. The greatest challenge to success, he says, is simply showing up to the race, prepared and ready to go. "Of course, a tremendous amount of preparation goes into running a marathon. In AI, the prep work involves collecting as much relevant data as possible, cleaning it up, labelling it, and preparing it for analysis."

Pure Storage at The AI Summit

Pure Storage have sponsored a number of AI Summit events, and will again be present at the AI Summit NYC next month on December 5-6. After a successful time for the company at the AI Summit San Francisco, they're excited to attend the next event. "We really believe that Pure Storage is one of the companies best positioned to accelerate this latest wave of AI, and the dozens of conversations we had on-site at the AI Summit San Francisco really reinforced both our confidence and excitement," Schwarz explains. "We spoke to dozens of people in the space and it's clear that we've done as many real-world deployments as anyone else out there."

"At a higher level, it was insightful and motivating to see the wide range of industries and use cases where AI has been deployed to drive innovation. Across healthcare, telecommunications, transportation and technology, we were exposed to futuristic ideas on how to apply deep neural networks and solve real problems. The attendees represented just an incredible mix of intellect and experience. It seemed like half the audience held a CxO title, while the other half had PhDs!"
