SAN JOSE, CA – Beneath any powerful and useful AI product are hundreds of hours of data cleaning and legwork, not to mention an even more powerful hardware infrastructure. From cloud technology to data storage, ensuring your hardware can compute on, store, and move your training dataset in a timely manner is crucial.
To find out more about how companies can get their AI solutions to market effectively and in a timely fashion, we sat down with Barbara Murphy, VP of Marketing for WekaIO. WekaIO helps companies manage scale and future-proof their data centers so that they can solve real problems that impact the world. Their core product, WekaIO Matrix™, claims to be the world’s fastest shared parallel file system, leapfrogging legacy storage infrastructure by delivering simplicity, scale, and faster performance at a fraction of the cost.
Why is cloud technology so vital to the deployment of AI? What are the challenges?
“The cloud has proven to be a key component of the AI journey, particularly for companies entering the space for the first time. Any AI project starts out with the early development of models. It is an ongoing development process that takes several iterations before the model is ready to put into test and production.”
“During this early development phase, the cloud plays a significant role by allowing data scientists to experiment with models and training without the heavy investment of dedicated on-premises GPU servers. Many leading cloud services, such as AWS SageMaker, have ready-made frameworks that make the onboarding of new AI projects much simpler.”
“The challenge the cloud presents is getting the data to the compute. If the training data catalog is large (e.g. autonomous vehicle training), it is a major effort to get the data sets in and out of the cloud. Oftentimes for these large-data-set models, the project moves on-premises for the production phase to avoid the heavy cost of moving and storing data in the cloud.”
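To see why moving a large training catalog in or out of the cloud is such a major effort, a back-of-envelope calculation helps. The dataset size and link speed below are illustrative assumptions, not figures from the interview:

```python
# Rough sketch: how long would it take to move a large training corpus
# into or out of the cloud over a dedicated network link?
# Both figures below are illustrative assumptions.

dataset_tb = 1000        # assume a 1 PB (1000 TB) autonomous-driving corpus
link_gbit_s = 10         # assume a dedicated 10 Gbit/s link, fully utilized

# Convert the dataset to bits and divide by link speed in bits/second.
dataset_bits = dataset_tb * 1e12 * 8
seconds = dataset_bits / (link_gbit_s * 1e9)
days = seconds / 86400

print(f"Transfer time: {days:.1f} days")  # → Transfer time: 9.3 days
```

Even under ideal conditions, the transfer takes on the order of days, which is why large-data-set projects often move on-premises for production, as Murphy notes.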
What are the major infrastructure obstacles to making AI work for industry?
“GPUs have shrunk the processing power of tens of CPU servers into a single GPU server, delivering massively parallel processing and dramatically improving machine learning cycles. However, the shared storage systems being leveraged to support AI workloads rely on technology developed in the 1980s, when networks were slow.”
“If your data set does not fit inside the local storage on a single GPU server, then scaling the AI workload is a nightmare. NFS, the predominant protocol for data sharing, is limited to about 1.5GB/second of bandwidth, while a single GPU server can easily consume 10x that throughput. GPU workloads demand a low-latency, highly parallel I/O pattern to ensure that the AI workloads are operating at full bandwidth.”
What does competitive advantage look like in the context of AI technology?
“There are two key elements to competitive advantage in AI. The first is being first to get product to market – whether it is a digital MRI machine, an autonomous taxi or an automated trucking fleet. The faster you can train your model, the quicker you will get to market and the better your chance of achieving the number one position. This means that every minute, hour and day counts. Training models for autonomous vehicles can take weeks, so reducing that to days has a huge impact on the bottom line. This requires that the infrastructure deliver optimal performance and the lowest latency (the secret time killer of machine learning projects). Technologies like InfiniBand, NVMe, multi-node GPUs and fast data access are critical in the race to win.”
“The other key element is the size of the training data set, because more data means better models and hence faster time to production. Andrew Ng, VP and Chief Scientist of Baidu; Co-Chairman and Co-Founder of Coursera; Adjunct Professor at Stanford University; and online learning pioneer, put it really well: ‘It’s not who has the best algorithm that wins. It’s who has the most data.’ The larger the training data set, the more accurate the training model will be and the faster it can get to market. Large data sets need a shared storage solution that offers massively high bandwidth, low latency and parallel access so that all GPUs are kept fully busy.”
What does AI mean in practice for enterprises today?
“AI is still in its infancy, but numerous studies have shown that companies that are adopting AI are reducing costs, improving efficiency and delivering bottom-line profit. AI can help with problems as basic as setting a maintenance schedule for a factory floor, all the way to targeting the right product to potential buyers and improving sales closure rates.”
“Just look at a company like gong.io that is helping salespeople use the right language to improve the rate of sales closure. No business is too small to utilize readily available AI-powered tools or develop its own AI strategies.”
How can enterprises start thinking about AI solutions in relation to their own business problems?
“AI is not something to be feared; it is the new wave of automation. The first wave of automation was on the production line, making machines that could do (frankly boring) physical tasks as well as a human. AI simply extends that concept to tasks powered by the mind instead of the body. AI is a companion that frees humans from mundane tasks so they can focus on their creativity.”
“It is important that we position AI as an enabler and not something to be feared. Computers did not result in mass unemployment; they simply let humans focus on bigger things.”