Artificial intelligence and machine learning (AI/ML) are continuing to evolve at a rapid pace, offering unprecedented benefits to numerous technology markets, including IoT, advertising, search, voice recognition, and connected technologies that leverage 5G. The rapid growth in functionality is driven by advances in AI training and inference capabilities, which are in turn driving revolutionary changes in AI computing hardware and software.
To maintain the torrid pace of AI/ML improvement that the market continues to demand, rapid performance increases are needed in key enabling technologies. Memory solutions are one such area of critical focus. Complex AI/ML algorithms, such as those required for speech recognition and processing, image classification, and advanced driver-assistance systems (ADAS), need to process massive amounts of data, which requires enormous memory bandwidth. As AI/ML applications continue to evolve, system designers are increasingly turning to the highest-performance memory solutions, such as High Bandwidth Memory (HBM) and Graphics Double Data Rate SDRAM (GDDR6), to enable the next phase of AI advancement.
HBM2E and GDDR6: Meeting the dual needs of AI/ML
When discussing AI/ML, there are two use cases to consider: training and inference. Training is the process of creating a machine learning model by presenting sample inputs and providing feedback on how the model should respond to those inputs. State-of-the-art neural networks contain upwards of 100 billion parameters and require enormous amounts of training data. Especially for the largest neural networks, training often occurs in data centers. The advantage of using data centers is that they house large collections of the fastest hardware, enabling training to be split up among many specialized machine learning accelerators to reduce training time. Communication bottlenecks and excess memory capacity can arise as more processors are added to the training task, resulting in a point of diminishing returns beyond some number of processing engines (but that’s for another article). With a growing need to move data more quickly and efficiently during AI/ML training, memory solutions must provide both high bandwidth and high capacity.
Inference refers to the process of using a trained machine learning model to make a prediction or a decision. Inference is increasingly moving to the edges of the network and into the endpoints themselves, as in the case of autonomous vehicles and smart devices. Performing inference in the endpoints and at the edge of the network has two key advantages: (i) the latency to get an answer is much lower because decisions are being made close to the data, and (ii) there are significant energy savings because a potentially large amount of data doesn’t need to be shipped long distances across the network to a data center, with a result shipped all the way back again. Some solutions use on-chip memory (SRAM) to hold model parameters instead of using external memory. On-chip memory offers extremely high bandwidth, but suffers from limited capacity. For larger neural network models, external memory must be used to hold all the model parameters. As model sizes continue to grow, external memory is needed more than ever to balance the needs of high capacity, high bandwidth, power-efficiency, and cost.
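To make the capacity argument concrete, here is a minimal Python sketch that estimates how much memory the parameters of a model occupy and checks whether they would fit in on-chip SRAM. The 100 billion parameter count comes from the discussion above; the bytes per parameter and the SRAM capacity are illustrative assumptions, not figures for any particular chip.

```python
def parameter_footprint_gb(num_parameters: int, bytes_per_parameter: int) -> float:
    """Memory needed to hold the model parameters alone, in GB (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


def fits_on_chip(num_parameters: int, bytes_per_parameter: int,
                 sram_capacity_gb: float) -> bool:
    """True if the parameters fit entirely in on-chip SRAM of the given size."""
    return parameter_footprint_gb(num_parameters, bytes_per_parameter) <= sram_capacity_gb


# Assumption: 2 bytes per parameter (16-bit values).
large_model_gb = parameter_footprint_gb(100_000_000_000, 2)

# Assumption: a hypothetical accelerator with 0.1 GB (100 MB) of on-chip SRAM.
HYPOTHETICAL_SRAM_GB = 0.1

print(f"Large model parameters: {large_model_gb:.0f} GB")
print("Fits on chip:", fits_on_chip(100_000_000_000, 2, HYPOTHETICAL_SRAM_GB))
```

Under these assumptions the parameters alone occupy 200 GB, far beyond what on-chip memory can hold, which is why larger models depend on external DRAM.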
Two memory solutions emerge as particularly well suited for the range of needs of AI/ML: HBM2E (the latest generation of HBM) and GDDR6 (the latest generation of GDDR). In this article, we’ll explore some of the key benefits and design considerations for both.
HBM2E: Memory solution for AI training applications
Introduced in 2013, HBM is a high-performance 3D-stacked DRAM architecture that passes data across an extremely wide (1024 bits today) and relatively slow (2Gbps in HBM2) interface to achieve high memory bandwidth while also improving power-efficiency. Following successful adoption of HBM in the market, the architecture has continued to evolve, and in late 2018 the HBM2E specification was announced to support increased bandwidth and capacity.
The benefit of using HBM2E for AI training is that it offers higher memory bandwidth and capacity than previous generations of HBM, enabling future AI training hardware to be fed with even more data and to store larger training sets. Four HBM2E stacks connected to a processor will deliver over 1.6 terabytes per second (TB/s) of bandwidth and up to 96GB of memory capacity, a 60% increase in bandwidth and three times the capacity of the original HBM2 DRAMs. Providing these capabilities in a 3D-stacked memory enables both high bandwidth and high capacity to be achieved in a footprint that’s much more compact than would be possible with other memory technologies. What’s more, by keeping data rates relatively low and the memory close to the processor, the power spent moving data from the memory devices to the processor is kept low, offering much better power-efficiency and making HBM2E a great candidate for data center environments.
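The bandwidth and capacity figures above can be checked with simple arithmetic. The sketch below uses the 1024-bit interface and the 2Gbps HBM2 data rate stated earlier; the 3.2Gbps HBM2E data rate and the 24GB per-stack capacity are assumptions chosen to be consistent with the "over 1.6 TB/s" and "up to 96GB" totals, since the text does not state them directly.

```python
def stack_bandwidth_gb_per_s(interface_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth of one stack in GB/s: width (bits) x per-pin rate (Gbps) / 8."""
    return interface_width_bits * data_rate_gbps / 8


HBM_WIDTH_BITS = 1024      # from the text
HBM2_RATE_GBPS = 2.0       # from the text
HBM2E_RATE_GBPS = 3.2      # assumption, consistent with "over 1.6 TB/s" for four stacks
HBM2E_STACK_GB = 24        # assumption, consistent with "up to 96GB" for four stacks
NUM_STACKS = 4

hbm2_stack = stack_bandwidth_gb_per_s(HBM_WIDTH_BITS, HBM2_RATE_GBPS)
hbm2e_stack = stack_bandwidth_gb_per_s(HBM_WIDTH_BITS, HBM2E_RATE_GBPS)

total_bandwidth_tb_per_s = NUM_STACKS * hbm2e_stack / 1000
total_capacity_gb = NUM_STACKS * HBM2E_STACK_GB
bandwidth_increase_pct = (hbm2e_stack / hbm2_stack - 1) * 100

print(f"HBM2 per stack:  {hbm2_stack:.1f} GB/s")
print(f"HBM2E per stack: {hbm2e_stack:.1f} GB/s")
print(f"Four HBM2E stacks: {total_bandwidth_tb_per_s:.2f} TB/s, {total_capacity_gb} GB")
print(f"Bandwidth increase over HBM2: {bandwidth_increase_pct:.0f}%")
```

With these inputs, one HBM2E stack delivers about 410 GB/s, four stacks deliver about 1.64 TB/s, and the increase over HBM2 at 2Gbps works out to 60%.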
The trade-off for systems that use HBM is increased design complexity and cost. The large number of connections requires an interposer to connect the processor to the HBM DRAMs, and an additional substrate that sits between the interposer and the main PCB. These additional components must be designed, manufactured, and assembled, adding cost. The HBM DRAMs themselves cost more as well, because the DRAM components are stacked together with an additional piece of silicon called the base layer to form a single DRAM device that is then placed onto the interposer. These higher costs, coupled with the newer, more difficult system design challenges, have resulted in a lower rate of HBM adoption compared to competing solutions like GDDR that have been in use in the industry for much longer. In many systems (especially the highest-performing AI training systems) the benefits to performance and total cost of ownership (TCO) outweigh these additional costs, making HBM a great choice for training some of the most advanced AI solutions.
Overall, the benefits of HBM2E make it the superior choice for AI training applications, and there is a strong track record and increasing familiarity and experience with HBM being implemented into AI processors such as NVIDIA’s Tesla V100 and the second-generation Google TPU. What makes it perfect for AI training applications is that the performance is remarkable, and higher implementation and manufacturing costs can be traded off against savings of training time, board space, and power. Lower power in turn reduces cooling costs, further benefitting TCO.
GDDR6: Memory solution for AI inference
Graphics DDR SDRAM (GDDR) has been around for about 20 years, and was created to meet the growing demands of the burgeoning graphics market. Driven by the graphics and gaming market’s insatiable demand for more memory bandwidth, multiple generations of GDDR memories have been created that provide much higher bandwidth than DDR DRAMs, which are used in computing applications. The primary difference between GDDR and HBM DRAMs is that GDDR DRAMs have a much narrower interface (only 32 bits) running at a much higher data rate (up to 18Gbps today). The latest generation of GDDR DRAMs, GDDR6, provides up to 72GB/s of memory bandwidth per device, more than double the bandwidth of the previous generation GDDR5 DRAMs.
The benefit of using GDDR for AI applications is that it offers a great combination of bandwidth, capacity, latency, and power. To keep pace with the increasing demands of newer AI platforms, GDDR6 lowers the operating voltage to 1.35V from 1.5V for greater power efficiency, more than doubles the device bandwidth (72GB/s vs. 32GB/s), and doubles the device capacity (16Gb vs. 8Gb) of GDDR5 memory.
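The per-device bandwidth numbers follow directly from the interface width and the per-pin data rate. The sketch below uses the 32-bit interface and the 18Gbps GDDR6 data rate given above; the 8Gbps GDDR5 data rate is an assumption inferred from the 32GB/s figure rather than a number stated in the text.

```python
def device_bandwidth_gb_per_s(interface_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth of one DRAM device in GB/s: width (bits) x per-pin rate (Gbps) / 8."""
    return interface_width_bits * data_rate_gbps / 8


GDDR_WIDTH_BITS = 32       # from the text
GDDR6_RATE_GBPS = 18.0     # from the text
GDDR5_RATE_GBPS = 8.0      # assumption, inferred from the 32GB/s device bandwidth

gddr6 = device_bandwidth_gb_per_s(GDDR_WIDTH_BITS, GDDR6_RATE_GBPS)
gddr5 = device_bandwidth_gb_per_s(GDDR_WIDTH_BITS, GDDR5_RATE_GBPS)

print(f"GDDR6 device: {gddr6:.0f} GB/s")
print(f"GDDR5 device: {gddr5:.0f} GB/s")
print(f"Ratio: {gddr6 / gddr5:.2f}x")
```

The ratio comes out to 2.25, which matches the description of GDDR6 as offering more than double the device bandwidth of GDDR5.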
Another benefit of GDDR6 is that it relies on the same manufacturing techniques used to produce standard DDR-type DRAMs. GDDR6 DRAMs use conventional packages, and don’t require the use of an interposer or an additional substrate, allowing them to be tested and integrated using the same techniques as mainstream DRAMs. Leveraging existing infrastructure, manufacturing processes, and design experience reduces cost and implementation complexity. These benefits have allowed the GDDR market to grow in the years since its initial introduction into the market.
The biggest challenge in designing GDDR6 memory systems stems from the much higher data rates at which the devices run. GDDR6 DRAMs move data faster than any other DRAMs, at up to 18Gbps. Maintaining good signal integrity (SI) at these speeds, and at lower voltages to conserve power, requires significant attention to detail and engineering expertise. Designers must meet tighter timing and voltage margins while dealing with effects such as crosstalk that become harder to manage at higher data rates. Furthermore, interference between the circuits that move data to and from the DRAM and other circuits on the processor, crowded packages and boards, and the challenge of minimizing cost require a strong co-design methodology that balances these and other concerns in order to maintain high-quality SI. But because GDDR6 relies on established methodologies and infrastructure, there is a good base of knowledge within the industry. Coupled with the relatively lower cost, this makes GDDR6 an excellent choice for AI solutions seeking to balance performance, capacity, and power.
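One way to see why timing margins tighten is to compute the unit interval, the time a single bit occupies on one pin, which is the reciprocal of the per-pin data rate. This quantity is not named in the discussion above; it is brought in here as a standard signal-integrity measure, and the sketch only compares the 18Gbps GDDR6 rate with the 2Gbps HBM2 rate mentioned earlier.

```python
def unit_interval_ps(data_rate_gbps: float) -> float:
    """Duration of one bit on a pin, in picoseconds: 1000 / rate in Gbps."""
    return 1000.0 / data_rate_gbps


gddr6_ui = unit_interval_ps(18.0)   # GDDR6 at 18Gbps (from the text)
hbm2_ui = unit_interval_ps(2.0)     # HBM2 at 2Gbps (from the text)

print(f"GDDR6 unit interval: {gddr6_ui:.1f} ps")
print(f"HBM2 unit interval:  {hbm2_ui:.1f} ps")
```

At 18Gbps each bit lasts roughly 56 ps, compared with 500 ps at 2Gbps, so every source of jitter, skew, and crosstalk has to fit inside a window about nine times smaller.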
HBM2E and GDDR6 – partners in AI
In summary, it’s clear that both of these memory solutions will continue to play a vital role in advancing AI. The choice of memory solution depends on balancing the needs of the AI application: whether the solution is being used for training and/or inference, the amount of memory bandwidth and capacity required, and the cost and complexity that are acceptable in the design. HBM2E is a good choice for systems that have the highest performance and power-efficiency requirements and can tolerate increased cost and design complexity. For systems that require more of a balance between performance, power-efficiency, memory capacity, and cost/complexity, GDDR6 is a good choice. Both are well suited to processing the growing volumes of data that will be generated by IoT devices and 5G networks, and both will help to drive future advances in AI and machine learning.
Steven Woo is a fellow and distinguished inventor at Rambus.