The third instalment in Alphawave Semi’s series on High Bandwidth Memory (HBM4) delves into the benefits of custom HBM implementations. Building upon the previous discussions on HBM’s advantages over traditional memory technologies, particularly in AI training, deep learning, and scientific simulations, this article explores how custom solutions can provide a performance edge. As AI continues to drive the need for more innovative memory solutions, custom HBM allows for a closer integration with compute dies and custom logic, enhancing overall performance despite its complexity.
Custom HBM, particularly when paired with a die-to-die interface such as the Universal Chiplet Interconnect Express (UCIe) standard, allows for the tight integration of memory and compute dies. This configuration achieves extremely high bandwidth and low latency between components. The memory controller interacts directly with the HBM DRAM through a through-silicon via (TSV) PHY on the memory base die, with commands from the host transmitted over the die-to-die interface using a streaming protocol. By reusing the die-to-die shoreline already in place on the main die for core-to-core or core-to-I/O connections, this method optimises design efficiency.
Alphawave Semi stands at the forefront of this development with its HBM4 memory controller portfolio, which recently made headlines with the industry’s first silicon-proven 3nm, 24 Gbps die-to-die UCIe IP subsystem, delivering 1.6 Tbps of bandwidth. The company’s ability to design and build custom ASIC dies in-house enables seamless collaboration with customers to create tailored HBM solutions.
Custom HBM integration offers several benefits. First, it ensures that the memory is tailored to the specific needs of the processor or AI accelerator, whether that means optimising bandwidth, reducing latency, or increasing the number of stacked memory layers. Alphawave’s highly configurable HBM4 memory controller exposes a wide range of such parameters, making it adaptable to different workloads.
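To make the idea of a configurable controller concrete, the sketch below models a few typical HBM controller parameters and the peak bandwidth they imply. The parameter names and defaults are illustrative assumptions for this article, not Alphawave’s actual controller interface:

```python
from dataclasses import dataclass

# Illustrative only: these fields are hypothetical examples of the kinds
# of parameters a configurable HBM controller might expose; they are not
# Alphawave's product API.
@dataclass
class HbmControllerConfig:
    channels: int = 16           # independent HBM channels in the stack
    data_width_bits: int = 64    # per-channel data bus width
    data_rate_gbps: float = 8.0  # per-pin data rate (Gbps)
    reorder_depth: int = 32      # command-reordering window for the scheduler

    def peak_bandwidth_gbs(self) -> float:
        """Peak aggregate bandwidth in GB/s across all channels."""
        return self.channels * self.data_width_bits * self.data_rate_gbps / 8

cfg = HbmControllerConfig()
print(f"Peak bandwidth: {cfg.peak_bandwidth_gbs():.0f} GB/s")
# With the assumed defaults: 16 * 64 * 8.0 / 8 = 1024 GB/s
```

Tuning any one of these knobs trades bandwidth against area, power, or latency, which is precisely the flexibility a custom implementation offers over an off-the-shelf part.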
Additionally, 2.5D integration—where the processor die and HBM custom dies are placed side-by-side on an interposer—improves performance by enabling high-speed communication between the components. This method lowers latency and increases bandwidth, ensuring that the system can handle data-intensive tasks efficiently.
Die-to-die interfaces significantly enhance bandwidth by supporting wide data buses and high clock rates, enabling high-throughput communication. For example, Alphawave’s UCIe link can deliver up to 1.6 Tbps per direction with 24 Gbps lanes. This high bandwidth, coupled with reduced interconnect distances, also reduces power consumption—making this approach particularly advantageous for power-intensive AI workloads, whether in data centres or edge AI devices.
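As a back-of-the-envelope check on how per-lane rates aggregate into link bandwidth, the snippet below assumes a 64-lane UCIe module (the lane count is an assumption for illustration; only the 24 Gbps per-lane rate comes from the article):

```python
# Back-of-the-envelope UCIe bandwidth estimate.
# Assumption (not from the article): one UCIe module with 64 lanes.
LANES_PER_MODULE = 64   # hypothetical module width
LANE_RATE_GBPS = 24     # per-lane signalling rate quoted in the article

raw_gbps = LANES_PER_MODULE * LANE_RATE_GBPS  # raw bits/s, per direction
raw_tbps = raw_gbps / 1000

print(f"Raw per-direction throughput: {raw_gbps} Gbps ({raw_tbps:.3f} Tbps)")
# 64 lanes x 24 Gbps = 1536 Gbps, i.e. roughly 1.5 Tbps per direction
# before protocol overhead; wider configurations scale this figure further.
```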
Initially developed for 2.5D packaging to enhance memory capacity, HBM has become a cornerstone of high-performance computing. By integrating advanced packaging technologies like 2.5D and 3D stacking, custom HBM provides a powerful solution to the memory bottleneck problem, facilitating ultra-high memory bandwidth, reduced latency, and greater power efficiency. These factors are crucial for handling the massive data demands of modern AI applications, including deep learning and real-time inference.
While challenges such as cost and thermal management remain, the performance advantages make custom HBM an invaluable asset for next-generation AI hardware systems. Alphawave Semi is well-positioned to provide complex, high-performance solutions through its leading HBM4 technology, complemented by its industry-leading connectivity IP and in-house custom silicon expertise.
Custom HBM implementations are set to play a pivotal role in the future of AI and high-performance computing, offering significant improvements in bandwidth, latency, and power efficiency that are essential for tackling the growing demands of modern workloads.
Alphawave IP Group plc (LON:AWE) is a semiconductor IP company focused on providing DSP based, multi-standard connectivity Silicon IP solutions targeting both data processing in the Datacenter and data generation by IoT end devices.