Over the past decade, GPUs have become essential for advancing artificial intelligence (AI) and machine learning (ML). Their ability to handle parallel processing has made them indispensable for tasks like model training and inference. However, as demand for AI applications grows, the limitations of GPUs are becoming apparent: a significant share of time in AI workloads is spent moving data between chips, with data suggesting that up to 40% of runtime is lost to this inter-chip networking because of bandwidth constraints.
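To see why that share grows with scale, here is a back-of-envelope sketch (in Python) of a data-parallel training step: the compute assigned to each chip shrinks as accelerators are added, but the gradient traffic each chip must exchange stays roughly constant, so networking takes up an ever larger slice of the step. Every number in the sketch is an illustrative assumption, not a figure from this article.

```python
# Back-of-envelope sketch of why inter-chip networking becomes the bottleneck
# as accelerator counts grow. Every number below is an illustrative
# assumption, not a figure from the article.

def step_time(n_chips: int,
              total_flop: float = 1e15,      # assumed work per training step (FLOP)
              per_chip_flops: float = 1e14,  # assumed sustained FLOP/s per chip
              grad_bytes: float = 2e10,      # assumed gradient volume per step (bytes)
              link_bw: float = 4e11):        # assumed per-chip link bandwidth (bytes/s)
    """Return (compute_seconds, comm_seconds) for one data-parallel step.

    Compute time shrinks as chips are added, but the all-reduce traffic per
    chip (~2 * grad_bytes * (n - 1) / n) barely shrinks, so the share of the
    step spent on networking grows.
    """
    compute = total_flop / (per_chip_flops * n_chips)
    comm = 2 * grad_bytes * (n_chips - 1) / n_chips / link_bw
    return compute, comm

if __name__ == "__main__":
    for n in (8, 64, 512):
        compute, comm = step_time(n)
        share = comm / (compute + comm)
        print(f"{n:>4} chips: {share:.0%} of each step spent on networking")
```

Under these assumed numbers the networking share climbs from under 10% on a handful of chips to the large majority of the step at bigger scales, which illustrates how estimates like the 40% figure above can arise once clusters grow.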
As AI workloads continue to grow, the high cost and power consumption of general-purpose GPUs are driving a shift towards custom silicon and chiplet-based designs. These modular systems are more flexible and scalable, allowing for optimised power efficiency and improved communication bandwidth.
While GPUs are successful in training AI models, they face challenges when it comes to scaling AI applications, particularly for real-time inference tasks. The cost and energy requirements of GPUs are proving unsustainable, especially in edge environments where data must be processed near its source. As a result, AI-specific ASICs are emerging as a more efficient and cost-effective alternative for these tasks. These specialised chips can offer lower power consumption and higher performance than GPUs in certain use cases.
The focus of the industry is also shifting from training models to inference, particularly for edge AI, where lightweight, specialised chips are more effective. Arm-based architectures, such as Neoverse, are gaining traction for edge AI because they deliver high performance with low energy demands. These chips are already commonly used in mobile devices and are being adapted for AI workloads, particularly where power efficiency is critical.
Scaling AI to meet mass-market needs presents significant challenges, particularly with monolithic GPUs. These chips are constrained by manufacturing limits, most notably the maximum die area that can be exposed in a single lithography pass (the reticle limit), which caps the number of transistors that can be placed on a single chip and restricts the number of I/O connections around its edge. This leads to connectivity bottlenecks and limits the performance of AI applications at scale.
Chiplet-based designs are seen as a solution to these problems. By breaking the system down into smaller, more specialised units, chiplets allow for greater flexibility and scalability. Each chiplet can be optimised for a specific function, whether that is processing, memory management, or I/O operations. Because smaller dies are far less likely to contain a fatal defect, this modular approach also improves manufacturing yield, reducing the cost of building large SoCs while improving power efficiency and easing the connectivity bottlenecks associated with monolithic architectures.
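A simple way to see the cost argument is the standard Poisson yield model, in which the probability that a die is defect-free falls exponentially with its area. The sketch below compares an assumed near-reticle-limit monolithic die against eight smaller chiplets of the same total area; the defect density and die sizes are illustrative assumptions, not Alphawave or industry figures.

```python
# Minimal sketch of the yield argument for chiplets, using the classic
# Poisson defect model: yield = exp(-defect_density * die_area). The defect
# density and die areas are illustrative assumptions, not industry figures.

import math

def die_yield(area_mm2: float, defects_per_mm2: float = 0.001) -> float:
    """Probability that a die of the given area contains no killer defects."""
    return math.exp(-defects_per_mm2 * area_mm2)

if __name__ == "__main__":
    monolithic_mm2 = 800.0   # assumed near-reticle-limit monolithic die
    chiplet_mm2 = 100.0      # assumed area of one chiplet
    n_chiplets = 8           # chiplets needed to match the monolithic area

    print(f"Monolithic {monolithic_mm2:.0f} mm^2 die yield: "
          f"{die_yield(monolithic_mm2):.0%}")
    print(f"Single {chiplet_mm2:.0f} mm^2 chiplet yield:    "
          f"{die_yield(chiplet_mm2):.0%}")
    print(f"Expected good chiplets per set of {n_chiplets}: "
          f"{n_chiplets * die_yield(chiplet_mm2):.1f}")
```

Because each chiplet can be tested individually before packaging (the "known good die" approach), a defective chiplet costs only its own small area, whereas a single defect on a large monolithic die can scrap the entire chip.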
By incorporating chiplets from multiple vendors, companies can create more flexible and efficient AI hardware that uses the best available components. This approach reduces latency, enhances communication bandwidth, and improves scalability, particularly for real-time AI applications.
As the AI industry shifts from training to inference, chiplet-based architectures could challenge the dominance of traditional GPUs. Just as Arm disrupted the mobile industry, chiplets offer a new way forward for AI, providing a more energy-efficient, cost-effective, and scalable solution for the future of computing. The growing use of chiplets is reshaping the landscape of AI hardware, offering significant advantages in terms of performance and power efficiency.
The rise of chiplets in AI computing suggests that the future may not be dominated by GPUs as once expected. Instead, chiplet-based systems are positioned to lead the way in meeting the demands of AI at scale.
Alphawave IP Group plc (LON:AWE) is a semiconductor IP company focused on providing DSP-based, multi-standard connectivity Silicon IP solutions targeting both data processing in the Datacenter and data generation by IoT end devices.