System-level innovations are key to unlocking the full potential of AI
Even as the AI era has captured the public imagination and inspired countless potential applications, it’s also exposed a critical gap: Current AI data center infrastructure is not equipped to manage the massive computational demands of today's and tomorrow’s AI applications and models. This mismatch between AI’s true capability and our ability to deploy it effectively is widening, particularly when it comes to inference—the process of putting trained AI models to work in the real business world.
Picture this: We’ve built incredible AI “spaceships” in GPUs, inference chips and all manner of AI accelerators – capable of exploring vast digital universes – but we’re trying to launch them from pothole-filled runways. AI professionals often say: “Training is how you make AI; inference is how you use AI.” Using AI effectively requires more than large and powerful algorithms; it demands a robust underlying computing and networking infrastructure. That infrastructure is the mission control center for every kind of AI spacecraft, and it is the critical foundation that brings agentic AI, generative AI and conversational AI to life – whether single- or multi-modal.
In my years working on semiconductor solutions, I’ve witnessed firsthand how an infrastructure gap stifles innovation. AI inference is the daily operation of your enterprise AI applications – and it’s a completely different technical challenge from training the AI in the first place. Yet, most companies still rely on general-purpose host CPUs for high-powered AI inference. While training large AI models is a resource-intensive process demanding vast GPU clusters, relying on the same machines to serve AI is inefficient and costly. It's akin to employing a spaceship for a short commute across town.
The challenges extend beyond just AI hardware choices. Large AI models consume significant power, straining data center resources and increasing both operational costs and energy consumption, with negative environmental consequences. While the industry is actively pursuing solutions like reduced power consumption and cleaner energy sources – including recent nuclear power announcements for data centers by Amazon, Google and Microsoft – another crucial approach is to improve the energy efficiency of the data servers themselves.
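To put the energy question in concrete terms, here is a simple back-of-the-envelope sketch in Python. The server power draw, PUE and electricity price are hypothetical assumptions for illustration, not figures from any specific data center.

```python
# Back-of-the-envelope annual energy cost for one AI inference server.
# All inputs are hypothetical illustrative assumptions.

server_power_kw = 10.0     # assumed average draw of an accelerator-dense server (kW)
pue = 1.4                  # assumed data center Power Usage Effectiveness
price_per_kwh = 0.12       # assumed electricity price (USD per kWh)
hours_per_year = 24 * 365

facility_kwh = server_power_kw * pue * hours_per_year
annual_cost_usd = facility_kwh * price_per_kwh

print(f"Facility energy per server: {facility_kwh:,.0f} kWh/year")
print(f"Annual electricity cost:    ${annual_cost_usd:,.0f}")
# Under these assumptions, a single server accounts for roughly 123,000 kWh and
# about $14,700 per year, which is why per-server efficiency gains add up fast
# across a fleet of thousands of servers.
```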
My team has made it our mission to provide the most energy-efficient, highest-performing AI inference solutions per dollar. As it turns out, you do this not by building a faster rocket ship, but by redesigning the underlying system architecture – the mission control – without the legacy host CPU architecture that has been the de facto standard for decades.
It’s not just nice to do; it’s a must. Today’s high-end AI accelerators present a daunting financial barrier, especially for mid-market and smaller, lower-margin organizations. Integrating AI systems into existing IT infrastructure requires specialized knowledge, further complicating adoption. This combination of technical limitations, financial hurdles and implementation challenges has hindered the widespread adoption and effective use of AI technologies. In fact, fewer than half of global businesses and governments can afford to bring AI into their operations and customer experiences today.
Rethinking Underlying Infrastructure: The AI-Optimized Approach
The AI industry is undergoing a transformation, with established giants like Nvidia, AMD and Oracle competing alongside innovative contenders like d-Matrix, Rebellions and Cerebras. For years, my industry obsessed over building faster, more powerful AI accelerators (the engines of AI), with far fewer companies working at the systems level. That narrow focus is no longer an option. We must collectively focus on the systems architecture – the launchpad, flight control system and landing zone – that maximizes the performance of any AI accelerator, regardless of its origin or design: GPUs, TPUs, NRUs, LPUs, ASICs, FPGAs.
Look to the new AI entrants – AI semiconductor startups – that are doing incredibly innovative work to optimize AI infrastructure. As Matthew Kimball, a technology analyst at Moor Insights & Strategy, pointed out at Supercomputing 2024 last November: “Silicon innovation is on another level. I don't think I've ever seen so much innovation at the chip level. Call me crazy but if you think NVIDIA's next big threat is AMD, Intel, Qualcomm or Arm - you may be right. Or it may be a small company that has the freedom to think and address compute challenges differently.”
Our approach integrates the host CPU and NIC functions into a single chip – with many of these chips paired with each AI accelerator – to boost the AI rocket ships so they fly faster, longer and farther, at close to 100% of their capability. Today, those same AI accelerators tap out at utilization rates of 30% to 50% or lower. We simply can’t go the distance with that level of waste and overspending. Tomorrow’s AI inference servers must process even larger volumes of more complex data – from agentic and generative AI, for example – to deliver more accurate and insightful responses.
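As a rough illustration of how utilization drives serving cost, here is a minimal Python sketch. The peak throughput and hourly cost figures are hypothetical assumptions chosen only to show the relationship, not measurements of any particular accelerator.

```python
# Toy model: effective cost per million tokens vs. accelerator utilization.
# All numbers are hypothetical illustrative assumptions.

peak_tokens_per_sec = 10_000   # assumed peak decode throughput of one accelerator
hourly_cost_usd = 4.00         # assumed fully loaded cost of that accelerator per hour

def cost_per_million_tokens(utilization: float) -> float:
    """Serving cost when the accelerator sustains only `utilization` of its peak."""
    effective_tokens_per_hour = peak_tokens_per_sec * utilization * 3600
    return hourly_cost_usd / effective_tokens_per_hour * 1_000_000

for u in (0.30, 0.50, 0.95):
    print(f"utilization {u:.0%}: ${cost_per_million_tokens(u):.3f} per million tokens")
# With these assumptions, serving at 30% utilization costs roughly 3x more per
# token than serving at 95% on the exact same hardware.
```

The same arithmetic holds whatever the accelerator: raising sustained utilization, rather than simply buying faster chips, is the lever that system-level design pulls.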
Beyond hardware, specialized companies are developing advanced cooling systems, networking, storage and software to support increasingly demanding AI workloads.
The trend toward specialization in the AI industry is also promising, and it echoes what happened in the space industry. While Nvidia, AMD and others build powerful rockets, other companies design and build the supporting infrastructure that allows those rockets to reach their full potential – to explore deeper, travel further and uncover new frontiers.
System-level innovations are key to unlocking the full potential of AI, just as launchpads and interstellar highways enable spacecraft to explore further and achieve even greater discoveries.
Looking to 2025: The Year of AI Inference
As we move beyond the initial AI hype cycle, 2025 will see a shift to far more pragmatic, affordable and sustainable AI solutions. AI-optimized infrastructure is crucial to achieving this goal, and deep tech companies are actively developing solutions that drastically improve price/performance – not incremental gains, but a fundamental rethinking of AI infrastructure. We’re not just patching potholes; we’re building intelligent highways for AI, liberating data from today’s host CPU bottleneck so it can flow freely. And as the history of space exploration shows, it’s often the collaborative approach that wins the day. That collaboration promises to accelerate AI adoption across diverse sectors – from government and healthcare to finance and entertainment – ushering in an era of unprecedented innovation and value.
The positive environmental impact of all networking, storage, cooling and hardware improvements cannot be overstated. As AI models grow, more efficient infrastructure could allow for expanded capabilities without proportionally increasing energy use – like modern rockets achieving greater thrust with less fuel. While challenges remain, progress in AI infrastructure is promising.
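One way to see that point is a toy calculation: if the compute required per query grows but system efficiency grows almost as fast, energy per query barely moves. The growth and efficiency factors below are hypothetical assumptions, not projections.

```python
# Toy model: energy per query = compute per query / delivered system efficiency.
# All factors are hypothetical illustrative assumptions.

compute_per_query_flops = 1e12     # assumed compute for a query on today's model
system_flops_per_joule = 1e11      # assumed delivered efficiency of today's stack

model_growth = 4.0                 # assume tomorrow's model needs 4x the compute
efficiency_gain = 3.5              # assume a 3.5x system-level efficiency improvement

energy_today_j = compute_per_query_flops / system_flops_per_joule
energy_tomorrow_j = (compute_per_query_flops * model_growth) / (
    system_flops_per_joule * efficiency_gain
)

print(f"Energy per query today:    {energy_today_j:.1f} J")
print(f"Energy per query tomorrow: {energy_tomorrow_j:.1f} J")
# A 4x larger workload on a 3.5x more efficient stack raises energy per query
# by only about 14%, rather than quadrupling it.
```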
2025 is crucial for driving broader market adoption of AI technologies. By tackling the inertia behind old “mission control” infrastructure head-on, the next generation of AI pioneers is paving the way for a more sustainable and impactful AI revolution. We may soon enter an era where more businesses and consumers alike can explore new digital galaxies without being held back by earthbound limitations.
In this new frontier, the infrastructure that serves advanced AI’s computational demands is no longer a hindrance but a true catalyst for innovation that is both affordable and stunning.