Whether you're working with large language models or looking for efficient ways to handle high request volumes, you need to know how to manage and optimize your AI infrastructure. Join Aaron Baughman as he explores advanced strategies for scaling generative AI algorithms across GPUs. Aaron covers batch-based and cache-based systems, agentic architectures, and model distillation techniques, and explains how you can use these methods to optimize performance, reduce latency, and enhance personalization in AI applications.