Principal Product Manager - M365 CPU/GPU Capacity Management

Microsoft
United States, Washington, Redmond
Oct 24, 2025
Overview

M365 is the world's leading productivity cloud, empowering hundreds of millions of users with products such as Word, Excel, Teams, and Outlook. At the heart of this transformation is Copilot, our AI-powered assistant that brings the power of large language models to everyday work, helping users write, analyze, create, and collaborate more effectively than ever before.

As Copilot adoption accelerates, so does the demand on our infrastructure. We are approaching a multi-billion dollar infrastructure footprint, managing millions of Central Processing Units (CPUs) and a growing Graphics Processing Unit (GPU) fleet.

As Principal Product Manager for M365 CPU/GPU Capacity Management, you will lead the strategy and execution for scaling Copilot's CPU and GPU fleet. Your mission: deliver innovation at scale while reducing marginal Cost of Goods Sold (COGS) and accelerating time-to-value for new features, experiments, and model deployments.

Are You...
A strategic thinker who thrives at the intersection of infrastructure, product, business metrics, and AI innovation?
Passionate about driving efficiency at hyperscale across hardware, software, and operations?
A proactive, AI-focused product manager who thrives on extreme ownership and drives outcomes?
Someone who operates without boundaries, navigating across teams, domains, and ambiguity to get things done?
Energized by scaling AI infrastructure to deliver real-world impact?
A natural collaborator who brings urgency, accountability, and clarity to complex, cross-functional efforts?

If yes, then the M365 Core Platform Capacity Management team is just the place for you. We are looking for a Principal Product Manager who will lead the strategy and execution for scaling our Copilot CPU and GPU fleet, powering one of the most ambitious AI workloads in the world. You'll be at the forefront of designing value-based, COGS-aware capacity systems, driving multi-layered efficiency across hardware and software, and partnering across engineering, finance, and infrastructure to ensure we scale with precision and purpose.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

Design and operationalize a COGS-aware capacity demand and supply allocation framework for Copilot features, experimentation, and training workloads.
Drive a multi-layered efficiency roadmap from hardware Stock Keeping Unit (SKU) to workload orchestration and fleet operations, ensuring every GPU delivers maximum value.
Own the end-to-end capacity signal lifecycle, including demand forecasting, supply planning, and alignment with finance and operations.
Evolve the Copilot COGS model, identify top marginal cost drivers, and feed insights into the efficiency roadmap.
Partner with the Copilot Infra team to enhance control plane capabilities for faster experimentation, model selection, benchmarking, and production deployment.
Act as a unifying force across the Foundation Fleet & Capacity team, Copilot Infra, and Azure AI, ensuring shared goals, aligned execution, and transparent communication.