
Microsoft Unveils Maia: Self-Developed AI Chips Powering Large Model Training

Published on Nov 16, 2023
Image Credit: Microsoft Maia

On Wednesday, Microsoft unveiled its first self-developed artificial intelligence (AI) chip, a significant milestone in reducing its reliance on Nvidia's expensive chips and advancing its own AI capabilities. The new chip is designed for training large language models, giving Microsoft greater autonomy in this critical area. The company has also developed an Arm-based CPU for its cloud infrastructure. The two chips, the Azure Maia AI chip and the Arm-based Azure Cobalt CPU, are intended to power Azure data centers and help Microsoft and its enterprise customers prepare for the AI era.

The Azure Maia AI chip and Azure Cobalt CPU are scheduled to become available in 2024. This year has seen a surge in demand for Nvidia's H100 GPUs, which are widely used to train and run generative image tools and large language models. Demand has been high enough to push prices for these GPUs above $40,000 on eBay.

In an interview, Rani Borkar, corporate vice president of Azure Hardware Systems and Infrastructure at Microsoft, described the company's long experience in chip development. Borkar noted that Microsoft has been involved in chip co-development for over 20 years, including work on chips for Xbox and Surface devices, and said those efforts laid the foundation for the current work. Since 2017, Microsoft has been building out its cloud hardware stack, a path that ultimately led to these new in-house chips.

The Azure Maia AI chips and Azure Cobalt CPUs were both designed in-house, alongside a comprehensive overhaul of Microsoft's entire cloud server stack to optimize performance, power consumption, and cost. Borkar said the company is rethinking cloud infrastructure for the AI era and optimizing every layer of that infrastructure.

Image Credit: Microsoft Cobalt 100

The Azure Cobalt CPU, named after the element cobalt and its well-known blue pigment, is a 128-core chip based on Arm's Neoverse CSS design and customized for Microsoft's requirements. It is intended to run general-purpose cloud services on Azure. Borkar stressed that power management was a major consideration in the chip's design: Microsoft made deliberate design choices, including the ability to control performance and power consumption per core and per virtual machine, to deliver high performance while keeping power consumption in check.

Microsoft is currently testing Cobalt CPUs on workloads such as Teams and SQL Server, and plans to make virtual machines available to customers for a variety of workloads next year. Borkar did not directly compare Cobalt to Amazon's Graviton 3 servers on AWS, but noticeable performance gains are expected over the Arm-based servers Microsoft currently uses for Azure. Borkar said initial testing shows performance up to 40% better than what is currently in Microsoft's data centers, which use commercial Arm servers. However, Microsoft has not yet disclosed detailed system specifications or benchmarks.

Microsoft's Maia 100 AI accelerator, named after Maia, a bright blue star in the Pleiades cluster that takes its name from Greek mythology, is designed for running cloud AI workloads, including large language model training and inference. The accelerator will power some of the company's largest AI workloads on Azure, including those arising from its partnership with OpenAI, which is worth over $10 billion. Microsoft is committed to supporting all OpenAI workloads and has collaborated closely with OpenAI on the design and testing of Maia.

Sam Altman, CEO of OpenAI, said the company was excited when Microsoft first shared its Maia chip design, and that the two companies have worked together to refine and test it with OpenAI's models. Altman added that Azure's end-to-end AI architecture is now optimized with Maia, which allows OpenAI to train more capable models and make them more affordable for its customers.

Maia is manufactured on TSMC's 5nm process and contains 105 billion transistors, approximately 30% fewer than AMD's MI300X AI GPU (153 billion transistors). Borkar explained that Maia includes Microsoft's first implementation of the sub-8-bit MX data types, part of a hardware and software co-design approach intended to speed up model training and inference.
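The "approximately 30% fewer" figure can be checked directly from the two transistor counts quoted above. The short Python sketch below does only that arithmetic; the function name is chosen here for illustration and does not come from Microsoft or AMD.

```python
# Transistor counts as quoted in the article.
MAIA_100_TRANSISTORS = 105_000_000_000
MI300X_TRANSISTORS = 153_000_000_000


def percent_fewer(smaller: int, larger: int) -> float:
    """Return how much smaller `smaller` is than `larger`, in percent of `larger`."""
    return (larger - smaller) / larger * 100


if __name__ == "__main__":
    gap = percent_fewer(MAIA_100_TRANSISTORS, MI300X_TRANSISTORS)
    print(f"Maia 100 has {gap:.1f}% fewer transistors than MI300X")
```

The result is a little over 31%, consistent with the rounded figure in the text. Transistor count alone says nothing about relative performance.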

Microsoft has joined a group of companies, including AMD, Arm, Intel, Meta, Nvidia, and Qualcomm, to collaborate on developing standards for next-generation data formats for artificial intelligence models. Furthermore, Microsoft is leveraging the collaborative and open work of the Open Compute Project (OCP) to tailor the entire system to the requirements of artificial intelligence.
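To give a rough sense of what a block-scaled data format looks like, here is a minimal Python sketch. It is an illustration only, not Maia's implementation, which the article does not describe. It assumes the general idea behind the publicly released MX formats: a small block of values shares one power-of-two scale, and each value is stored as a narrow element. The block size of 32, the use of 8-bit integer elements, and all function names are assumptions made for readability; narrower element types would follow the same pattern.

```python
import math

BLOCK_SIZE = 32  # assumption: number of values sharing one scale


def quantize_block(values):
    """Quantize one block to (shared_exponent, int8-range elements).

    Simplified sketch: the shared scale is 2**shared_exponent, chosen so the
    largest magnitude in the block lands in the signed 8-bit range.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1.
    _, e = math.frexp(max_abs)
    shared_exponent = e - 7  # so max_abs / scale = m * 128 < 128
    scale = 2.0 ** shared_exponent
    # Round to nearest and clamp to a symmetric int8 range.
    elements = [max(-127, min(127, round(v / scale))) for v in values]
    return shared_exponent, elements


def dequantize_block(shared_exponent, elements):
    """Recover approximate floats from a quantized block."""
    scale = 2.0 ** shared_exponent
    return [q * scale for q in elements]


def quantize(values):
    """Split a flat list into blocks and quantize each one independently."""
    return [
        quantize_block(values[i:i + BLOCK_SIZE])
        for i in range(0, len(values), BLOCK_SIZE)
    ]


if __name__ == "__main__":
    exponent, elements = quantize_block([0.5, -0.25, 0.125, 1.0])
    print(exponent, elements)
    print(dequantize_block(exponent, elements))
```

Because the scale is stored once per block rather than once per value, the overhead of the scale is spread across the whole block, which is the main appeal of this family of formats for reducing memory and bandwidth during training and inference.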

Borkar revealed that Maia is the first fully liquid-cooled server processor developed by Microsoft, with the goal of achieving higher server density at greater efficiency. She said the company is reimagining the entire stack and deliberately considering every layer so that these systems can also fit into its existing data centers.

This is crucial for Microsoft because it allows the company to deploy these AI servers more quickly, without having to reconfigure data center space around the world. Microsoft has designed a unique rack to house Maia server boards, complete with a liquid cooler referred to as the "sidekick." It works much like the radiator in a car or a high-end gaming PC, cooling the surface of the Maia chips.

In addition to sharing MX data types, Microsoft is also sharing its chassis designs with partners so that they can utilize them in systems featuring other chips. However, the design of the Maia chip itself will not be shared more widely, as Microsoft intends to keep it internal.

Maia 100 is currently being tested on GPT-3.5 Turbo, the same model that powers tools such as ChatGPT, Bing AI, and GitHub Copilot. Microsoft is in the early stages of deployment and, as with Cobalt, has not disclosed precise Maia specifications or performance benchmarks.

This lack of information makes it challenging to determine exactly how Maia will compare to Nvidia's popular H100 GPU, the recently released H200, or even AMD's latest MI300X. Borkar also refrained from making comparisons but emphasized the importance of collaboration with Nvidia and AMD for the future of Azure's artificial intelligence cloud.

Diversifying the supply chain is important for Microsoft, particularly because Nvidia is currently the primary supplier of AI server chips and companies are competing fiercely to acquire them. It is estimated that OpenAI needed over 30,000 of Nvidia's older A100 GPUs for the commercialization of ChatGPT, and Microsoft's self-developed chips could help reduce the cost of AI for its customers. Microsoft has developed these chips for its own Azure cloud workloads and has no plans to sell them to others, as Nvidia, AMD, Intel, and Qualcomm do.

Borkar explained that she sees the new chips as complementing these companies' products rather than competing with them. She noted that Microsoft currently uses both Intel and AMD chips in its cloud computing, and that for AI it can likewise use both AMD and Nvidia chips. Borkar emphasized the importance of these partners to Microsoft's infrastructure and the company's commitment to offering choices to its customers.

The designations Maia 100 and Cobalt 100 suggest that Microsoft is already working on second-generation versions of these chips. Borkar said the series will not end with a single generation, but declined to share the roadmap. It remains unclear how often Microsoft will release new iterations of Maia and Cobalt, but given the rapid pace of AI development, it would not be surprising if the successor to Maia 100 arrived on a timeline similar to Nvidia's H200 announcement (approximately 20 months after its predecessor).

The key questions now are how quickly Microsoft can deploy Maia to accelerate its AI ambitions, and how these chips will affect the pricing of AI cloud services. Microsoft has not yet discussed pricing for the new servers, but it has quietly introduced Copilot for Microsoft 365 at a cost of $30 per user per month.

Currently, the Microsoft 365 version of Copilot is available to the company's largest customers, and enterprise users need to commit to at least 300 users to utilize this new artificial intelligence Office assistant. With Microsoft introducing more Copilot features and rebranding Bing Chat, Maia may soon play a role in powering these new experiences that require AI chips.
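The two Copilot figures quoted above, $30 per user per month and a 300-user minimum, imply a floor on what an enterprise customer would spend. The Python sketch below works that out; it assumes list price with no discounts or taxes, and the function names are illustrative only.

```python
# Figures as quoted in the article.
PRICE_PER_USER_PER_MONTH = 30  # US dollars
MINIMUM_USERS = 300


def monthly_cost(users: int = MINIMUM_USERS) -> int:
    """Monthly Copilot cost in dollars, assuming list price and no discounts."""
    if users < MINIMUM_USERS:
        raise ValueError(f"at least {MINIMUM_USERS} users are required")
    return users * PRICE_PER_USER_PER_MONTH


def annual_cost(users: int = MINIMUM_USERS) -> int:
    """Annual cost in dollars: twelve months at the monthly rate."""
    return 12 * monthly_cost(users)


if __name__ == "__main__":
    print(f"Minimum monthly commitment: ${monthly_cost():,}")
    print(f"Minimum annual commitment:  ${annual_cost():,}")
```

At the minimum seat count this comes to $9,000 per month, or $108,000 per year.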
