Why Did Meta Select Azure as a Strategic Cloud Provider to Accelerate AI Research and Development?

As part of an expansion of its ongoing collaboration with Meta, Microsoft announced that Meta has selected Azure as a strategic cloud provider to help accelerate AI research and development and to scale PyTorch adoption on Azure.

“We are excited to deepen our collaboration with Azure to advance Meta’s AI research, innovation, and open-source efforts in a way that benefits more developers around the world.” — Jerome Pesenti, Vice President of AI, Meta

What has Microsoft been doing to support the AI needs of customers and organizations around the world?
  1. Microsoft has made significant advancements in Azure infrastructure, Azure Cognitive Services, and Azure Machine Learning to better support the AI needs of all customers at scale, for example by scaling Azure’s supercomputing power to train large AI models for the world’s leading research organizations.
  2. Microsoft also works closely with some of the leading research organizations around the world to empower them to build great AI.
  3. Microsoft is expanding tools and resources for open-source collaboration and experimentation.
Why did Meta select Microsoft Azure as a strategic cloud provider?
  1. Azure’s unmatched supercomputing power: The Meta AI group will use Azure’s supercomputing power to accelerate AI research and development, such as training large AI models on a dedicated Azure cluster of 5,400 GPUs built on the latest virtual machine series featuring NVIDIA A100 Tensor Core 80GB GPUs, for some of its large-scale AI research workloads.
  2. Proven performance and scale: Meta has already been using Azure Virtual Machines (NVIDIA A100 80GB GPUs) for some of its large-scale AI research after experiencing Azure’s impressive performance and scale.
  3. Higher GPU-to-GPU bandwidth between VMs than other public cloud offerings: With four times the GPU-to-GPU bandwidth between virtual machines compared to other public cloud offerings, the Azure platform enables faster distributed AI training. Meta used this, for example, to train its recent OPT-175B language model.
  4. Flexible cluster configuration and the ability to pause and resume: “The NDm A100 v4 VM series on Azure also gives customers the flexibility to configure clusters of any size automatically and dynamically from a few GPUs to thousands, and the ability to pause and resume during experimentation.”
  5. PyTorch collaboration: Meta and Microsoft will collaborate to scale PyTorch adoption on Azure and accelerate development, with Azure providing best-in-class hardware (NDv4-series VMs and InfiniBand) for PyTorch users.
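To make the PyTorch point concrete, a typical distributed training job on a multi-GPU Azure cluster uses PyTorch’s DistributedDataParallel (DDP), which all-reduces gradients over the interconnect during the backward pass. The sketch below is a minimal, hypothetical illustration (the model and hyperparameters are made up, not Meta’s workload); it runs as a single process and falls back to the CPU “gloo” backend when no GPU is present, whereas a real cluster job would use “nccl” and be launched with one process per GPU via `torchrun`.

```python
# Minimal single-process sketch of DDP training; hypothetical model and
# parameters for illustration only. On a real cluster, torchrun sets the
# rank/world-size environment per GPU and the backend would be "nccl".
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train_step():
    # Rendezvous info normally supplied by the launcher (torchrun).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend, rank=0, world_size=1)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = DDP(torch.nn.Linear(16, 1).to(device))
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    x = torch.randn(8, 16, device=device)
    y = torch.randn(8, 1, device=device)
    loss = torch.nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()   # gradients are all-reduced across ranks here
    opt.step()
    dist.destroy_process_group()
    return loss.item()

if __name__ == "__main__":
    print(f"loss: {train_step():.4f}")
```

On a multi-node cluster the same script would be launched on each node with something like `torchrun --nproc_per_node=8 --nnodes=N train.py`, and DDP would scale the gradient exchange across the InfiniBand fabric.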

“With Azure’s compute power and 1.6 TB/s of interconnect bandwidth per VM, we are able to accelerate our ever-growing training demands to better accommodate larger and more innovative AI models. Additionally, we’re happy to work with Microsoft in extending our experience to their customers using PyTorch in their journey from research to production.” — Meta

How does Meta benefit?

By using Azure’s supercomputing power and flexible cluster configuration, the Meta AI team is now bringing more cutting-edge machine-learning training workloads to Azure to help further advance its leading AI research.

Thanks for reading this article. Please subscribe to this blog to get the latest technology updates.
