The partnership between Microsoft and OpenAI dates back to 2019, when Microsoft invested $1 billion in OpenAI. Microsoft then reportedly committed another $10 billion in January 2023.

One thing is easy to forget here: ChatGPT runs on Azure hardware in Microsoft's data centers. According to Bloomberg, although specific amounts were not disclosed, Microsoft has already spent "hundreds of millions of dollars" on the hardware used to train ChatGPT.
Recently, Microsoft detailed in a blog post the AI infrastructure behind ChatGPT on Bing. For reference, Microsoft unveiled its latest virtual machine based on the ND H100 v5 hardware, the successor to the NVIDIA A100 GPU-based ND A100 v4 AI virtual machine.
"The new VMs feature NVIDIA H100 Tensor Core GPUs interconnected via next-gen NVSwitch and NVLink 4.0, and NVIDIA 400Gb/s Quantum-2 CX7 InfiniBand networking," said Matt Vegas, senior product manager, HPC+AI, Azure. The VMs also use Intel Xeon Scalable ("Sapphire Rapids") processors, PCIe Gen5 interconnect, and DDR5 memory. Vegas noted that Microsoft delivers this supercomputing performance based on its experience providing multiple exaOP-class supercomputers to Azure customers around the world.
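To put the networking figure in perspective, here is a back-of-the-envelope calculation. The eight-GPUs-per-VM count is taken from Microsoft's ND H100 v5 announcement rather than this article, and the 350 GB payload is a purely hypothetical figure for illustration:

```python
# Rough bandwidth math for one ND H100 v5 VM.
# Assumption (from Microsoft's ND H100 v5 announcement, not this article):
# 8 GPUs per VM, each with a dedicated 400 Gb/s InfiniBand link.

GPUS_PER_VM = 8
LINK_GBPS = 400  # NVIDIA Quantum-2 CX7 InfiniBand, per GPU

aggregate_gbps = GPUS_PER_VM * LINK_GBPS        # 3200 Gb/s = 3.2 Tb/s per VM
aggregate_gbytes_per_s = aggregate_gbps / 8     # bits -> bytes: 400 GB/s

# Time to push a hypothetical 350 GB payload out of one VM at full line rate:
payload_gb = 350
seconds = payload_gb / aggregate_gbytes_per_s

print(f"{aggregate_gbps} Gb/s aggregate, ~{seconds:.3f} s to move {payload_gb} GB")
```

At line rate, a single VM could in theory move a payload of that size in under a second; in practice, protocol overhead and collective-communication patterns mean real transfers take longer.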
In another blog post, Microsoft also explained how it works with OpenAI to build the supercomputers needed for ChatGPT's (and Microsoft's Bing Chat's) large language model. "We had to connect thousands of GPUs in a way that even Nvidia hadn't thought of," said Nidhi Chappell, head of product for Azure High Performance Computing and AI at Microsoft.
"It's not just about buying a lot of GPUs and hooking them up. A number of system-level optimizations, refined across multiple generations, are required to achieve peak performance," Chappell added.
According to Chappell, training a large language model means distributing the workload across thousands of GPUs in a cluster. At certain stages of the process, the GPUs exchange information about the work they have done, and the InfiniBand network must push that data at high speed, because the exchange has to complete before any GPU can begin the next processing step.
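The synchronization step described above can be sketched in plain Python. In data-parallel training, each GPU computes gradients on its own shard of data, then an all-reduce averages them across all workers so every model replica applies the identical update; no worker starts the next step until the exchange finishes. Real clusters run this over InfiniBand with collective-communication libraries such as NCCL; the function and data below are illustrative only:

```python
# Illustrative sketch of the all-reduce exchange in data-parallel training.
# Each "GPU" holds local gradients; the all-reduce averages them so every
# worker sees the same result before the next training step may begin.

def allreduce_mean(local_grads):
    """Average gradients element-wise across all workers.

    Every worker receives this identical averaged gradient, which is what
    keeps the model replicas in sync from one step to the next.
    """
    n = len(local_grads)
    return [sum(col) / n for col in zip(*local_grads)]

# Toy example: 4 "GPUs", each with a 3-element gradient from its data shard.
local_grads = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [0.0, 4.0, 2.0],
    [4.0, 0.0, 2.0],
]

avg = allreduce_mean(local_grads)
print(avg)  # every worker applies this same update: [2.0, 2.0, 2.0]
```

The naive version above gathers everything in one place; production systems instead use bandwidth-efficient schemes such as ring all-reduce, but the synchronization requirement, that all workers must finish the exchange before proceeding, is the same.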
According to the company, Azure's infrastructure is optimized for training large language models, but it took years of incremental improvements to its AI platform to get there. The combination of GPUs, networking hardware, and virtualization software required to deliver Bing AI is immense, spread across 60 Azure regions around the world.
Meanwhile, ND H100 v5 instances are currently available in preview; an exact general-availability date has not been announced.
editor@itworld.co.kr


