Microsoft has been making lots of under-the-covers changes to how it runs its cloud services, including Azure-based Teams, to meet demand during the COVID-19 pandemic. Here are the details.
By: Mary Jo Foley | ZDNet.
Microsoft officials have been providing updates on how the company has been working to increase cloud capacity since the start of the global pandemic. On June 16, Microsoft shared more specifics about what it has been doing on this front, including how it has been shoring up its Azure-based Teams service as demand grew sharply starting this spring.
Officials had already talked about Microsoft’s prioritization of demand from first responders, healthcare workers, and other front line workers. They had shared details about some of the less-essential services they throttled. And they also had publicly acknowledged that supply chain challenges led to a shortage of some needed datacenter components, further contributing to issues meeting some cloud demands.
Today, officials said Microsoft datacenter employees have been working in round-the-clock shifts to install new servers (while staying at least six feet apart). Microsoft added new servers first to the hardest-hit regions and installed new hardware racks 24 hours a day.
They also said Microsoft doubled capacity on one of its own undersea cables which carry data across the Atlantic, and “negotiated with owners of another to open up additional capacity.” Engineers tripled deployed capacity on the America Europe Connect cable in two weeks, they added.
At the same time, product teams looked across all of Microsoft’s services running on Azure to free up more capacity for highly demanded services like Teams, Office, Windows Virtual Desktop, Azure Active Directory’s Application Proxy, and Xbox, officials said. And in some cases, engineers rewrote code to improve efficiencies, as they did in the case of video-stream processing, which officials said they made 10 times more efficient over a weekend-long push.
Teams was reworked to spread its reserved capacity across additional datacenter regions within a week, rather than through the multiple-month-long process that such a change would normally entail, officials said. In addition, Microsoft’s Azure Wide Area Network team added 110 terabits of capacity in two months to the fiber-optic network that carries Microsoft data, along with 12 new edge-computing sites that connect the network to infrastructure owned by local Internet providers and help reduce network congestion.
Microsoft also moved its own internal Azure workloads to avoid demand peaks worldwide and to divert traffic from regions experiencing high demand, officials said. On the consumer side, Microsoft also moved gaming workloads out of high-demand data centers in the UK and Asia and worked to decrease bandwidth usage during peak times of the day.
Microsoft has also had to update its forecasting models to take into account the major uptick in cloud demand resulting from the pandemic. To its multiple predictive modeling techniques (ARIMA, Additive, Multiplicative, Logarithmic), Microsoft added some basic per-country caps to avoid over-forecasting. It also tuned its models to take into account inflection and growth patterns by usage per industry and geographic area, while adding in external data sources about COVID-19’s impact by country.
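To make the idea of a per-country cap concrete, here is a minimal Python sketch. It is not Microsoft's model: the function names, the simple average-growth-ratio projection, and the sample numbers are all assumptions chosen for illustration. It shows only the general pattern of projecting demand multiplicatively and then clamping each country's forecast to a ceiling so that a short burst of rapid growth is not extrapolated indefinitely.

```python
from typing import Dict, List


def multiplicative_forecast(history: List[float], horizon: int) -> List[float]:
    """Project demand forward using the average period-over-period growth ratio.

    Hypothetical stand-in for a multiplicative model; a production system
    would use something richer (for example ARIMA).
    """
    if len(history) < 2:
        raise ValueError("need at least two observations")
    # Growth ratio between consecutive periods, skipping zero baselines.
    ratios = [b / a for a, b in zip(history, history[1:]) if a > 0]
    growth = sum(ratios) / len(ratios) if ratios else 1.0
    forecast = []
    level = history[-1]
    for _ in range(horizon):
        level *= growth
        forecast.append(level)
    return forecast


def capped_forecasts(
    usage_by_country: Dict[str, List[float]],
    caps: Dict[str, float],
    horizon: int,
) -> Dict[str, List[float]]:
    """Forecast each country, then clamp every value to that country's cap.

    Countries without an entry in `caps` are left uncapped (an assumption).
    """
    result = {}
    for country, history in usage_by_country.items():
        raw = multiplicative_forecast(history, horizon)
        cap = caps.get(country, float("inf"))
        result[country] = [min(value, cap) for value in raw]
    return result


if __name__ == "__main__":
    # Made-up usage figures: demand doubling each period in country "A".
    usage = {"A": [100.0, 200.0, 400.0], "B": [50.0, 50.0, 50.0]}
    caps = {"A": 1000.0}  # e.g. a ceiling tied to the country's user population
    print(capped_forecasts(usage, caps, horizon=3))
```

Without the cap, the doubling trend for country "A" would keep compounding with every period; with it, the forecast flattens once it reaches the ceiling, which is the over-forecasting guard the paragraph above describes.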
“Throughout the process, we erred on the side of caution and favored over-provisioning, but as the usage patterns stabilized, we also scaled back as necessary,” officials said.