Microsoft Azure authentication services stumble again after last week's 14hr outage
28 November, 2018 07:16
Microsoft engineers today were busy trying to contain another disruption to customers who use Azure Active Directory multi-factor authentication (MFA).
Another day of frustrations with Microsoft’s Azure authentication services is making customers and partners nervous about rolling out Azure-based multi-factor authentication.
The latest troubles come just a day after Microsoft revealed why some customers couldn’t sign in to Azure and Office 365 for 14 hours last Monday.
As in last week’s outage, customers that require employees to use MFA to sign in to Microsoft’s online services are being locked out of their accounts because of a failure in Microsoft’s Azure authentication infrastructure.
“Starting at 14:25 UTC on 27 Nov 2018, customers using Multi-Factor Authentication (MFA) may experience intermittent issues signing into Azure resources, such as Azure Active Directory, when MFA is required by policy. Impacted customers may encounter timeout errors,” Microsoft said in an update on its Azure status page.
The Office 365 status page today indicated it was “restarting backend services responsible for processing Multi-Factor Authentication.”
Microsoft pinned the new issue on a Domain Name System (DNS) bug that “caused the sign-in requests to fail, and resulted in impact to the infrastructure responsible for processing MFA.”
The company yesterday detailed three root causes behind last week’s lengthy authentication outage and admitted to serious blunders in its response, including blind spots in monitoring, slow updates to its status pages, and botched mitigations that “propagated” what had been an isolated EU and APAC problem to the US.
The issues were set off by mid-November updates to frontend Azure AD authentication servers that caused them to fail under heavy, but not extraordinary, traffic conditions, which in turn caused backend authentication servers to crumble under a backlog of processing requests.
The trio of issues resulted in timeouts for end-users in Europe, Asia Pacific and the Americas, affecting government customers in the US and UK.
Microsoft had not identified DNS as a source of the original outage.
Microsoft’s most recent update via its public Microsoft 365 Status Twitter account told customers that it was investigating the issue. Users have since posted screenshots of messages Microsoft sent via a private message board, stating that it had mitigated the DNS issue and was restarting its authentication systems to complete the remediation.
Microsoft said it’s watching performance data and trends on affected systems to avoid a repeat. It also promised to publish an explanation for why the incident happened within five business days.
At the time of publishing, Microsoft had removed the alert about the issue from its Azure status page.
However, Microsoft's Azure Support account on Twitter posted on Tuesday evening that “engineers have confirmed that the issue impacting Azure MFA is now mitigated.”
Microsoft said it will make a full root cause analysis available within 72 hours.