Microsoft recently addressed a significant security oversight that exposed 38 terabytes of sensitive information. The exposure originated in a GitHub repository belonging to the company’s AI research division, where a link intended to share open-source training data inadvertently made an entire Azure storage account public. The exposed data included a disk backup of two former employees’ workstations, containing secrets, keys, passwords, and internal Teams messages.
The repository in question, named “robust-models-transfer,” has since been taken down. Prior to its removal, it contained source code and machine learning models related to a 2020 research paper titled “Do Adversarially Robust ImageNet Models Transfer Better?”
The exposure was caused by an overly permissive SAS (Shared Access Signature) token, an Azure mechanism for granting access to storage data. Because tokens signed with the storage account key leave no server-side record, they are difficult to track and difficult to revoke. The issue was reported to Microsoft on June 22, 2023.
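To illustrate how a properly scoped token is minted, here is a minimal sketch using the azure-storage-blob Python SDK; the account, container, and blob names are placeholders, not values from the incident:

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Placeholder values -- not from the incident.
ACCOUNT_NAME = "examplestorageacct"
ACCOUNT_KEY = "<storage-account-key>"

# A narrowly scoped service SAS: read-only, one blob, one-hour expiry.
# Tokens signed with the account key leave no server-side record, so
# Azure cannot enumerate them; revoking one means rotating the key.
token = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name="models",
    blob_name="robust_resnet50.pt",
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

print(
    f"https://{ACCOUNT_NAME}.blob.core.windows.net/models/"
    f"robust_resnet50.pt?{token}"
)
```

The narrow scope matters: a token like this exposes only one file for one hour, whereas an account-level token signed the same way can reach everything in the storage account.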

The repository’s README.md instructed developers to download models from an Azure Storage URL whose embedded SAS token, rather than being scoped to the intended files, granted access to the entire storage account, exposing additional private data.
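A SAS token’s scope is visible in its query parameters, so a quick audit before publishing a URL can catch this class of mistake. The following sketch is illustrative only; the checks reflect documented SAS query-parameter conventions (srt marks an account-level token, sp lists permissions, se is the expiry):

```python
from urllib.parse import parse_qs, urlparse


def audit_sas_url(url: str) -> list[str]:
    """Return warnings about risky traits in a SAS URL."""
    params = parse_qs(urlparse(url).query)
    warnings = []
    # 'srt' (signed resource types) appears only on account-level SAS
    # tokens, which can reach every container in the storage account.
    if "srt" in params:
        warnings.append(f"account-level SAS (srt={params['srt'][0]})")
    # 'sp' lists granted permissions; anything beyond read/list is
    # suspicious for a public download link.
    perms = params.get("sp", [""])[0]
    if set(perms) - set("rl"):
        warnings.append(f"write-capable permissions (sp={perms})")
    # 'se' is the expiry timestamp (ISO 8601); far-future dates make
    # the token effectively permanent.
    expiry = params.get("se", [""])[0]
    if expiry[:4].isdigit() and int(expiry[:4]) >= 2030:
        warnings.append(f"distant expiry (se={expiry})")
    return warnings
```

A token granting full permissions on the whole account with an expiry decades in the future would trip every one of these checks.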
Microsoft’s investigation found no evidence of unauthorized exposure of customer data or any risk to other internal services. The company revoked the SAS token, blocked external access to the storage account, and resolved the issue within two days of responsible disclosure.
To prevent similar incidents, Microsoft has expanded its secret scanning service to cover SAS tokens with overly permissive expirations or privileges. The company also fixed a bug that had caused its scanning system to incorrectly dismiss the specific SAS URL in the repository as a false positive.
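Microsoft has not published the internals of its scanner, but a simplistic approximation of such a check might grep repository files for the telltale SAS query parameters, sv (the signed service version, a date) and sig (the base64-encoded HMAC signature):

```python
import re
import sys
from pathlib import Path

# SAS tokens carry recognizable query parameters: 'sv' is a dated
# service version, 'sig' is a long base64-encoded signature.
SV_PARAM = re.compile(r"[?&]sv=\d{4}-\d{2}-\d{2}")
SIG_PARAM = re.compile(r"[?&]sig=[A-Za-z0-9%_\-+/=]{20,}")


def scan(root: Path) -> None:
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            # Require both parameters on a line to cut false positives.
            if SV_PARAM.search(line) and SIG_PARAM.search(line):
                print(f"{path}:{lineno}: possible SAS token")


if __name__ == "__main__":
    scan(Path(sys.argv[1]))
```

A production scanner would go further, parsing matched tokens to grade their permissions and expiry, much like the audit sketch above.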
Misconfigured Azure storage accounts have previously been a concern, as they can give threat actors a foothold into enterprise on-premises environments.

The incident adds to a string of security mishaps at Microsoft; it came shortly after the company disclosed a breach in which China-based hackers compromised an engineer’s corporate account and stole a highly sensitive signing key.
Wiz, the cloud security firm that discovered and reported the exposure, emphasized the need for heightened security measures when handling large amounts of data for AI solutions. The firm recommends avoiding Account SAS tokens for external sharing entirely, since mistakes in token creation can easily expose sensitive data.
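One commonly recommended alternative is the user delegation SAS, which is signed with Azure AD credentials rather than the storage account key, making issued tokens auditable and centrally revocable. A minimal sketch, again with placeholder names and assuming the azure-identity and azure-storage-blob packages:

```python
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.storage.blob import (
    BlobSasPermissions,
    BlobServiceClient,
    generate_blob_sas,
)

ACCOUNT_NAME = "examplestorageacct"  # placeholder

# Authenticate with Azure AD instead of the storage account key.
service = BlobServiceClient(
    f"https://{ACCOUNT_NAME}.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)

# The delegation key is tied to the Azure AD principal, so tokens
# minted from it can be invalidated centrally -- unlike tokens
# signed directly with the account key.
start = datetime.now(timezone.utc)
expiry = start + timedelta(hours=1)
delegation_key = service.get_user_delegation_key(start, expiry)

token = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name="models",
    blob_name="robust_resnet50.pt",
    user_delegation_key=delegation_key,
    permission=BlobSasPermissions(read=True),
    expiry=expiry,
)
```

Because the delegation key itself expires and can be revoked along with the identity that created it, a leaked URL of this kind has a much smaller blast radius than the account-scoped token at the center of this incident.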