Today, news broke that a security researcher managed to breach systems of over 35 tech companies in what has been described as a novel software supply chain attack.
By taking advantage of a concept known as dependency confusion or namespace confusion, security researcher and ethical hacker Alex Birsan pushed his Proof-of-Concept (PoC) counterfeit packages downstream in an automated fashion to the development environments of Microsoft, Uber, Tesla, Yelp and Shopify, among other tech firms.
This attack is particularly significant because, unlike the traditional typosquatting or brandjacking supply chain attacks that Sonatype has talked about before, the targeted companies automatically received Birsan's malicious packages without making any spelling mistakes and without any social engineering being involved.
For demonstrating the seriousness of this type of software supply chain attack, Birsan has been awarded upwards of $130,000 in bug bounties.
To date, Birsan has published over 200 packages, each with 100-1000 versions, to multiple open-source ecosystems including npm, PyPI, and RubyGems, which is how the researcher achieved an astonishingly high success rate in ethically hacking big organizations.
When Birsan's research efforts began last year, our automated malware detection systems, part of Sonatype Intelligence, had simultaneously started flagging these packages as malware. At the time, the researcher told Sonatype that this was all part of an ongoing research work and that a coordinated disclosure was set to take place in early 2021.
Since then, the Sonatype Security Research team has been adding these packages to our data under multiple vulnerability identifiers (sonatype-2020-XXXX IDs), keeping our customers protected from the get-go.
When we contacted Birsan back in 2020, Sonatype also observed that these 200+ brandjacking packages were published under the researcher's real name, that he appeared to have a credible profile, and that the packages contained explicit disclaimers in multiple places (both within the code comments and on the package homepages on the repositories) stating that they were created for security research purposes.
Image: Birsan's packages contained explicit disclaimers that these were for research only in the source code and on npm pages for each package.
Today, Birsan has released his findings on his blog, with Sonatype releasing our analysis simultaneously.
Microsoft has also released a whitepaper and a vulnerability identifier (CVE-2021-24105) for its Azure Artifacts product, related to this issue.
What is dependency confusion?
The dependency confusion problem is an inherent design flaw in the native installation tools and DevOps workflows that pull dependencies into your software supply chain.
In this context, dependency confusion refers to the inability of your development environment to distinguish between a private, internally created package present in your software build and a package by the same name available in a public software repository.
Sonatype CTO Brian Fox has previously expressed that the lack of a proper namespacing requirement in open source ecosystems has the potential to cause dependency hijacking attacks.
"The fact that so much of the npm ecosystem is effectively not namespaced has actually created potential build time malware injection possibilities."
"If I know of a package in use by a company through log analysis, bug report analysis, etc. I could potentially go register the same name in the default repo with a very high semver and know that it's very likely that this would be picked up over the intended, internally developed module because there’s no namespace," said Fox in 2017.
For example, let's assume your application uses an internal, privately created PyPI component called foobar (version 1) as a dependency. Later, should an unrelated component by the same name but with a higher version number, foobar (version 9999), be published to the public PyPI repository, the default configuration of PyPI development environments dictates that the foobar with the higher version be downloaded as a dependency.
In this case, the attacker's counterfeit foobar package with the higher version number would silently and automatically make its way into your software build.
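The foobar scenario above can be sketched in a few lines of Python. This is a simplified model of "highest version wins" resolution, not the actual logic of pip, npm, or any other installer; the Candidate type, the resolve function, and both index contents are hypothetical names made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    name: str
    version: str
    source: str  # which index offered this candidate (illustrative only)


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '9999.0.1' into (9999, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(name: str, *indexes: List[Candidate]) -> Candidate:
    """Pick the highest version of `name` across all indexes, ignoring origin."""
    candidates = [c for index in indexes for c in index if c.name == name]
    if not candidates:
        raise LookupError(f"no candidate found for {name!r}")
    return max(candidates, key=lambda c: version_key(c.version))


# Hypothetical index contents mirroring the foobar example.
private_index = [Candidate("foobar", "1.0.0", "private")]
public_index = [Candidate("foobar", "9999.0.0", "public")]

chosen = resolve("foobar", private_index, public_index)
print(chosen.source, chosen.version)
```

Because the resolver in this sketch never looks at where a candidate came from, the public copy wins on version number alone, and the manifest that asked for foobar does not need to change for that to happen.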
Whereas traditional typosquatting and brandjacking attacks require some social engineering of the developer, such as getting them to misspell a package name or mistake a brandjacking package for a legitimate one, this attack is more sophisticated because no interaction is required on the developer's part.
Birsan began experimenting with the idea last year when he noticed that some of PayPal's code exposed on GitHub and in other places had manifest files indicating that the company used both public and private (internally created) npm packages.
He wondered what would happen if he squatted the private package names listed in the manifest on the public npm registry, which is open to everyone.
He further expanded his research efforts to emulate similar attacks in PyPI and RubyGems repositories.
An example gem published by Birsan to RubyGems is analyzed below.
The shopify-cloud gem version 2300.4.2 is a counterfeit component, which makes a DNS request to the researcher's DNS server and exfiltrates the infected system's username, current directory and IP address.
This data is sent through DNS to maximize its chances of bypassing corporate firewalls, which tend to block suspicious traffic over most other protocols but not DNS.
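To make the DNS channel more concrete, here is a minimal sketch of how a few short strings can be packed into a hostname and recovered on the other side. It is not the gem's code: the hex encoding, the "|" separator, the collector.example domain, and both function names are assumptions chosen for illustration, and the sketch performs no DNS lookup at all.

```python
from typing import List

MAX_LABEL_LENGTH = 63  # DNS limits each dot-separated label to 63 bytes


def encode_as_hostname(fields: List[str], domain: str) -> str:
    """Hex-encode the fields and split the result into DNS-safe labels."""
    payload = "|".join(fields).encode("utf-8").hex()
    labels = [
        payload[i:i + MAX_LABEL_LENGTH]
        for i in range(0, len(payload), MAX_LABEL_LENGTH)
    ]
    return ".".join(labels + [domain])


def decode_hostname(hostname: str, domain: str) -> List[str]:
    """Reverse encode_as_hostname, as the receiving name server could."""
    payload = hostname[: -len(domain) - 1].replace(".", "")
    return bytes.fromhex(payload).decode("utf-8").split("|")


# Placeholder values standing in for username, current directory and IP.
fields = ["builduser", "/home/builduser/app", "203.0.113.7"]
hostname = encode_as_hostname(fields, "collector.example")
print(hostname)
print(decode_hostname(hostname, "collector.example"))
```

The sketch ignores the overall length limit on a full domain name and assumes the fields never contain the separator, so it is only suitable for short values like these. From a defender's point of view, unusually long hex-looking labels in outbound DNS queries are one of the few visible traces of this kind of traffic.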
All other 200+ packages published by Birsan in npm, RubyGems, and PyPI ecosystems contain identical code and perform the same actions.
In this way, as soon as a target company received Birsan's counterfeit components in this automated supply chain attack, his code would execute and "phone home," notifying the researcher of a successful outcome.
The researcher then reported his findings to the company whose IP address had made contact, and managed to earn multiple bug bounties in the range of $30,000-40,000 each, totaling well over $130,000.
"At this point, I feel that it is important to make it clear that every single organization targeted during this research has provided permission to have its security tested, either through public bug bounty programs or through private agreements. Please do not attempt this kind of test without authorization," Birsan has warned in his blog post.
Why did this software supply chain attack have a high success rate?
The dependency confusion problem is an inherent design flaw in the default configuration of DevOps tools, specifically in which package or dependency they prioritize when several exist under the same name.
While over 75% of the logged callbacks received by the researcher were from companies using npm packages, the problem isn’t unique to any particular ecosystem.
"This does not necessarily mean that Python and Ruby are less susceptible to the attack. In fact, despite only being able to identify internal Ruby gem names belonging to eight organizations during my searches, four of these companies turned out to be vulnerable to dependency confusion through RubyGems," says Birsan.
The biggest advantage the researcher had in this attack was that it triggered automatically, without requiring the kind of human error that typosquatting and brandjacking attacks depend on.
A good tip here is to review the installation workflows of your development tools carefully, and even to register (squat) the names of your internally created packages in public repositories yourself to prevent misuse by adversaries.
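As one small example of such a review for Python projects, the sketch below scans the text of a pip requirements file for the --extra-index-url option. To our understanding, this option adds an index alongside the default one rather than replacing it, so candidates from both are considered together. The find_extra_indexes function and the sample file contents are hypothetical, and a real review would also need to cover pip configuration files, environment variables, and CI scripts.

```python
import re
from typing import List, Tuple

OPTION = "--extra-index-url"


def find_extra_indexes(requirements_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, url) for every --extra-index-url option found."""
    findings = []
    for number, raw in enumerate(requirements_text.splitlines(), start=1):
        # Drop comments that start the line or follow whitespace.
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        if line.startswith(OPTION):
            url = line[len(OPTION):].lstrip(" =").strip()
            findings.append((number, url))
    return findings


# Hypothetical requirements file; the internal URL is a placeholder.
sample = """\
# internal packages live on our own index
--extra-index-url https://pypi.internal.example/simple
foobar==1.0.0
requests>=2.0
"""

for line_number, url in find_extra_indexes(sample):
    print(f"line {line_number}: additional index {url}")
```

A finding here is not proof of a problem, only a prompt to check how the private and public indexes are combined for that build.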
Better namespacing strategies that clearly distinguish between components are also a way to prevent the dependency confusion problem.
"The reverse fully-qualified domain name (FQDN) introduced into the Java package was a great choice to ensure classes don’t conflict."
"Popular build tools such as Maven and nearly all those that followed built upon this key concept, with the introduction of GroupId also using the FQDN as part of the name to ensure the coordinates were properly namespaced," continues Fox.
Very hard to spot such an attack without Sonatype Intelligence
In the cases of counterfeit components seen thus far by Sonatype, the component names differ in their spellings or are published under a different namespace, making them somewhat easier to spot.
Dependency confusion supply chain attacks of this caliber would be virtually impossible to spot manually or by using loose manifest-based matching to generate a software bill of materials (SBOM), because the counterfeit components published to the public repos had exactly the same names as the private ones.
The companies targeted here would never have suspected that the same manifest files that had been living in their software builds all this time would suddenly start pulling entirely different components, despite the manifests being untouched by anyone.
This is where having an SBOM and a list of vulnerabilities in your builds, generated through deep binary analysis and code similarity matching and combined with the power of Sonatype's automated malware detection systems, would come in as an indispensable solution and save the day.
The 200+ packages that were gradually published as a part of this research were spotted by Sonatype's malware detection systems starting in mid-2020, and were incorporated in our security research data in a timely manner.
Additionally, Sonatype has released a script on GitHub that users of Sonatype Nexus Repository can use to check if any of their private dependencies have the same names as existing squatted packages on the public npm, RubyGems, and PyPI repos.
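The core idea of such a check can be sketched as below. This is not Sonatype's script: the find_name_collisions function, the package names, and the public_snapshot set are hypothetical, and the public lookup is passed in as a function so that a real registry query could be substituted for the in-memory set.

```python
from typing import Callable, Iterable, List


def find_name_collisions(
    private_names: Iterable[str],
    exists_publicly: Callable[[str], bool],
) -> List[str]:
    """Return the private package names that also exist in a public registry."""
    return sorted(name for name in set(private_names) if exists_publicly(name))


# Hypothetical data: names of internal packages and a snapshot of public names.
private_names = ["foobar", "acme-billing", "acme-auth"]
public_snapshot = {"foobar", "requests", "acme-auth"}

collisions = find_name_collisions(private_names, public_snapshot.__contains__)
print(collisions)
```

Any name this reports deserves a closer look, since it means a public package already exists that could be pulled in place of the private one. The sketch compares names exactly as written and does not account for the name normalization rules that individual registries apply.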
Had next-generation Sonatype products been integrated properly in software builds of the target companies, the counterfeit components would have been immediately flagged or blocked altogether as soon as they were pulled downstream within software supply chains.
Birsan believes there remains more to discover here, and expects future attacks leveraging dependency confusion across different ecosystems to grow.
According to Sonatype's 2020 State of the Software Supply Chain report, next-generation upstream software supply chain attacks are far more sinister because bad actors are no longer waiting for public vulnerability disclosures. Instead, they are taking the initiative to contribute code to open source projects and then - unbeknownst to the other OSS project maintainers - injecting malicious code. Those code changes then make their way into open source projects that feed the software supply chains of developers around the world.
By shifting their focus upstream (i.e., publishing malicious components in open source repositories), bad actors can infect a single component, which will then be distributed downstream using trusted software workflows and transitive dependencies.
Our 2020 report also shows that this is happening at a rapidly increasing rate. In fact, there was a 430% increase in upstream software supply chain attacks over the past year. Keeping this in mind, it is virtually impossible to manually chase and keep track of such components.
Sonatype's world-class security research data, combined with our automated malware detection technology, safeguards your developers, customers, and software supply chain from infections.
If you're not a Sonatype customer and want to find out if your code is vulnerable, you can use the free Sonatype Vulnerability Scanner to find out quickly.
Visit the Sonatype Intelligence Insights page for a deep dive into other vulnerabilities like this one or subscribe to automatically receive Sonatype Intelligence Insights hot off the press.
Written by Ax Sharma
Ax is a Staff Security Researcher & Malware Analyst at Sonatype with a penchant for open source software. His works and expert analyses have frequently been featured by leading media outlets including the BBC. Ax's expertise lies in security vulnerability research, reverse engineering, and cybercrime investigations. He has a passion for educating a wide range of audiences through writing and vlogs.