Open Source Supply, Demand, and Security


Open Source Supply
Quality, security, and availability of software, and of the ability to produce it, are cornerstones of the information age. Software developers rely on quality components, frameworks, libraries, and pre-trained AI models that are available through central repositories. With software supply chain incidents like Log4Shell continuing to make headlines, it is understandable to feel uneasy and to want to reexamine every element of the development lifecycle, from security through to general management. Heightened scrutiny is driving governments, foundations, senior practitioners, and C-level executives to push for greater transparency, shared responsibility across development sourcing methodologies, and orchestrated responses that secure a better path forward.
Drawing on eight years of experience, the Software Supply Chain Report team has gathered and analyzed data on outcomes that everyone involved in the software supply chain should take note of. At the epicenter of software supply chain management are the trends associated with open source adoption. The supply of open source continues to grow at double-digit rates and shows no signs of stopping anytime soon. Similarly, the volume of open source downloads keeps accelerating, creating a massive increase in consumption. Together, these trends create a perfect storm of potential threats that is expanding in scope, complexity, and impact.
Figure 1.1. Software Supply Chain Statistics, 2022
Ecosystem | Total Projects | Total Project Versions | 2022 Annual Request Volume Estimate | YoY Project Growth | YoY Download Growth | Average Versions Released per Project |
---|---|---|---|---|---|---|
Java (Maven) | 492K | 9.5M | 675B | 14% | 36% | 19 |
JavaScript (npm) | 2.06M | 29M | 2.1T | 9% | 32% | 14 |
Python (PyPI) | 396K | 3.7M | 179B | 18% | 41% | 9 |
.NET (NuGet) | 321K | 4.7M | 96B | -5% | 23% | 15 |
Totals / Avgs | 3.3M | 47M | 3.1T | 9% | 33% | 14 |
Although the average growth rate of open source consumption has slowed significantly, from the 2021 all-time high of 73% to a more moderate estimate of 33%, the download volume across the four major ecosystems is now projected to top 3 trillion. The npmjs ecosystem in particular is poised to serve nearly as many downloads in 2022 as the four ecosystems combined in 2021.
Figure 1.2. Open Source Projects and Versions Growth, 2022
Open Source Demand
The Log4j event was a watershed moment for many organizations and greatly influenced the development of new open source management policies. In the last year, many open source program office initiatives emerged, with significant attention centered on procurement practices. Surveys designed to measure behavioral change in some cases point to an actual decrease in open source adoption and usage. Not surprisingly, there were notable upticks in behaviors tied to major security incidents, similar to what was observed during the Apache Struts vulnerabilities of 2017 that ultimately led to the famous Equifax data breach.
Interestingly, the data on downloads collected from each of the registries is not in sync with self-reported behavior.
In the sections that follow, we explore some of the perception vs reality dynamics. Nevertheless, 2022 saw the largest volume of open source downloaded and at the time of this writing, downloads have already topped the 2021 volume.
In 2022, the number of open source dependencies being downloaded and integrated into software grew by an estimated average of 33% across all the monitored ecosystems (cited in Figure 1.2 above). In our judgment, the unabated digitization of economies, combined with innovations in AI, cloud computing, and cyber security, and the phenomenon of remote work, may explain this significant increase in downloaded open source dependencies.
Figure 1.3. Estimated Annual Download Volumes, 2018–2022
Annual Download Volumes Across the Tracked Ecosystems
The overall growth rate of adoption has slowed to 30–35% across all ecosystems, down from 2021 and previous years. The convergence across ecosystems most likely signals the maturing of the wider open source economy. Even at this slower rate, it represents the addition of 758 billion software dependency downloads across the tracked ecosystems. Figure 1.3 illustrates how these download volumes have trended since 2018.
Figure 1.4. Growth Rate Across Ecosystems, 2019–2022
Annual Growth Rate of Each Ecosystem
Beyond the large differences in download volume, which stem from how each ecosystem works and what it distributes as a packaged dependency (e.g., functions, frameworks, library collections), it is worth looking at how the growth rate in each ecosystem has evolved over the past four years. For example, NuGet experienced massive growth in 2020, which has trailed off in the following years.
Figure 1.4 charts these individual growth rates over time and displays an average across all four ecosystems.
Although the rates are converging on an overall average of about 33% YoY, this still represents immense growth given the size of the base.
Figure 1.4 charts the overall growth across all ecosystems and illustrates that although the pace of growth is slowing down, the absolute scale of growth continues to compound on the previous yearly rates. The pace of open source adoption shows no signs of running out of steam anytime soon.
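To make the point about absolute scale concrete, the following is a minimal arithmetic sketch. It uses the rounded 2022 volume estimates quoted in this section and the 33% average growth rate to back out an implied 2021 total. The variable and function names are illustrative assumptions, and because the inputs are rounded, the result only approximates the 758 billion figure cited above.

```python
# Rounded 2022 volume estimates quoted in this section, in billions of downloads.
ESTIMATED_2022_DOWNLOADS_BN = {
    "Maven Central": 675,
    "npm": 2100,
    "PyPI": 179,
    "NuGet": 96,
}

# Average YoY download growth quoted in this section.
AVERAGE_YOY_GROWTH = 0.33


def implied_prior_year(total: float, growth: float) -> float:
    """Back out the previous year's volume from this year's total and the YoY rate."""
    return total / (1.0 + growth)


total_2022 = sum(ESTIMATED_2022_DOWNLOADS_BN.values())
total_2021 = implied_prior_year(total_2022, AVERAGE_YOY_GROWTH)
added_downloads = total_2022 - total_2021

print(f"2022 estimate:        {total_2022:,.0f}B")
print(f"Implied 2021 volume:  {total_2021:,.0f}B")
print(f"Downloads added:      {added_downloads:,.0f}B")
```

Even though 33% is well below the prior year's rate, applying it to a base above two trillion adds several hundred billion downloads in a single year.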
Individual Ecosystem Analysis
Through the first 4.5 months of 2022, 244 billion Java components were downloaded from the Maven Central Repository, following 497 billion downloads in 2021, which was nearly 9% above original estimates. At this rate, the volume for 2022 is projected to exceed 675 billion, a 36% YoY increase. These overshoots can be explained in part by the relative stability of Maven Central as the focused ecosystem of the Java Virtual Machine (JVM) world, which reflects organic growth. They may also be tied to a significant consolidation event: JCenter, a popular open source registry, closed down, and projects migrated to the Maven Central Repository as a result.
The activity surrounding Log4Shell also cannot be discounted from the final figures; it inflated the download statistics in December, which were well above normal.
Java 2022 by the numbers:
Npmjs continues to be the largest open source ecosystem measured by download volume. In 2021, JavaScript developers requested 1.5 trillion packages from npmjs, beating estimates by a substantial 40%. JavaScript is the most popular programming language, and its ecosystem is much more networked: small, functional packages are designed to be distributed with great agility and fit for purpose, in contrast to the larger collected libraries seen in the other ecosystems.
All told, the total volume for 2022 is estimated at approximately 2.1 trillion, representing 32% annual growth. Significantly, the volume served by npmjs alone is nearly as large as that of all four registries combined in 2021.
JavaScript 2022 by the numbers:
Python and its package registry PyPI continue to be the fastest growing software ecosystem for the second year running. In 2021, PyPI served nearly 127 billion packages. In 2022, PyPI download volume is estimated to hit 179 billion packages, for an estimated YoY growth of 41%.
Python 2022 by the numbers:
NuGet is the chosen ecosystem of the .NET family of languages and continues to serve engineers working with the ever-growing Microsoft technologies. Developers downloaded 78 billion NuGet packages in 2021. In 2022, estimates peg volume at over 96 billion packages, representing 23% YoY growth.
.NET 2022 by the numbers:
Open Source Security
Third-party code flows through software supply chains on a massive scale. Yet published code accrues technical debt over time, and if it is not kept up to date, security vulnerabilities can compound.
In 2021, the State of the Software Supply Chain Report revealed that the top 10% most popular open source projects, as measured by download volume, contain the most security vulnerabilities as currently identified. These vulnerabilities were most present in the Log4j project and Log4j-core package that contained the now-famous Log4Shell security vulnerability. Analyzing this incident provided an unprecedented view into the adoption of new fixes and a glimpse of how the industry tends to utilize fixes. Figure 1.5 illustrates the relative adoption rates of Log4j-core, with subsequent releases addressing issues outside the original issue.
As Figure 1.5 shows, the fix-adoption rate stabilizes around 65% and, at the time of writing, hovers between 65% and 70%, with significant differences between countries. A deeper look at Log4Shell is captured in the Open Source Management section, but the incident is a significant case study for research into how the industry can identify better practices for adopting newer versions faster.
Figure 1.5. Adoption of Log4Shell releases from August 2021–August 2022
Unfortunately, Log4Shell taught us another lesson about security vulnerabilities in dependencies: it is not only the direct inclusion of the code that matters, but also indirect inclusion of all kinds. Dependencies may be pulled in as part of a transitive dependency chain for a given program, and they may also be embedded into other software that is in use. This last characteristic is what makes the event so menacing. It is not enough to know where developers are using Log4j-core; organizations have to know all software that uses Log4j.
Digging further, the Log4Shell vulnerability is introduced by the JndiManager class that exists inside the Log4j-core artifact. Matching versions of this affected class exist inside 783 other projects with a total of 19,562 versions. Tooling that relies only on the vulnerability disclosure and doesn't look deeper will miss these existing use cases, leading to false negatives and a false sense of security.
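One way to look deeper than the declared coordinates is to inspect archive contents directly. The sketch below is an assumption about how such a check might look, not a description of any particular tool: it walks a jar (a zip archive), finds every entry whose file name is `JndiManager.class` regardless of package relocation, and compares its SHA-256 digest against a set of digests known to be affected. The digest set is left empty here as a placeholder; real values would have to come from a trusted source. Nested archives inside fat jars are not unpacked in this sketch.

```python
import hashlib
import sys
import zipfile

# Placeholder: SHA-256 digests of affected JndiManager.class builds.
# Intentionally empty; populate from a trusted advisory or internal catalog.
KNOWN_AFFECTED_DIGESTS = frozenset()

TARGET_CLASS_FILE = "JndiManager.class"


def find_embedded_classes(archive_path, class_file=TARGET_CLASS_FILE):
    """Yield (entry_name, sha256_hex) for each matching class file in the archive."""
    with zipfile.ZipFile(archive_path) as archive:
        for entry_name in archive.namelist():
            # Compare only the file name so relocated (shaded) copies still match.
            if entry_name.rsplit("/", 1)[-1] == class_file:
                digest = hashlib.sha256(archive.read(entry_name)).hexdigest()
                yield entry_name, digest


def scan_archive(archive_path, affected_digests):
    """Return the matching entries whose digest is in the affected set."""
    return [
        (entry_name, digest)
        for entry_name, digest in find_embedded_classes(archive_path)
        if digest in affected_digests
    ]


if __name__ == "__main__":
    # Usage: python scan.py app.jar other.jar
    for path in sys.argv[1:]:
        for entry_name, digest in scan_archive(path, KNOWN_AFFECTED_DIGESTS):
            print(f"{path}: {entry_name} matches affected digest {digest}")
```

Matching on content rather than on package coordinates is what allows copied or repackaged classes to be found even when the surrounding project has a different name.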
The code implicated in Log4Shell, in JndiManager.class, was borrowed by 783 other projects and is seen in over 19,562 individual components.
The networked nature of dependencies highlights the importance of having visibility and awareness about these complex supply chains. These dependencies impact our software so having an understanding of their origins is critical to vulnerability response. Many organizations did not have the needed visibility and continued their incident response procedures for Log4Shell well beyond the summer of 2022 as a result.
Malicious Software Supply Chain Attacks Increase Another 633% YoY
It’s not just new security vulnerabilities that companies must contend with day-to-day. The world’s software development community must also deal with the ever-increasing threat from packages that are specifically designed to be malicious and present serious security threats from the outset.
These packages come in many shapes and sizes, but what unifies them is they rarely even pretend to be working code; they exploit the automation that exists in the build or in the dependency managers used by developers, who inadvertently install the malicious code in nanoseconds. When developers encounter these failures in their builds, the research has found that their answer often is fixing a typo and trying again. Over the years, the depth and breadth of the attacks have become more sophisticated as typified in the targeting, package hijacking, brandjacking, typosquatting, and even in the fileless malware that’s being distributed as a part of these attacks.
Last year we reported that the number of these malicious, next-generation attacks had increased to 12,000 known instances. This year, the number of captured malicious packages has continued to increase significantly, and at the time of this writing sits at over 88,000 known instances. This number is derived from verified suspicious packages caught in the ecosystem monitoring that we typically do, and excludes packages later proven to be clear. This number is most likely a conservative count as it’s compiled from a single source; the actual volume is potentially much higher.
Nevertheless, the reported volume represents another 633% annual increase in known attacks, representing exponential growth when looked at with a longer-term lens. Figure 1.6 charts the growth of known instances over time.
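The 633% figure follows directly from the two counts reported above. As a small worked sketch (the function name is an illustrative assumption):

```python
def percent_increase(previous: int, current: int) -> float:
    """Year-over-year increase, expressed as a percentage of the previous value."""
    return (current - previous) / previous * 100.0


known_last_year = 12_000   # known instances reported last year
known_this_year = 88_000   # known instances at the time of writing

increase = percent_increase(known_last_year, known_this_year)
multiple = known_this_year / known_last_year

print(f"Increase: {increase:.0f}%")      # prints "Increase: 633%"
print(f"Multiple: {multiple:.2f}x")      # prints "Multiple: 7.33x"
```

A 633% increase means the count is a little more than seven times what it was a year earlier, not six times.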
Another way to size up these seismic increases is to chart observed instances from 2019 onward; since then, the average growth is 742% per year. This average annual growth rate over the last three years is nothing short of astonishing and underlines the need to step up governmental and industry-driven efforts to curb and defend against these attacks. See the later section, Establishment and Expansion of Software Supply Chain Regulation and Standards, for more insight into the public response to these issues.
Creating and distributing these malicious packages is unfortunately still relatively easy, though some key improvements have been made in the last 12 months to protect against certain scenarios. In February 2022, GitHub introduced mandatory two-factor authentication (2FA) for the top 100 npm maintainers. PyPA is working to reduce dependence on setup.py, a key element in how these attacks launch, while also promoting 2FA adoption using a public dashboard. These measures are an important step in preventing maintainer hijacking of known popular packages and will encourage added trust in the integrity of the package maintainer. Nevertheless, they are only a partial solution to the wider problem of open namespaces and malicious package publication.
To come up with strategies that minimize the problem, it’s important to focus on the different tactics used by bad actors in the software supply chain. In 2022, the team observed several recurring tactics. Below, you’ll find some of the more notable cases.
Dependency Confusion
Dependency confusion is a form of attack that relies on spoofing internal package names and publishing them to an open source registry with an abnormally high version number. It continues to be one of the most numerous types of attacks observed. This intrusion reflects a highly targeted approach and is favored both by security researchers doing legitimate penetration testing and by adversaries seeking entry into a given organization.
Defenses against dependency confusion exist both upstream and downstream. Ultimately, these attacks rely on the fact that an organization will not register its internal package names in the upstream repositories.
Some registries such as the Maven Central Repository require mandatory organizational verification and a namespaced coordinate system to help avert this type of attack. Other registries take an open door policy to package registrations, requiring measures downstream such as auditing incoming packages against designated package names. The simplest mitigation for these types of attacks is to use namespaced coordinates whenever possible and claim your organization’s namespace upstream early.
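As an illustration of the downstream measure described above, auditing incoming packages against designated package names, here is a minimal sketch. Everything in it is hypothetical: the internal package names, the registry URLs, and the `ResolvedPackage` record stand in for whatever a real resolver or lock file would provide.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only.
INTERNAL_PACKAGE_NAMES = frozenset({"billing-core", "auth-utils"})
INTERNAL_REGISTRY = "https://registry.internal.example"


@dataclass(frozen=True)
class ResolvedPackage:
    name: str
    version: str
    source: str  # registry URL the package was actually fetched from


def find_confusion_candidates(resolved, internal_names, internal_registry):
    """Return packages that carry an internal name but came from another registry."""
    return [
        package
        for package in resolved
        if package.name in internal_names and package.source != internal_registry
    ]


resolved_packages = [
    ResolvedPackage("billing-core", "1.4.2", INTERNAL_REGISTRY),
    # An internal name served from a public registry with an unusually high version.
    ResolvedPackage("auth-utils", "99.0.0", "https://registry.public.example"),
    ResolvedPackage("some-open-source-lib", "2.1.0", "https://registry.public.example"),
]

for package in find_confusion_candidates(
    resolved_packages, INTERNAL_PACKAGE_NAMES, INTERNAL_REGISTRY
):
    print(f"Blocked: {package.name} {package.version} from {package.source}")
```

The check keys on where a package came from rather than on its version number, since the high version is only the mechanism that tricks the resolver into preferring the public copy.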

Recent Dependency Confusion Campaigns
- A targeted campaign against karapace, an Apache Kafka implementation for Python
- A targeted campaign against VMware vSphere later revealed to be a bug bounty hunter
- A user uploading over 1,200 dependency confusion packages that exfiltrated sensitive system information, aimed at many organizations including Sagepay, Apple, and Google
Typosquatting and Its Cousin, Brandjacking
Typosquatting continues to be a popular approach for executing software supply chain attacks and relies on a deceptively simple technique. Pick a popular component, misspell the name slightly, and rely on the assumption that some developers will make a mistake in adding a component. Software development is ultimately a very repetitive form of writing. With millions of pairs of hands typing ‘npm install’, or editing requirements.txt on millions of keyboards, inevitably, mistakes will happen. There is also a variation on this theme or type of attack called “brandjacking”. In this variant, a known name is taken over or spoofed closely to bait developers into accidentally integrating these packages instead of the legitimate component they were expecting.

Recent Typosquatting Attacks
- Cryptominers distributed using typosquatting
- Requests typosquat installs malware
- PyPI packages that stole authentication keys distributed using typosquatting
- PyMafka - a typosquatted package that dropped Cobalt Strike as its payload
- rustdecimal - a Rust Crate found in crates.io named after the legitimate 'rust_decimal'
- Multiple instances of typosquats against the popular "colors" library on npmjs
An example seen in real life is a campaign against the colors library, with adversaries naming their packages “colors-2.0” or “colors-helper” and so on.
These techniques are often combined with a malicious payload that executes immediately using the built-in functionality of the developer's build tool. Most current build utilities such as npm, cargo, pip3, etc. allow the package maintainer to execute some sort of setup script during package installation. There are many legitimate uses for such functionality, such as compiling native libraries for use or preparing a directory structure. Unfortunately, as there is usually no user interaction when executing this step, this same mechanism can be used to fetch malicious payloads from a command and control server, which is then automatically installed and executed without the user’s awareness.
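For npm specifically, the setup step described above is expressed through the `preinstall`, `install`, and `postinstall` lifecycle scripts in a package's `package.json`. The following is a minimal sketch of how a reviewer might surface those hooks before installing; the manifest content and package name are made up for illustration.

```python
import json

# npm lifecycle scripts that run automatically during package installation.
NPM_INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_time_scripts(package_json_text: str) -> dict:
    """Return the install-time lifecycle scripts declared in a package.json document."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in NPM_INSTALL_HOOKS if hook in scripts}


# Hypothetical manifest for illustration only.
example_manifest = """
{
  "name": "example-package",
  "version": "1.0.0",
  "scripts": {
    "postinstall": "node setup.js",
    "test": "jest"
  }
}
"""

for hook, command in install_time_scripts(example_manifest).items():
    print(f"{hook}: {command}")
```

The presence of a hook is not proof of malice, since many legitimate packages use them, but it marks code that will run without user interaction. npm also accepts an `--ignore-scripts` flag on `npm install` that skips these hooks, at the cost of breaking packages that genuinely need them.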
The most common types of payloads include installing malware and exfiltrating system credentials and tokens, as well as other system information.
What is common with these types of packages is that they operate much like phishing scams, duplicating the README sections of their chosen targets closely. Take a closer look at the code that’s published though, and you’ll see that they rarely work like the original, and instead will cause build failures. By the time the developer notices something is amiss, it is already too late — the damage has been done by the malicious payload executed via the build automation. The developer has probably not given much thought to this occurrence and likely believes that they just encountered a harmless typo.
Unfortunately, mitigating these types of attacks is not possible without some level of automated verification that checks newly acquired dependencies against known malware signatures or other suspicious signs. Due diligence at the keyboard will certainly help reduce errors, but everyone makes typos, so organizations should account for this reality in their approach.
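One of the "other suspicious signs" that can be checked automatically is a new dependency whose name is confusingly close to a well-known one. The sketch below is a simple heuristic under stated assumptions, not a production detector: the similarity threshold of 0.8 and the short list of popular names are arbitrary choices for illustration, and a real system would combine this with signature and behavior analysis.

```python
import re
from difflib import SequenceMatcher

SEPARATORS = ("-", "_", ".")


def normalize(name: str) -> str:
    """Lowercase the name and drop separators so 'rust_decimal' equals 'rustdecimal'."""
    return re.sub(r"[-_.]+", "", name.lower())


def looks_like_typosquat(candidate: str, known: str, threshold: float = 0.8) -> bool:
    """True when the normalized names are identical or nearly identical."""
    a, b = normalize(candidate), normalize(known)
    return a == b or SequenceMatcher(None, a, b).ratio() >= threshold


def looks_like_brandjack(candidate: str, known: str) -> bool:
    """True when the candidate is a known name plus a suffix, e.g. 'colors-helper'."""
    lowered = candidate.lower()
    return any(lowered.startswith(known.lower() + sep) for sep in SEPARATORS)


def suspicious_matches(candidate: str, popular_names, threshold: float = 0.8) -> list:
    """Return the popular names that the candidate appears to imitate."""
    if candidate in popular_names:
        return []  # exact match to a known package is not an imitation
    return [
        known
        for known in popular_names
        if looks_like_typosquat(candidate, known, threshold)
        or looks_like_brandjack(candidate, known)
    ]


popular = ["rust_decimal", "colors", "requests"]
for name in ["rustdecimal", "colors-helper", "requests", "left-pad"]:
    print(name, "->", suspicious_matches(name, popular))
```

A heuristic like this will also flag legitimate companion packages, so its output is better treated as a prompt for review than as a verdict.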
Malicious Code Injections
Malicious source code injections are another type of attack that poses a very real risk to developers of popular libraries. Because they are highly targeted, they are less numerous than the ‘mass attack’ types noted above. This form of attack, however, is no less destructive, because it leverages a popular component as a vector for a malicious payload.
The attack design relies on an adversary gaining access to the source code of a library, either through a compromise (as seen in the Codecov and SolarWinds cases) or by initially pretending to be a benevolent open source committer. Once they have access, they modify some of the code, often hiding the modification in what may otherwise appear to be harmless code changes. This payload is then distributed to users of the library and carries out its nefarious objective. At its simplest, it may just be another credential exfiltration, but the technique was also observed in a sophisticated heist that stole cryptocurrency worth over $3 million USD. That attack used compromised access to a private GitHub repository for a crypto auction site’s front-end.

Recent examples of Malicious Code Injections
- 3 million dollar crypto heist - leveraging access to a private GitHub repository
- Coa - a very popular npm library is hijacked via an npm account takeover to distribute malware
- Rc - another popular library is hijacked moments after coa in seemingly the same campaign
Mitigating these types of campaigns requires two things:
- Awareness of what software components are integrated into software both directly and transitively (i.e., using an SBOM, or Software Bill of Materials).
- The ability to execute changes at a rapid pace as soon as the corrupted release is discovered.
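The first requirement, knowing what is integrated both directly and transitively, amounts to being able to answer "which of our components ultimately depend on this package?" The sketch below assumes the dependency relationships have already been extracted from an SBOM into a simple mapping; the mapping format and every component name other than coa are hypothetical.

```python
from collections import deque


def affected_components(dependencies: dict, compromised: str) -> set:
    """Return every component that depends on `compromised`, directly or transitively.

    `dependencies` maps each component name to the list of its direct dependencies.
    """
    # Invert the graph so we can walk from a dependency up to its dependents.
    dependents = {}
    for component, direct_deps in dependencies.items():
        for dependency in direct_deps:
            dependents.setdefault(dependency, set()).add(component)

    affected = set()
    queue = deque([compromised])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical dependency data, as might be derived from an SBOM.
example_dependencies = {
    "web-app": ["cli-toolkit", "logger"],
    "cli-toolkit": ["coa"],
    "batch-job": ["logger"],
    "coa": [],
    "logger": [],
}

print(sorted(affected_components(example_dependencies, "coa")))
# prints ['cli-toolkit', 'web-app']
```

Here `web-app` never declares coa itself, yet it is affected through `cli-toolkit`, which is exactly the case that a review of direct dependencies alone would miss.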
An encouraging aspect of the open source community is that code can be scrutinized, so many of these campaigns are uncovered swiftly.

Protestware in the Software Supply Chain
- January 2022: Maintainer of popular 'colors' and 'faker' libraries adds code to cause a Denial of Service in applications in protest of big corporations using open source but not contributing anything back to the community.
- March 2022: 'node-ipc' project begins deleting data of users it suspects to be Russian or Belarusian. Any application using the library ends up overwriting Russian users' files with a '❤️' emoji.
- March–April 2022: In days following the incident, maintainers behind npm libraries like 'event-source-polyfill', 'es5-ext' and 'styled-components' add peaceful anti-war messages to their packages.
Protestware
Another subvariant of malicious code injection that the team has observed more often in the last 12 months is protestware. In this scenario, a maintainer deliberately sabotages their own project to cause harm or malfunction in ways that disrupt its adopters’ work.
The emergence of these campaigns opens the door to an interesting dialogue about what constitutes reckless irresponsibility versus a seemingly harmless functionality change by a developer. Technically, it is the maintainer’s right to do whatever they want with their code, as all open source code is delivered as-is. Yet the consequent disruption to users, which can inflict significant harm and delay, may not necessarily have been the intended outcome. These self-imploding protest releases have prompted healthy debate, as they are often made in response to events such as the illegal war in Ukraine, or because the maintainer feels inadequately compensated for their efforts. Whether the cause is legitimate or not, it highlights the need for consumers of open source to be ready for this eventuality.
Though these types of events are not currently very numerous, due to their highly unpredictable and disruptive nature, they underscore the need to build in a response procedure inside organizations that adopt open source. Whether it’s Log4Shell or components going rogue or some other malicious attack, dependencies will require replacing at a rapid pace — something that organizations need to prepare for. We explore possible strategies for this later in the Report in the section, Open Source Dependency Management: Trends and Recommendations.
A Timeline of Attacks
Last year, as a part of the Software Supply Chain Report, the team introduced an online timeline that shows how these attacks evolve, highlighting each individual case. The case studies referenced in the sections above appear in that timeline. As more attacks are observed, the team will continue to maintain and update it.