When OSS Growth Meets Gravity

Registries, Models, and the New Software Infrastructure Burden

Open Source Scale Has Become a Structural Risk

Open source has entered an era where scale itself has become a structural risk. Package registries that once measured growth in millions of downloads now routinely serve trillions of requests. But this growth does not map cleanly to innovation. 2025 saw 9.8 trillion downloads across Maven Central, PyPI, npm and NuGet, but the majority of registry traffic today is not driven by new applications or meaningful reuse. It’s driven by transitive dependency sprawl, unused or abandoned packages, and unsustainable tooling patterns. 

9.8 trillion downloads in 2025 across Maven Central, PyPI, npm, and NuGet

FIGURE 1.1: Yearly Downloads Over Time (Maven Central, PyPI, npm, and NuGet)

Source: Sonatype

Modern CI/CD systems and ML pipelines are optimized for speed and convenience, not efficiency. Once configured, they pull relentlessly, often blind to redundancy, cost, or downstream impact. The result is a structural burden on software infrastructure that registries were never designed to carry alone. Public software ecosystems are drifting toward a tragedy of the commons: a fraction of organizations and automated systems consume a disproportionate share of bandwidth and compute while registry operators and volunteer maintainers absorb the strain.

As software supply chains expand to include not just code, but models, datasets, and increasingly large artifacts, the question is no longer whether open software ecosystems can scale — but who pays for that scale, and how long the current system can hold.

Figure 1.2: 2025 Registry Growth

Ecosystem | '25 Total Components Added | Cumulative Total Components | '25 Total Releases Added | Cumulative Total Releases | 2025 Downloads | YoY Download Growth Rate
Maven Central (Java) | 260.5k | 808.6k | 3.3M | 24.95M | 839.05B | 19.42%
PyPI (Python) | 214.8k | 821.3k | 1.54M | 8.85M | 804.97B | 50.64%
npm (JavaScript) | 749.7k | 5.59M | 11.18M | 65.56M | 7.97T | 65.43%
NuGet (.NET) | 144.8k | 760.1k | 2.4M | 14.02M | 223.37B | 17%
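Several cross-registry figures cited in this chapter (roughly 9.8 trillion total downloads, about a quarter of PyPI's catalog added in 2025, npm's 60%-plus share of new releases, and NuGet's roughly 16.5 releases per new component) can be re-derived from Figure 1.2. The Python sketch below does that arithmetic. The values are copied from the table with the abbreviated units (k, M, B, T) expanded by hand, so the results are only as precise as the rounded inputs.

```python
# Values copied from Figure 1.2 (abbreviated units expanded by hand).
REGISTRIES = {
    # name: (components_added_2025, cumulative_components,
    #        releases_added_2025, downloads_2025)
    "Maven Central": (260_500, 808_600, 3_300_000, 839.05e9),
    "PyPI": (214_800, 821_300, 1_540_000, 804.97e9),
    "npm": (749_700, 5_590_000, 11_180_000, 7.97e12),
    "NuGet": (144_800, 760_100, 2_400_000, 223.37e9),
}

def summarize(registries):
    """Derive cross-registry ratios from the per-registry figures."""
    total_downloads = sum(r[3] for r in registries.values())
    total_releases = sum(r[2] for r in registries.values())
    summary = {"total_downloads": total_downloads}
    for name, (added, cumulative, releases, _downloads) in registries.items():
        summary[name] = {
            # Share of the catalog that was added in 2025.
            "new_share_of_catalog": added / cumulative,
            # Average number of releases published per new component.
            "releases_per_new_component": releases / added,
            # This registry's share of all 2025 releases across the four.
            "share_of_all_releases": releases / total_releases,
        }
    return summary

s = summarize(REGISTRIES)
print(f"Total 2025 downloads: {s['total_downloads'] / 1e12:.1f}T")  # 9.8T
print(f"PyPI catalog added in 2025: {s['PyPI']['new_share_of_catalog']:.0%}")  # 26%
print(f"npm share of new releases: {s['npm']['share_of_all_releases']:.0%}")  # 61%
print(f"NuGet releases per new component: "
      f"{s['NuGet']['releases_per_new_component']:.1f}")  # 16.6
```

With these rounded inputs NuGet comes out at about 16.6 releases per new component, in line with the ~16.5 cited later; the small gap presumably reflects rounding in the table.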


Registry Consumption

Maven Central

Maven Central underpins enterprise Java development, and its scale means small shifts propagate widely. In 2025, downloads grew 19.42% year over year, reinforcing Maven Central’s role as a default dependency source across commercial and open source software. Growth slowed slightly in 2025 because of sustainability measures designed to limit usage at the highest end. At this volume, even incremental growth produces large absolute increases in consumption: new releases, regressions, and vulnerabilities can affect thousands of organizations quickly.

That impact is driven as much by release velocity as by new library creation. In 2025, more than 3.3 million releases were added, creating sustained upgrade and governance pressure for consuming teams. The operational challenge is less about “what exists” and more about how to evaluate and manage constant version change across dependencies already embedded in production portfolios.

Security data reinforces the need to focus remediation on vulnerable dependencies that are actually in use and to steer toward the safest, fastest upgrades, rather than spending effort on unused or test-only components. In 2025, 40% of vulnerable Maven Central releases carried CVSS 9.0+ scores, showing that severe issues are not rare. Teams can’t control when vulnerabilities are introduced, but at Maven Central’s scale, success hinges on prioritization and speed, not additional alerts or manual reviews.
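As a minimal sketch of "prioritization over more alerts," the Python below filters out findings in unused or test-only dependencies and orders what remains by CVSS score, labeling each with its CVSS v3.x qualitative severity band. The `Finding` fields (`scope`, `reachable`) and the component names are hypothetical; in practice that data would come from whichever SCA tool an organization uses.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    component: str   # hypothetical identifier, e.g. "lib-a:1.0"
    cvss: float      # CVSS base score, 0.0-10.0
    scope: str       # hypothetical: "compile", "runtime", or "test"
    reachable: bool  # hypothetical: is the vulnerable code actually used?

def severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative rating."""
    if score >= 9.0:
        return "Critical"
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    if score > 0.0:
        return "Low"
    return "None"

def triage(findings):
    """Keep findings in used, non-test dependencies; most severe first."""
    actionable = [f for f in findings if f.scope != "test" and f.reachable]
    return sorted(actionable, key=lambda f: f.cvss, reverse=True)

findings = [
    Finding("lib-a:1.0", 9.8, "compile", True),
    Finding("lib-b:2.3", 9.1, "test", True),      # test-only: filtered out
    Finding("lib-c:0.9", 7.5, "runtime", False),  # not reachable: filtered out
    Finding("lib-d:4.2", 5.3, "runtime", True),
]
for f in triage(findings):
    print(severity(f.cvss), f.component)
```

Here only `lib-a:1.0` (Critical) and `lib-d:4.2` (Medium) remain in the work queue, in that order.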

19.42% YoY download growth
3.3M releases added in 2025
40% of vulnerable releases were CVSS 9.0+ (Critical)

FIGURE 1.3: Maven Central Release Additions Over Time

Source: Sonatype


PyPI

PyPI’s growth underscores where developer adoption and dependency sprawl are accelerating most quickly. With 50.64% year-over-year download growth, PyPI reflects the surge of modern workloads tied to AI and cloud development. That velocity brings scale benefits, but it also shows early signs of stress. 

In 2025 alone, new component additions accounted for 26% of the total registry catalog, a striking indicator of how quickly the universe of available dependencies is expanding. Each new package increases choice and innovation, but it also multiplies evaluation and enforcement challenges. More components mean more potential entry points for risk and greater transitive exposure as teams pull in deep dependency trees they may not fully understand or monitor.

This level of growth and breadth comes with a clear security signal: risk is not an edge case. In 2025, one in five PyPI releases was associated with a CVSS 7.0+ vulnerability, showing that serious issues regularly flow through everyday pipelines. For organizations relying on PyPI, this makes proactive controls essential. Policy enforcement, automated upgrades, and effective prioritization are no longer optional safeguards, but necessary practices to keep pace with the software ecosystem’s scale and speed.

50.64% YoY download growth
1 in 5 releases in 2025 were tied to CVSS 7.0+ vulnerabilities
~26% of the total catalog was made up of 2025 component additions

FIGURE 1.4: PyPI Release by Severity Over Time

Source: Sonatype

npm

In 2025, npm downloads grew 65.43% year over year, and the ecosystem produced over 60% of all new releases across the four registries covered here. This combination of rising consumption and dominant release volume means npm’s impact is less about catalog size and more about release velocity: constant updates, republishing, and forks increase the rate of dependency change that consuming teams must evaluate and absorb. At this pace, traditional manual review and approval models do not scale.

In 2025 alone, npm recorded 838,778 releases associated with CVSS 9.0+ vulnerabilities, a number that turns “rare” events into everyday realities. This scale is what allowed watershed incidents such as React2Shell (discussed in the Three Layers of Failure in Modern Vulnerability Management chapter) and Shai-Hulud to have ecosystem-wide impact. As detailed in the next chapter, Malware at the Gate, npm faced a number of self-replicating malware campaigns, which ultimately added 171,740 malicious packages to the registry over the span of a few months.

The takeaway is not blame, but scale awareness: when hundreds of thousands of ‘Critical’ releases exist in a single year, teams cannot rely on manual review or reactive patching. Automation, prioritization, and rapid upgrade motion are essential to keeping pace with an ecosystem where critical risk can now propagate as quickly as the code itself.

65.43% YoY download growth
838,778 CVSS 9.0+ releases in 2025
>60% of all new releases across these four registries were npm releases in 2025

FIGURE 1.5: Rate of Vulnerable npm Releases Over Time

Source: Sonatype


NuGet

NuGet may not generate the same headline-grabbing download spikes, but its cadence is distinctive. In 2025, NuGet averaged 16.5 releases per new component, pointing to rapid iteration and steady maintenance rather than pure catalog expansion. This level of churn signals active maintenance, frequent fixes, and continuous refinement, especially common in enterprise and platform-oriented .NET development. For consumers, the operational burden isn’t discovering “new,” it’s tracking version changes across dependencies already in production.

The nature of risk within NuGet further raises the stakes of that churn: in 2025, less than 1% of vulnerable NuGet releases fell below CVSS 5, indicating the vast majority of flaws are not noise. At the extreme end, 38.5% of vulnerable NuGet releases were associated with CVSS 9.0+ vulnerabilities. Paired with rapid version turnover, ad hoc patching and manual decision-making quickly break down. What NuGet demands instead are fast, reliable remediation mechanics: clear prioritization and automated upgrade workflows. 

We are no longer just measuring growth; we are evaluating its impact. As consumption across this unified software supply chain accelerates, it forces a critical question. How much of this massive consumption is productive, driving genuine innovation and business value? And, more importantly, how much is unproductive waste that the software ecosystem can no longer afford to ignore?

16.5 releases per new component in 2025
<1% of 2025 vulnerable releases were below CVSS 5
38.5% of vulnerable releases in 2025 were CVSS 9.0+ (Critical)

FIGURE 1.6: 2025 Vulnerable NuGet Releases

Source: Sonatype


Real Innovation vs. Synthetic Volume

As software supply chains scale, the impacts of organic growth compared to synthetic growth are increasingly distinct. Understanding this difference helps organizations focus on what truly advances their capabilities and avoid unintentionally contributing to systemic strain.

Organic Growth

Organic growth reflects real shifts in how software is built: AI adoption, cloud migration, and proliferating languages/frameworks increase dependency usage because teams are adding capabilities and moving faster. It raises complexity, but the added dependencies generally map to delivered functionality and business outcomes.

Synthetic Growth

Synthetic growth inflates volume without comparable value. Spam publishing, incentive gaming, malware, and typosquatting can spike project and download metrics, while CI/CD misconfigurations (cold caches, always-clean builds, non-expiring mirrors) repeatedly re-download the same artifacts. The result is higher bandwidth and infrastructure cost — and more risk — without improving software quality.

For organizations trying to manage risk and cost at scale, the distinction matters. Synthetic volume obscures real signals, overwhelms governance processes, and amplifies exposure without delivering benefits. It also shifts burden onto public software ecosystems that were not designed to absorb limitless, redundant traffic.

The Commons is Cracking

Public registries are global distribution systems with real costs: bandwidth and CDN delivery on every download; storage and replication for every release; and ongoing investment in abuse response, malware scanning, moderation, incident handling, and security investigations. As open source expands beyond apps into software infrastructure, AI platforms, and model hubs, these operational demands keep rising.

The sustainability problem isn’t “too much open source” but rather consumption at machine scale. Automation multiplies load: CI pipelines repeatedly pulling the same dependencies, build systems re-resolving dependency graphs, and large organizations running thousands of parallel jobs. Similar patterns are emerging in AI and model hubs, where large artifacts are repeatedly fetched by automated workflows. Defaults built for convenience can turn routine activity into sustained, high-volume demand.

And the demand isn’t evenly spread. A small number of consumers, tools, and patterns drive a disproportionate share of traffic, compounding costs, reliability strain, and exposure to abuse. When registries slow down, pause services, or absorb malicious floods, the impact ripples across entire ecosystems — from application development to critical software infrastructure and downstream AI platforms that assume constant availability.

This isn’t a story about one bad actor or one registry failing. It’s an ecosystem-level mismatch between yesterday’s defaults and today’s machine-speed reality. Preserving the commons means updating consumption norms and shared responsibility. Ecosystem health now depends as much on how software is consumed as on how it’s created.

The Impact of Cloud Providers: Where the Load Concentrates 

Cloud provider traffic now defines what “normal” looks like on Maven Central. In the latest snapshot, the top three cloud service providers (CSPs) accounted for more than 108 billion requests, while all other users combined accounted for around 17 billion. Put another way, CSPs represent just 32.5% of IPs on Maven Central, yet account for more than 86% of downloads.
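Working the snapshot numbers through shows how lopsided the load is. Treating the roughly 108 billion CSP requests and roughly 17 billion other requests as the whole (both figures are approximate), the CSP share of downloads, and the average downloads per CSP IP relative to the average per non-CSP IP, come out as:

```latex
s_{\text{CSP}} = \frac{108}{108 + 17} \approx 0.864,
\qquad
\frac{s_{\text{CSP}} / 0.325}{(1 - s_{\text{CSP}}) / 0.675}
  \approx \frac{2.66}{0.201} \approx 13
```

Under these approximations, an average CSP IP generates on the order of 13 times as many downloads as an average IP from all other users.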

At that volume, small changes in cloud build behavior (ephemeral runners, cache churn, region replication, image rebuild loops, cold-start fleets) can translate into outsized swings in total registry load.

Figure 1.7: CSPs vs. All Users: Breakdown of Maven Central Downloads

Source: Sonatype

When a small set of CSPs becomes the dominant access path to the ecosystem, Maven Central effectively serves as shared production infrastructure for cloud-native build, deploy, and runtime workflows.

The implication for the commons is straightforward: registry strain is increasingly driven by automation at hyperscale, not broad-based organic growth. Improving cache persistence, tightening redundant fetch patterns, and designing “download once, reuse everywhere” behaviors inside cloud delivery pipelines become some of the highest-leverage ways to reduce systemic load, because the biggest consumers aren’t “more developers”; they’re a few platforms operating at machine speed.

Redownload Offenders: The Biggest Avoidable Bill on Software Infrastructure

Redownloads are where open source sustainability becomes concrete, because they represent repeat fetches that add load without adding new value. In the last seven days, the heaviest redownload activity is tightly concentrated: a large share of the top redownloaders operate behind just one or a handful of IPs, pointing to centralized CI runners, shared egress gateways, or build fleets behaving like cold-start machines.
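One way to surface these patterns in your own traffic is to count, per organization, how many fetches in a window repeat an artifact path that organization has already downloaded, alongside how many source IPs it used. The sketch below does this over an in-memory list of records; the `(org, ip, artifact_path)` record shape is a hypothetical simplification of real registry or proxy access logs, and the 1,000-per-week threshold simply mirrors one of the cut-offs reported in this section.

```python
from collections import defaultdict

def redownload_stats(records):
    """Per org: repeat fetches of already-seen paths, and distinct source IPs.

    `records` is a hypothetical list of (org, ip, artifact_path) tuples
    covering one seven-day window.
    """
    seen = defaultdict(set)     # org -> artifact paths already fetched
    repeats = defaultdict(int)  # org -> count of repeat fetches
    ips = defaultdict(set)      # org -> distinct source IPs
    for org, ip, path in records:
        ips[org].add(ip)
        if path in seen[org]:
            repeats[org] += 1
        else:
            seen[org].add(path)
    return {org: {"redownloads": repeats[org], "ips": len(ips[org])} for org in ips}

def flag_heavy(stats, weekly_threshold=1_000):
    """Orgs whose weekly redownloads exceed the threshold, heaviest first."""
    heavy = [(org, s["redownloads"]) for org, s in stats.items()
             if s["redownloads"] > weekly_threshold]
    return sorted(heavy, key=lambda item: item[1], reverse=True)

log = [("acme", "10.0.0.1", "org/foo/1.0/foo-1.0.jar")] * 1_500 + [
    ("beta", "10.0.1.1", "org/bar/2.0/bar-2.0.jar"),
    ("beta", "10.0.1.2", "org/baz/3.1/baz-3.1.jar"),
]
stats = redownload_stats(log)
print(stats["acme"])      # {'redownloads': 1499, 'ips': 1}
print(flag_heavy(stats))  # [('acme', 1499)]
```

A single-IP organization re-pulling one artifact 1,500 times is exactly the cold-start build fleet signature described above.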

00% of the top 200 re-downloading organizations operated from a single IP
00% operated from more than 5 IPs
00% exceeded 1,000 re-downloads in one week
00% exceeded 5,000 re-downloads in one week

The sustainability implication is that avoidable strain on Maven Central is not evenly distributed across the ecosystem. It’s driven by a relatively small set of automation patterns that scale — often inside a single organization — into repeated pulls of the same dependencies. That makes the problem unusually tractable: improvements like durable caching, correctly configured proxies/mirrors, and less “always-clean” dependency resolution can reduce outsized load quickly. Fixing one pipeline can remove pressure that would otherwise be multiplied across thousands of builds.

Overall, the story isn’t “more developers are downloading more.” It’s that modern software delivery is optimized for speed and rebuildability. When cache persistence breaks down, the cost is externalized onto shared infrastructure. The path to sustainability is aligning build defaults with commons realities so the ecosystem can keep moving fast without turning every rebuild into unnecessary traffic.

HOW TO REDUCE REDUNDANT TRAFFIC WITHOUT SLOWING DELIVERY:

  • Routing CI through repository managers or caching proxies
  • Making build and dependency caches durable across runs (CISA)
  • Pinning and reusing dependencies where appropriate
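The first two practices come down to the read-through pattern a repository manager or caching proxy implements: serve an artifact from a durable local store if it is already there, and go upstream only on a miss. The Python sketch below shows that logic in isolation. It is not a substitute for a real repository manager; the class name is hypothetical, and the injectable `fetch` function (defaulting to `urllib`) exists so the logic can be exercised without network access.

```python
import hashlib
import urllib.request
from pathlib import Path

def _default_fetch(url: str) -> bytes:
    """Download an artifact from the upstream registry."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

class ReadThroughCache:
    """Download once, then serve every later request from local disk."""

    def __init__(self, cache_dir: str, fetch=_default_fetch):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch
        self.upstream_requests = 0  # how often the registry was actually hit

    def _path_for(self, url: str) -> Path:
        # Key each cache entry by a hash of the artifact URL.
        return self.cache_dir / hashlib.sha256(url.encode()).hexdigest()

    def get(self, url: str) -> bytes:
        path = self._path_for(url)
        if path.exists():          # cache hit: no upstream traffic
            return path.read_bytes()
        data = self.fetch(url)     # cache miss: one upstream fetch
        self.upstream_requests += 1
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)          # rename into place so readers never see partial files
        return data
```

Pointed at a directory that survives between CI runs, 100 builds resolving the same pinned artifact would produce one upstream request instead of 100. Keying by URL is only safe for immutable, pinned release artifacts; mutable metadata such as version listings still needs revalidation.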

Why Builds Amplify Load: The Impact of Tools Like Maven and Gradle

Traffic patterns in large registries are not evenly distributed across countless clients — they are highly concentrated. In the case of Maven Central, just two build tools, Maven and Gradle, account for 81.1% of all traffic. This concentration creates outsized implications: small improvements in default behavior, caching strategies, or CI integration for these tools can materially reduce ecosystem-wide load without requiring millions of individual developers to change how they work. When the majority of consumption flows through a narrow set of tools, system-level optimizations become far more effective than relying on per-project best practices alone.
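Measuring that concentration for your own traffic (for example, from a repository manager's access logs) can be as simple as bucketing requests by User-Agent. The sketch below assumes Maven clients send a User-Agent beginning with `Apache-Maven/` and Gradle clients one beginning with `Gradle/`; treat those prefixes as assumptions to verify against your own logs, since wrappers and proxies can change or rewrite the header.

```python
from collections import Counter

# Assumed User-Agent prefixes; verify against real logs before relying on them.
TOOL_PREFIXES = {"Apache-Maven/": "Maven", "Gradle/": "Gradle"}

def classify(user_agent: str) -> str:
    """Attribute a request to a build tool based on its User-Agent prefix."""
    for prefix, tool in TOOL_PREFIXES.items():
        if user_agent.startswith(prefix):
            return tool
    return "Other"

def tool_shares(user_agents):
    """Fraction of requests attributed to each build tool."""
    counts = Counter(classify(ua) for ua in user_agents)
    total = sum(counts.values())
    return {tool: n / total for tool, n in counts.items()}

sample = (["Apache-Maven/3.9.6 (Java 17; Linux)"] * 5
          + ["Gradle/8.5"] * 3
          + ["curl/8.4.0"] * 2)
print(tool_shares(sample))  # {'Maven': 0.5, 'Gradle': 0.3, 'Other': 0.2}
```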

FIGURE 1.9: Comparing Build Tools on Maven Central

Criteria | Maven | Gradle
Default "download behavior" | More cache-trusting for pinned versions, leading to fewer repeat fetches | More cache-correct, with frequent revalidation leading to more repeat GETs
Where it runs (typical) | Benefits from long-lived machines or build nodes with warm local repositories | Common in ephemeral CI/containers with cold caches on each run
Why this matters at scale | Naturally dampens redundant traffic over time | Can amplify redundant traffic unless caching/CI reuse is strong
Best mitigation lever | Persist local/CI caches (“download once, use many times”) | Durable build cache + CI artifact reuse to cut re-downloads


Maven and Gradle amplify registry load in different ways, not because they consume different artifacts, but because their configuration and caching models differ in practice. Gradle is engineered to be cache-correct and CI-friendly: it aggressively revalidates metadata, resolves dependencies in parallel, and is commonly run in short-lived agents or containers where caches start cold. Under normal circumstances, much of that extra traffic would be absorbed by a local caching proxy. 

~00x more frequent redownloads in Gradle than in Maven


Inserting such a proxy consistently is nearly impossible to do at scale in Gradle because it lacks a strong, hierarchical inheritance model for repository configuration. That makes it difficult to centrally enforce a single caching endpoint without modifying every build or risking breakage. As a result, many Gradle builds effectively (and unintentionally) bypass local caches and hit upstream registries directly, amplifying repeat GETs for the same artifact URLs even when versions are pinned.

Maven, by contrast, has a simpler and more centralized settings model that makes proxying and mirroring straightforward. Combined with Maven’s more cache-trusting behavior for fixed versions and its frequent use on long-lived machines with warm local repositories, this naturally reduces repeat downloads over time. 
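The interaction between caching model and runner lifetime can be illustrated with a toy model that counts requests over a series of builds resolving the same pinned dependencies. This is a simplification, not a model of either tool's actual resolver. It assumes that a revalidation costs one request (a conditional GET still reaches the server even when the answer is "not modified"), that a trusted cached pinned version costs none, and that an ephemeral runner starts every build with an empty cache. With a caching proxy in the path, these requests would land on the proxy rather than the public registry.

```python
def upstream_requests(builds: int, deps: int, *, persistent_cache: bool,
                      revalidate: bool) -> int:
    """Count requests for `builds` runs, each resolving `deps` pinned artifacts.

    Toy model (assumptions, not real tool behavior):
      - a cold cache costs one download per dependency;
      - a warm cache costs one request per dependency if entries are
        revalidated, and nothing if cached pinned versions are trusted.
    """
    total = 0
    cache_warm = False
    for _ in range(builds):
        if not cache_warm:
            total += deps              # full download into an empty cache
        elif revalidate:
            total += deps              # conditional GETs still hit the server
        cache_warm = persistent_cache  # ephemeral runners discard the cache
    return total

builds, deps = 1_000, 200
scenarios = [
    ("persistent cache, trust pinned", True, False),
    ("persistent cache, revalidate", True, True),
    ("ephemeral cache (cold every run)", False, False),
]
for label, persistent, revalidate in scenarios:
    n = upstream_requests(builds, deps, persistent_cache=persistent,
                          revalidate=revalidate)
    print(f"{label}: {n:,}")
```

With 1,000 builds and 200 dependencies, the first scenario makes 200 requests and the other two make 200,000 each. The revalidating and ephemeral cases tie on request count, but revalidations are usually small conditional responses, whereas cold caches transfer full artifacts on every run.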

At scale, redundant downloads don’t just consume bandwidth — they increase load on the services that keep registries safe and reliable (indexing, scanning, abuse detection, and incident response capacity). The practical goal is simple: download once, reuse many times. Teams can cut repeat fetches while improving build speed and reliability by adopting durable caches, shared artifact proxies, and CI patterns that preserve dependencies across runs.

Do Now (Fast Wins)

  • Make CI caches durable (persist Gradle caches between runs)
  • Add a shared artifact proxy / repository manager
  • Stop “always-clean” defaults: keep dependency caches even if outputs are cleaned

Do Next (Higher Leverage)

  • Standardize cache strategy across runners: consistent paths/keys
  • Instrument and enforce: track re-download rate; set guardrails
  • Reduce metadata churn: pin versions; use lockfiles where applicable
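For "consistent paths/keys," a common CI pattern is to derive the dependency-cache key from a hash of the lockfiles, so every runner restores the same cache until the resolved dependency set actually changes. A minimal sketch follows; the function name, key prefix, and OS label are hypothetical and would be adapted to whatever your CI system's cache step expects.

```python
import hashlib
from pathlib import Path

def dependency_cache_key(lockfiles, prefix: str = "deps", os_name: str = "linux") -> str:
    """Stable cache key that changes only when a lockfile's contents change."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in lockfiles):  # order-independent
        digest.update(path.name.encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return f"{prefix}-{os_name}-{digest.hexdigest()[:16]}"
```

Because the key depends only on lockfile contents, runners with identical dependency sets restore the same cache entry, and only a real dependency change produces a new key and a fresh download.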

AI Registries as the Next Stress Test on the Software Ecosystem

AI registries and model hubs are the next major stress test for shared distribution infrastructure. They inherit package-registry behaviors such as automation, repeat pulls, and reuse, but with a much heavier cost profile. Models, datasets, and checkpoints are large by default (often hundreds of megabytes to several gigabytes), come in multiple variants, and change frequently as teams iterate. This drives higher bandwidth, storage, and replication demands.

The risk is not just artifact size. If AI usage follows today’s norms such as CI redownloads, weak cross-environment caching, and hotspot automation, load will escalate quickly. Inefficiencies that are tolerable for small packages become expensive and destabilizing at model scale, threatening availability and reliability.

The takeaway is that scale amplifies these default behaviors, so sustainability must be designed in early. Durable caching, artifact reuse, provenance-aware distribution, and AI guardrails to prevent unnecessary pulls are critical now, before AI ecosystems reach package-registry levels of global dependency.
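At model scale, "download once" pairs naturally with content addressing: pin each artifact to an expected digest, reuse the local copy when it matches, and refetch only when it is missing or fails verification. The sketch below illustrates this for a single large file. The `fetch_to` callable and the store layout are hypothetical, and files are hashed in chunks so multi-gigabyte artifacts never need to fit in memory.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large models need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def ensure_artifact(store: Path, expected_sha256: str, fetch_to) -> Path:
    """Return a verified local copy, fetching only if absent or corrupted.

    `fetch_to(dest)` is a hypothetical callable that downloads the artifact
    to `dest`. The store is keyed by digest, so identical content is kept once.
    """
    store.mkdir(parents=True, exist_ok=True)
    target = store / expected_sha256
    if target.exists() and sha256_of(target) == expected_sha256:
        return target  # reuse: no network traffic
    tmp = store / (expected_sha256 + ".partial")
    fetch_to(tmp)
    actual = sha256_of(tmp)
    if actual != expected_sha256:
        tmp.unlink()
        raise ValueError(f"digest mismatch: expected {expected_sha256}, got {actual}")
    tmp.replace(target)
    return target
```

Re-hashing a multi-gigabyte file on every use has its own cost, so real tooling may record verification results instead; the point is that a digest-pinned store both avoids repeat pulls and detects tampered or truncated downloads.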


Sustainable Software Infrastructure: What Responsible Consumption Looks Like

Growth across package ecosystems continues, along with the security and operational pressure that scaling creates. As registries grow, more responsibility shifts to consumers to reduce unnecessary load, limit exposure, and keep risk manageable. Responsible consumption is about maintaining developer velocity without increasing supply chain risk.

The biggest lever is architectural. Private repositories and intelligent caching should be the default. Letting CI pipelines pull directly from public registries on every build amplifies traffic, increases failure risk, and creates avoidable exposure during outages or tampering events. Centralizing dependency access through controlled repositories that cache, vet, and reuse artifacts across teams reduces churn, improves build determinism, and narrows the impact of upstream changes.

Architecture also needs guardrails. Organizations should set and enforce consumption policies that reflect real usage at scale, including limits on redundant downloads. SCA and repository management tools help by prioritizing used dependencies, de-duplicating artifacts across projects, and reducing noise from unused or unreachable components. The goal is focus: clearer signals, fewer alerts, and faster remediation.
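The benefit of de-duplicating across projects is easy to quantify: if each project resolves its dependencies independently, the fetch count is the sum of every project's distinct dependencies, whereas a shared repository needs each distinct artifact only once. A small sketch with hypothetical project names and coordinates:

```python
from collections import Counter

def dedup_summary(projects: dict[str, list[str]]):
    """Compare independent per-project fetches with one fetch per distinct artifact."""
    usage = Counter(dep for deps in projects.values() for dep in set(deps))
    independent = sum(len(set(deps)) for deps in projects.values())
    return {
        "independent_fetches": independent,
        "shared_fetches": len(usage),  # one per distinct artifact
        # (artifact, number of projects using it), or None if there are none
        "most_shared": usage.most_common(1)[0] if usage else None,
    }

projects = {  # hypothetical projects and coordinates
    "billing": ["com.example:json:2.1", "com.example:http:4.0"],
    "search": ["com.example:json:2.1", "com.example:index:1.3"],
    "web": ["com.example:json:2.1", "com.example:http:4.0"],
}
print(dedup_summary(projects))
```

In this example, three projects needing six fetches on their own need only three through a shared repository, and the most widely shared artifact is the one where a single vulnerability fix pays off across the most teams.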

Responsible consumption is also a shared software ecosystem issue. The heaviest consumers benefit most from public registry reliability, so long-term sustainability requires shared responsibility.

Checklist: Are You Supporting The Software Commons?

Do you have internal policies or guidelines in place to minimize unnecessary artifact publishing or republishing?
Do you intentionally batch or optimize releases to avoid unnecessary registry strain?
Do you have policies or guidelines around publishing internal-only items to registries?
Do your CI systems use local caching or private repositories by default?
Do you know which registries your org depends on most, and how much traffic you generate?
Do you distinguish between used and unused dependencies in your security and governance workflows?
Do you contribute (financially or in-kind) to the registries and OSS projects that are critical to your builds?
