When OSS Growth Meets Gravity
Registries, Models, and the New Software Infrastructure Burden
Open Source Scale Has Become a Structural Risk
Open source has entered an era where scale itself has become a structural risk. Package registries that once measured growth in millions of downloads now routinely serve trillions of requests. But this growth does not map cleanly to innovation. 2025 saw 9.8 trillion downloads across Maven Central, PyPI, npm and NuGet, but the majority of registry traffic today is not driven by new applications or meaningful reuse. It’s driven by transitive dependency sprawl, unused or abandoned packages, and unsustainable tooling patterns.
FIGURE 1.1: Yearly Downloads Over Time (Maven Central, PyPI, npm, and NuGet)
Source: Sonatype
Modern CI/CD systems and ML pipelines are optimized for speed and convenience, not efficiency. Once configured, they pull relentlessly, often blind to redundancy, cost, or downstream impact. The result is a structural burden on software infrastructure that registries were never designed to carry alone. Public software ecosystems are drifting toward a tragedy of the commons: a fraction of organizations and automated systems consume a disproportionate share of bandwidth and compute while registry operators and volunteer maintainers absorb the strain.
As software supply chains expand to include not just code, but models, datasets, and increasingly large artifacts, the question is no longer whether open software ecosystems can scale — but who pays for that scale, and how long the current system can hold.
Figure 1.2: 2025 Registry Growth
| Ecosystem | '25 Total Components Added | Cumulative Total Components | '25 Total Releases Added | Cumulative Total Releases | 2025 Downloads | YoY Download Growth Rate |
|---|---|---|---|---|---|---|
| Maven Central (Java) | 260.5k | 808.6k | 3.3M | 24.95M | 839.05B | 19.42% |
| PyPI (Python) | 214.8k | 821.3k | 1.54M | 8.85M | 804.97B | 50.64% |
| npm (JavaScript) | 749.7k | 5.59M | 11.18M | 65.56M | 7.97T | 65.43% |
| NuGet (.NET) | 144.8k | 760.1k | 2.4M | 14.02M | 223.37B | 17% |
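The skew in Figure 1.2 is easy to quantify. A short Python sketch using the figure's 2025 download totals shows how heavily the roughly 9.8 trillion downloads concentrate in npm:

```python
# Download shares implied by Figure 1.2 (values in billions of downloads).
downloads_2025 = {
    "Maven Central": 839.05,
    "PyPI": 804.97,
    "npm": 7970.00,   # 7.97T
    "NuGet": 223.37,
}
total = sum(downloads_2025.values())  # ~9.8T overall
shares = {eco: round(100 * v / total, 1) for eco, v in downloads_2025.items()}
print(f"total: {total / 1000:.1f}T")  # total: 9.8T
print(shares)                         # npm alone is ~81% of all downloads
```

Roughly four out of every five registry downloads flow through a single ecosystem, which is why npm's release velocity dominates the aggregate picture.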
Registry Consumption
Maven Central
Maven Central underpins enterprise Java development, and its scale means small shifts propagate widely. In 2025, downloads grew 19.42% year over year, reinforcing Maven Central’s role as a default dependency source across commercial and open source software. Growth slowed slightly in 2025 as sustainability measures designed to limit usage at the highest end took effect. At this volume, incremental growth still produces large absolute increases in consumption: new releases, regressions, and vulnerabilities can affect thousands of organizations quickly.
That impact is driven as much by release velocity as by new library creation. In 2025, more than 3.3 million releases were added, creating sustained upgrade and governance pressure for consuming teams. The operational challenge is less “what exists” and more how to evaluate and manage constant version change across dependencies already embedded in production portfolios.
Security data reinforces the need to prioritize vulnerabilities in dependencies and to steer toward the safest, fastest upgrades, not toward unused or test-only components. In 2025, 40% of vulnerable Maven Central releases carried CVSS 9.0+ scores, showing that severe issues are not rare. Teams can’t control when vulnerabilities are introduced, but at Maven Central’s scale, success hinges on prioritization and speed, not additional alerts or manual reviews.
FIGURE 1.3: Maven Central Release Additions Over Time
Source: Sonatype
PyPI
PyPI’s growth underscores where developer adoption and dependency sprawl are accelerating most quickly. With 50.64% year-over-year download growth, PyPI reflects the surge of modern workloads tied to AI and cloud development. That velocity brings scale benefits, but it also shows early signs of stress.
In 2025 alone, new component additions accounted for 26% of the total registry catalog, a striking indicator of how quickly the universe of available dependencies is expanding. Each new package increases choice and innovation, but it also multiplies evaluation and enforcement challenges. More components mean more potential entry points for risk and greater transitive exposure as teams pull in deep dependency trees they may not fully understand or monitor.
This level of growth and breadth comes with a clear security signal: risk is not an edge case. In 2025, one in five PyPI releases was associated with a CVSS 7.0+ vulnerability, showing that serious issues regularly flow through everyday pipelines. For organizations relying on PyPI, this makes proactive controls essential. Policy enforcement, automated upgrades, and effective prioritization are no longer optional safeguards, but necessary practices to keep pace with the software ecosystem’s scale and speed.
FIGURE 1.4: PyPI Releases by Severity Over Time
Source: Sonatype
npm
In 2025, npm downloads grew 65.43% year over year, and the software ecosystem produced over 60% of all new releases across major registries. This combination of rising consumption plus dominant release volume means npm’s impact is less about catalog size and more about release velocity: constant updates, republishing, and forks increase the rate of dependency change that consuming teams must evaluate and absorb. At this pace, traditional manual review and approval models do not scale.
In 2025 alone, npm recorded 838,778 releases associated with CVSS 9.0+ vulnerabilities, a number that reframes “rare” events into everyday realities. This scale is what enabled watershed incidents like React2Shell, discussed in the Three Layers of Failure in Modern Vulnerability Management chapter, and Shai-Hulud to have ecosystem-wide impact. As detailed in the next chapter, Malware at the Gate, npm faced a number of self-replicating malware campaigns, which ultimately added 171,740 malicious packages to the registry over the span of a few months.
The takeaway is not blame, but scale awareness: when hundreds of thousands of ‘Critical’ releases exist in a single year, teams cannot rely on manual review or reactive patching. Automation, prioritization, and rapid upgrade motion are essential to keeping pace with an ecosystem where critical risk can now propagate as quickly as the code itself.
FIGURE 1.5: Rate of Vulnerable npm Releases Over Time
Source: Sonatype
NuGet
NuGet may not generate the same headline-grabbing download spikes, but its cadence is distinctive. In 2025, NuGet averaged 16.5 releases per new component, pointing to rapid iteration and steady maintenance rather than pure catalog expansion. This level of churn signals active maintenance, frequent fixes, and continuous refinement, especially common in enterprise and platform-oriented .NET development. For consumers, the operational burden isn’t discovering “new,” it’s tracking version changes across dependencies already in production.
The nature of risk within NuGet further raises the stakes of that churn: in 2025, less than 1% of vulnerable NuGet releases fell below CVSS 5, indicating the vast majority of flaws are not noise. At the extreme end, 38.5% of vulnerable NuGet releases were associated with CVSS 9.0+ vulnerabilities. Paired with rapid version turnover, ad hoc patching and manual decision-making quickly break down. What NuGet demands instead are fast, reliable remediation mechanics: clear prioritization and automated upgrade workflows.
We are no longer just measuring growth; we are evaluating its impact. As consumption across this unified software supply chain accelerates, it forces a critical question. How much of this massive consumption is productive, driving genuine innovation and business value? And, more importantly, how much is unproductive waste that the software ecosystem can no longer afford to ignore?
FIGURE 1.6: 2025 Vulnerable NuGet Releases
Source: Sonatype
Real Innovation vs. Synthetic Volume
As software supply chains scale, the impacts of organic growth compared to synthetic growth are increasingly distinct. Understanding this difference helps organizations focus on what truly advances their capabilities and avoid unintentionally contributing to systemic strain.
Organic Growth
Organic growth reflects real shifts in how software is built: AI adoption, cloud migration, and proliferating languages/frameworks increase dependency usage because teams are adding capabilities and moving faster. It raises complexity, but the added dependencies generally map to delivered functionality and business outcomes.
Synthetic Growth
Synthetic growth inflates volume without comparable value. Spam publishing, incentive gaming, malware, and typosquatting can spike project and download metrics, while CI/CD misconfigurations (cold caches, always-clean builds, non-expiring mirrors) repeatedly re-download the same artifacts. The result is higher bandwidth and infrastructure cost — and more risk — without improving software quality.
For organizations trying to manage risk and cost at scale, the distinction matters. Synthetic volume obscures real signals, overwhelms governance processes, and amplifies exposure without delivering benefits. It also shifts burden onto public software ecosystems that were not designed to absorb limitless, redundant traffic.
The Commons is Cracking
Public registries are global distribution systems with real costs: bandwidth and CDN delivery on every download; storage and replication for every release; and ongoing investment in abuse response, malware scanning, moderation, incident handling, and security investigations. As open source expands beyond apps into software infrastructure, AI platforms, and model hubs, these operational demands keep rising.
The sustainability problem isn’t “too much open source” but rather consumption at machine scale. Automation multiplies load: CI pipelines repeatedly pulling the same dependencies, build systems re-resolving dependency graphs, and large organizations running thousands of parallel jobs. Similar patterns are emerging in AI and model hubs, where large artifacts are repeatedly fetched by automated workflows. Defaults built for convenience can turn routine activity into sustained, high-volume demand.
And the demand isn’t evenly spread. A small number of consumers, tools, and patterns drive a disproportionate share of traffic, compounding costs, reliability strain, and exposure to abuse. When registries slow down, pause services, or absorb malicious floods, the impact ripples across entire ecosystems — from application development to critical software infrastructure and downstream AI platforms that assume constant availability.
This isn’t a story about one bad actor or one registry failing. It’s an ecosystem-level mismatch between yesterday’s defaults and today’s machine-speed reality. Preserving the commons means updating consumption norms and shared responsibility. Ecosystem health now depends as much on how software is consumed as on how it’s created.
The Impact of Cloud Providers: Where the Load Concentrates
Cloud provider traffic now defines what “normal” looks like on Maven Central. In the latest snapshot, the top three cloud service providers (CSPs) accounted for more than 108 billion requests, while every other user combined represented around 17 billion. Put another way, CSPs represent just 32.5% of IPs on Maven Central, yet account for more than 86% of downloads.
At that volume, small changes in cloud build behavior (ephemeral runners, cache churn, region replication, image rebuild loops, cold-start fleets) can translate into outsized swings in total registry load.
Figure 1.7: CSPs vs. All Users: Breakdown of Maven Central Downloads
Source: Sonatype
When a small set of CSPs becomes the dominant access path to the ecosystem, Maven Central effectively serves as shared production infrastructure for cloud-native build, deploy, and runtime workflows.
The implication for the commons is straightforward: registry strain is increasingly driven by automation at hyperscale, not broad-based organic growth. Improving cache persistence, tightening redundant fetch patterns, and designing “download once, reuse everywhere” behaviors inside cloud delivery pipelines becomes one of the highest-leverage ways to reduce systemic load — because the biggest consumers aren’t “more developers,” they’re a few platforms operating at machine speed.
Redownload Offenders: The Biggest Avoidable Bill on Software Infrastructure
Redownloads are where open source sustainability becomes concrete, because they represent repeat fetches that add load without adding new value. In a recent seven-day window, the heaviest redownload activity was tightly concentrated: a large share of the top redownloaders operated behind just one or a handful of IPs, pointing to centralized CI runners, shared egress gateways, or build fleets behaving like cold-start machines.
The sustainability implication is that avoidable strain on Maven Central is not evenly distributed across the ecosystem. It’s driven by a relatively small set of automation patterns that scale — often inside a single organization — into repeated pulls of the same dependencies. That makes the problem unusually tractable: improvements like durable caching, correctly configured proxies/mirrors, and less “always-clean” dependency resolution can reduce outsized load quickly. Fixing one pipeline can remove pressure that would otherwise be multiplied across thousands of builds.
Overall, the story isn’t “more developers are downloading more.” It’s that modern software delivery is optimized for speed and rebuildability. When cache persistence breaks down, the cost is externalized onto shared infrastructure. The path to sustainability is aligning build defaults with commons realities so the ecosystem can keep moving fast without turning every rebuild into unnecessary traffic.
HOW TO REDUCE REDUNDANT TRAFFIC WITHOUT SLOWING DELIVERY:
- Routing CI through repository managers or caching proxies
- Making build and dependency caches durable across runs
- Pinning and reusing dependencies where appropriate
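Durable caches in particular can often be enabled with a few lines of pipeline configuration. A sketch for GitHub Actions using the `actions/cache` action to persist the local Maven repository across runs (the path and cache keys are illustrative and project-specific):

```yaml
# Persist the local Maven repository between CI runs so pinned
# dependencies are fetched from the registry only once.
- name: Cache Maven dependencies
  uses: actions/cache@v4
  with:
    path: ~/.m2/repository
    key: maven-${{ runner.os }}-${{ hashFiles('**/pom.xml') }}
    restore-keys: maven-${{ runner.os }}-
```

Keying the cache on the lockfile or POM hash means a cache rebuild happens only when dependencies actually change, not on every run.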
Why Builds Amplify Load: The Impact of Tools Like Maven and Gradle
Traffic patterns in large registries are not evenly distributed across countless clients — they are highly concentrated. In the case of Maven Central, just two build tools, Maven and Gradle, account for 81.1% of all traffic. This concentration creates outsized implications: small improvements in default behavior, caching strategies, or CI integration for these tools can materially reduce ecosystem-wide load without requiring millions of individual developers to change how they work. When the majority of consumption flows through a narrow set of tools, system-level optimizations become far more effective than relying on per-project best practices alone.
FIGURE 1.9: Comparing Build Tools on Maven Central
| Criteria | Maven | Gradle |
|---|---|---|
| Default download behavior | More cache-trusting for pinned versions, leading to fewer repeat fetches | More cache-correct with frequent revalidation, leading to more repeat GETs |
| Where it runs (typical) | Benefits from long-lived machines or build nodes with warm local repos | Common in ephemeral CI/containers with cold caches each run |
| Why this matters at scale | Naturally dampens redundant traffic over time | Can amplify redundant traffic unless caching/CI reuse is strong |
| Best mitigation lever | Persist local/CI caches (“download once, use many times”) | Durable build cache + CI artifact reuse to cut re-downloads |
Maven and Gradle amplify registry load in different ways, not because they consume different artifacts, but because their configuration and caching models differ in practice. Gradle is engineered to be cache-correct and CI-friendly: it aggressively revalidates metadata, resolves dependencies in parallel, and is commonly run in short-lived agents or containers where caches start cold. Under normal circumstances, much of that extra traffic would be absorbed by a local caching proxy.
Inserting such a proxy consistently is nearly impossible to do at scale in Gradle because it lacks a strong, hierarchical inheritance model for repository configuration. That makes it difficult to centrally enforce a single caching endpoint without modifying every build or risking breakage. As a result, many Gradle builds effectively (and unintentionally) bypass local caches and hit upstream registries directly, amplifying repeat GETs for the same artifact URLs even when versions are pinned.
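One common workaround is a machine-level init script that rewrites repository URLs before resolution, so individual builds need no changes. A sketch in the Gradle Kotlin DSL, assuming a hypothetical internal proxy at `repo.internal.example` (placed in `~/.gradle/init.d/` so it applies to every build on the machine):

```kotlin
// init.gradle.kts sketch: redirect Maven Central traffic to a caching
// proxy for every project, without touching individual build scripts.
allprojects {
    repositories {
        all {
            if (this is MavenArtifactRepository &&
                url.toString().startsWith("https://repo.maven.apache.org")
            ) {
                // Swap the public registry for the internal caching proxy
                setUrl("https://repo.internal.example/maven-proxy/")
            }
        }
    }
}
```

This is a sketch rather than a complete solution: plugin repositories and builds that hard-code their own resolution logic still need separate handling, which is exactly the fragmentation problem described above.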
Maven, by contrast, has a simpler and more centralized settings model that makes proxying and mirroring straightforward. Combined with Maven’s more cache-trusting behavior for fixed versions and its frequent use on long-lived machines with warm local repositories, this naturally reduces repeat downloads over time.
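Maven’s centralized model makes this concrete: a single mirror entry in `settings.xml` routes all remote repository traffic through one caching endpoint. A minimal sketch, again assuming a hypothetical internal proxy at `repo.internal.example`:

```xml
<settings>
  <mirrors>
    <mirror>
      <!-- Route all remote repository traffic through one caching proxy -->
      <id>internal-proxy</id>
      <name>Internal caching proxy</name>
      <mirrorOf>*</mirrorOf>
      <url>https://repo.internal.example/maven-proxy/</url>
    </mirror>
  </mirrors>
</settings>
```

Because `settings.xml` is resolved per-machine or per-CI image, one configuration change covers every build that runs there.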
At scale, redundant downloads don’t just consume bandwidth — they increase load on the services that keep registries safe and reliable (indexing, scanning, abuse detection, and incident response capacity). The practical goal is simple: download once, reuse many times. Teams can cut repeat fetches while improving build speed and reliability by adopting durable caches, shared artifact proxies, and CI patterns that preserve dependencies across runs.
AI Registries as the Next Stress Test on the Software Ecosystem
AI registries and model hubs are the next major stress test for shared distribution infrastructure. They inherit package-registry behaviors such as automation, repeat pulls, and reuse, but with a much heavier cost profile. Models, datasets, and checkpoints are large by default, often hundreds of megabytes to several gigabytes, come in multiple variants, and change frequently as teams iterate. This drives higher bandwidth, storage, and replication demands.
The risk is not just artifact size. If AI usage follows today’s norms such as CI redownloads, weak cross-environment caching, and hotspot automation, load will escalate quickly. Inefficiencies that are tolerable for small packages become expensive and destabilizing at model scale, threatening availability and reliability.
The takeaway is that scale amplifies these default behaviors, so sustainability must be designed in early. Durable caching, artifact reuse, provenance-aware distribution, and AI guardrails to prevent unnecessary pulls are critical now, before AI ecosystems reach package-registry levels of global dependency.
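The “download once, reuse everywhere” pattern can be sketched in a few lines: cache artifacts locally by key and touch the registry only on a miss. A minimal Python illustration, where the `fetch` callable and the cache layout are hypothetical stand-ins for a real artifact client:

```python
import hashlib
from pathlib import Path

def cached_fetch(url: str, fetch, cache_dir: str = ".artifact-cache") -> bytes:
    """Return an artifact's bytes, hitting the upstream registry at most once.

    `fetch` is a hypothetical callable that downloads `url` and returns bytes.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode()).hexdigest()  # stable cache key per URL
    path = cache / key
    if path.exists():          # cache hit: no registry traffic at all
        return path.read_bytes()
    data = fetch(url)          # cache miss: a single upstream pull
    path.write_bytes(data)
    return data
```

Real model hubs layer integrity checks, eviction, and provenance on top of this, but the core economics are the same: every cache hit is a multi-gigabyte pull that never reaches shared infrastructure.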
Sustainable Software Infrastructure: What Responsible Consumption Looks Like
Growth across package ecosystems continues, along with the security and operational pressure that scaling creates. As registries grow, more responsibility shifts to consumers to reduce unnecessary load, limit exposure, and keep risk manageable. Responsible consumption is about maintaining developer velocity without increasing supply chain risk.
The biggest lever is architectural. Private repositories and intelligent caching should be the default. Letting CI pipelines pull directly from public registries on every build amplifies traffic, increases failure risk, and creates avoidable exposure during outages or tampering events. Centralizing dependency access through controlled repositories that cache, vet, and reuse artifacts across teams reduces churn, improves build determinism, and narrows the impact of upstream changes.
Architecture also needs guardrails. Organizations should set and enforce consumption policies that reflect real usage at scale, including limits on redundant downloads. SCA and repository management tools help by prioritizing used dependencies, de-duplicating artifacts across projects, and reducing noise from unused or unreachable components. The goal is focus: clearer signals, fewer alerts, and faster remediation.
Responsible consumption is also a shared software ecosystem issue. The heaviest consumers benefit most from public registry reliability, so long-term sustainability requires shared responsibility.
Checklist: Are You Supporting The Software Commons?