Open Source Malware at the Gate
The Evolving Software Supply Chain Attack Surface
A Turning Point for Open Source Malware
Throughout 2025, Sonatype identified more than 454,600 new malicious packages, bringing the cumulative total of known and blocked malware to over 1.233 million packages across npm, PyPI, Maven Central, NuGet, and Hugging Face. This year, the evolution of open source malware crystallized, shifting from spam and stunts to sustained, industrialized campaigns against the people and tooling that build software.
FIGURE 2.1: Annual Open Source Malware Growth
Source: Sonatype
What stands out most about 2025 is not just the scale of the threat, but also the sophistication. Where 2024’s XZ Utils incident was groundbreaking, demonstrating how a single compromised maintainer could imperil global infrastructure, 2025 saw software supply chain risk evolve dramatically.
This year, over 99% of open source malware occurred on npm. State-linked entities such as the Lazarus Group advanced from simple droppers and crypto miners to five-stage payload chains that combined droppers, credential theft, and persistent remote access inside developer environments. The first-ever self-replicating npm malware (Shai-Hulud, quickly followed by Sha1-Hulud) proved that open source malware can now propagate autonomously through open source ecosystems. IndonesianFoods created more than 150,000 malicious packages in just a couple of days. And a series of offensive hijackings of trusted packages like chalk and debug showed that established maintainers of high-profile packages are being targeted as entry points for mass distribution.
Taken together, these developments mark 2025 as a grim year for open source malware: the moment when isolated incidents became an integrated campaign, and bad actors proved software supply chain attacks are now their most reliable weapon.
The Threat Taxonomy: What Open Source Malware Does Today
Open source malware is best understood less as a set of isolated “bad packages” and more as a set of repeatable behaviors that exploit how modern software is built and shipped. Public registries provide a low-friction distribution channel, while developer machines and CI/CD pipelines provide an execution environment that often sits close to sensitive data and production access. As a result, the malicious package is increasingly not the whole attack, but the first step in a larger supply chain intrusion.
FIGURE 2.2: 2025 Landscape - Open Source Malware by Threat Type
Source: Sonatype
Registries Are Being Used as Distribution Platforms
In 2025, the dominant pattern is operational scale through ecosystem mechanics. Repository abuse shows up in 55.9% of all logged malicious packages, indicating actors are treating registries like platforms: automating publication and iterating quickly to maximize reach. Repository abuse packages have been observed harvesting TEA tokens or seeking clicks on spam links. Alongside that, the Potentially Unwanted Application (PUA) category appears in 27.5% of packages and includes items like empty packages, demos with hardcoded credentials, or messaging app spam bot orchestration frameworks. These packages don’t necessarily compromise the developers who install them or the applications they are bundled into, but they are still unwanted in developer environments.
Developer and Build Environments Are the Prize
A consistent objective is harvesting valuable data from where software gets built. Host information exfiltration appears in 5.7% of packages, and secrets exfiltration in 3.9%. These aren’t the largest categories by volume, but they’re high-leverage: packages run inside developer machines and CI/CD environments where tokens, API keys, and CI credentials are commonly present and reusable.
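To make the exposure concrete, the following minimal sketch lists the names of environment variables that look like secrets, which is the same view any package code gets when it runs inside a build job. The naming pattern is an assumption chosen for illustration, not a Sonatype detection rule, and real environments will need their own list.

```python
import os
import re

# Assumed naming hints for illustration only; adjust for your own environment.
SECRET_NAME_PATTERN = re.compile(
    r"(TOKEN|SECRET|PASSWORD|API_?KEY|CREDENTIAL)", re.IGNORECASE
)


def exposed_secret_names(environ=None):
    """Return the sorted names (never the values) of variables that look like secrets."""
    env = os.environ if environ is None else environ
    return sorted(name for name in env if SECRET_NAME_PATTERN.search(name))
```

Calling `exposed_secret_names()` with no arguments inside a CI step gives a rough inventory of what an install-time script could read. The function returns names only, so the audit does not leak the values itself.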
Attacks Are Engineered as Chains, Not Single Payloads
Sonatype observed clear signs of staged delivery and follow-on capability. Droppers/loaders appear in 2.7% of packages, and backdoors in 2.1%, with obfuscated code in 1.6% acting as a force multiplier that helps these chains persist and evade inspection. Even lower-volume disruption behaviors matter for impact: data corruption appears in 0.62% and targets build outputs and release workflows where compromise can propagate downstream.
Developers Are the Attack Vector
Software supply chain attackers are perfecting social and technical mimicry to target and exploit developers making development decisions fast and with incomplete information:
- Typosquatting and Namespace Confusion: Typosquatting and namespace confusion remain staple techniques, but they operate differently. Typosquatting relies on minor spelling variations of legitimate package names, counting on human error during installation. Namespace confusion exploits how package managers resolve dependencies across public and private scopes. This allows attackers to publish public packages with the same name as internal or expected dependencies, so they are inadvertently pulled into builds.
- Toolchain Masquerading: Toolchain masquerading is accelerating. Rather than posing as generic utilities, malicious packages increasingly impersonate the everyday tools developers install reflexively: framework add-ons, build plugins, linters, scaffolding utilities, and migration helpers. These packages are designed to look like routine workflow dependencies, making them more likely to be installed without close inspection.
- Front-end Workflow Lures: Front-end workflow lures are especially common. Attackers cluster package names around high-velocity ecosystems and popular tooling where dependency decisions are frequent, repetitive, and time-boxed. In these environments, developers often add or swap dependencies rapidly, creating ideal conditions for malicious lookalikes to blend in.
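The typosquatting technique described above can be illustrated with a simple name-similarity check. This is a minimal sketch, assuming a team keeps its own set of trusted package names. The helper names, the distance threshold, and the sample names are all hypothetical, and production tooling uses far richer signals than spelling alone.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def possible_typosquat(candidate, trusted_names, max_distance=1):
    """Return the trusted name the candidate closely resembles, or None."""
    if candidate in trusted_names:
        return None  # exact match to a trusted name is not a lookalike
    for trusted in sorted(trusted_names):
        if edit_distance(candidate, trusted) <= max_distance:
            return trusted
    return None
```

For example, with a trusted set containing `chalk` and `debug`, a request for `chalkk` would be flagged as resembling `chalk`. A check like this only covers spelling variations. Namespace confusion needs a separate control, such as verifying which registry and scope a name resolves from.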
Attackers increasingly rely less on individual mistakes and more on scale, momentum, and volume. They know developers under deadline pressure are unlikely to scrutinize every dependency. If a package “looks right,” with mostly comprehensible code, a legitimate-seeming README.MD, and a reasonable number of downloads, it is likely to get installed.
How North Korea Weaponizes Open Source
The Lazarus Group, or APT38, epitomizes the 2025 malware shift from opportunistic to industrialized. Building on earlier research, Sonatype identified more than 800 Lazarus-associated packages this year, concentrated overwhelmingly in npm (97%). In practical terms, npm provides the fastest path from package publication to developer workstation because it does not require namespace validation and tooling prefers the latest versions. By concentrating activity there, Lazarus maximizes the likelihood that poisoned dependencies will be installed quickly, propagate through transitive dependency chains, and spill into build pipelines, CI/CD systems, and downstream production environments with minimal friction.
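Because tooling that prefers the latest versions shortens the path from publication to installation, one common mitigation is to pin exact versions. The sketch below flags dependency specifiers in a package.json that are not exact versions. The regular expression and function name are illustrative assumptions. The check also flags non-version specifiers such as tags or URLs, which may or may not be desirable in a given project.

```python
import json
import re

# Matches exact versions such as 1.2.3, 1.2.3-beta.1, or 1.2.3+build.5 only.
EXACT_VERSION = re.compile(
    r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$"
)


def floating_dependencies(package_json_text: str) -> dict:
    """Return {name: specifier} for dependencies not pinned to an exact version."""
    manifest = json.loads(package_json_text)
    floating = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT_VERSION.match(spec):
                floating[name] = spec
    return floating
```

A manifest entry such as `"chalk": "^5.3.0"` would be reported, while `"debug": "4.3.4"` would not. Pinning the manifest does not cover transitive dependencies on its own, so a committed lockfile is still needed to constrain those.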
This level of sustained activity aligns with broader public reporting that cyber operations, including theft, espionage, and cryptocurrency-related crime, are now a significant source of revenue for the North Korean government. As a result, Lazarus is now one of the most prolific and successful state-sponsored cybercriminal enterprises active today. Lazarus is investing in ecosystems where speed, scale, and reuse combine to maximize the downstream impact of each compromised dependency.
Hybrid Open Source Malware Dominates the Lazarus Playbook
Lazarus packages are distinguished by how they integrate multiple threat behaviors into a single component. These aren’t single-purpose nuisances; they’re multi-function packages designed to support a staged intrusion chain. In the dataset, Sonatype Security Research observed that most Lazarus packages carry multiple threat behaviors: roughly 77% include two or more threat types, and nearly 9% include four or more. In plain terms, the “package” is often just stage zero.
FIGURE 2.3 Lazarus Group Packages by Number of Threat Type
Source: Sonatype
Behaviorally, the profile is dropper-led and credential-first: droppers appear in ~98% of packages, secrets exfiltration in ~64%, and backdoor functionality in ~29%. That combination matters. Droppers keep the published artifact small and less obviously malicious; exfiltration turns a single install into stolen tokens and credentials; and backdoor capability reflects investment in persistence and post-compromise control. The Lazarus pattern demonstrates repeatable intrusion tooling that is built to land quietly, harvest access, and remain useful after the initial foothold.
FIGURE 2.4 Lazarus Group Campaign Threat Types
Source: Sonatype
Targeting is Optimized to Exploit Muscle Memory
Lazarus targeting is engineered around how developers actually pick dependencies: familiar names, familiar ecosystems, familiar moments of need. These packages do not resemble overt threats. Instead, they present as the routine glue of front-end workflows, such as framework add-ons, build helpers, plugin utilities, and configuration packages that developers install reflexively.
FIGURE 2.5 Top Lazarus Group Developer Lures
Source: Sonatype
The naming patterns show deliberate clustering around high-velocity toolchains, such as Tailwind, Vite, and React. Zooming out, nearly 43% of Lazarus-linked packages reference common developer framework or tool keywords. This is an intentional distribution strategy. These ecosystems have high dependency churn, many “one more plugin” installs, and constant troubleshooting under deadlines. That’s the ideal environment for lookalike packages to blend in and get pulled into both workstations and CI. Sonatype’s prior research showed that modern applications routinely contain hundreds of dependencies — averaging around 180 — making it unrealistic for developers to closely scrutinize every package they consume.
Execution is Modular and Repeatable
One of the most important operational signals in Sonatype’s analysis is how scalable the campaign was. The data shows strong indicators of templated reuse and rapid variant generation as opposed to one-off, bespoke malware. The distribution is sharply concentrated: Sonatype Security Research mapped 341 packages to a set of just 32 anchor packages, and the largest anchor clusters fan out into dozens of related variants.
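For scale, 341 packages across 32 anchors works out to roughly 10 to 11 packages per anchor on average (341 / 32 ≈ 10.7). The report does not describe how the mapping was performed. The sketch below shows one plausible way to group near-neighbor names around anchors using string similarity. The threshold, function name, and all package names used with it are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def assign_to_anchors(package_names, anchors, threshold=0.8):
    """Group each name under its most similar anchor if similarity meets the threshold."""
    clusters = defaultdict(list)
    for name in package_names:
        best_anchor, best_score = None, 0.0
        for anchor in anchors:
            score = SequenceMatcher(None, name, anchor).ratio()
            if score > best_score:
                best_anchor, best_score = anchor, score
        if best_anchor is not None and best_score >= threshold:
            clusters[best_anchor].append(name)
    return dict(clusters)
```

Names that are not close to any anchor are left out of the result, which keeps unrelated packages from being pulled into a family. Name similarity is only a proxy here. Shared code templates or infrastructure would be stronger evidence of a common origin.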
That concentration is a direct indicator of manufacturing capacity: Lazarus can iterate quickly, generate families of near-neighbors, and keep publishing even as specific packages are identified and removed. In other words, this is not a handful of malicious uploads. It’s a production line.
The Shai-Hulud Software Supply Chain Attacks: A New Era of Self-Replicating Malware
The Shai-Hulud software supply chain attack in September 2025 marked a turning point: the first known self-replicating npm malware observed spreading autonomously across developer environments and packages, more like a traditional network worm than a passive library.
Hidden deep within duplicate files and nested directories, Shai-Hulud evaded superficial scans and leveraged maintainer credential theft to publish poisoned updates. The worm compromised more than 500 packages in days, spreading autonomously across registries and developer machines.
The result was a rapidly self-propagating software supply chain worm, capable of infecting projects downstream without any manual publication step. This was quickly followed by another self-replicating npm malware in November, named “Sha1-Hulud: The Second Coming.” These campaigns illustrate the next phase of open source malware — one that behaves more like network worms than passive implants.
In contrast to traditionally understood malware, which had to be downloaded and installed before it would execute, open source malware executes pre-install, meaning a developer only needs to download a package to become a victim.
EACH SHAI-HULUD PACKAGE CARRIED A PAYLOAD DESIGNED TO:
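In npm, code that runs at install time is typically declared through lifecycle scripts in package.json. The sketch below lists the `preinstall`, `install`, and `postinstall` entries of a manifest so they can be reviewed before anything runs. It is a minimal illustration, not a detector: the function name is an assumption, and a clean result does not prove a package is safe.

```python
import json

# npm lifecycle scripts that run as part of installing a package.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_time_scripts(package_json_text: str) -> dict:
    """Return the install-time lifecycle scripts declared in a package.json."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}
```

A manifest that declares `"preinstall": "node setup.js"` would surface that command for review. Teams that do not need install scripts can also consider disabling them with npm's `ignore-scripts` setting, at the cost of breaking packages that legitimately depend on them.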
- Steal npm authentication tokens,
- Replicate by infecting other locally linked projects, and
- Exfiltrate environment credentials using encrypted payloads.
To support this, the attackers used public code-hosting services as dead drops, helping the traffic blend in with normal developer workflows.
Self-Replicating Malware Attacks in 2025
- September 16, 2025 (npm, 500+ packages): The first documented self-replicating open source malware; demonstrated innovative use of automation by attackers to hijack accounts and publish new, malicious versions of legitimate packages.
- October 17, 2025 (OpenVSX and Microsoft VSCode, 12 packages): Impersonated popular developer tools to steal credentials, drain cryptocurrency wallets, and use the Solana blockchain for command-and-control communication.
- November 9, 2025 (OpenVSX and Microsoft VSCode, 3 packages): New malicious packages uncovered with 10,000 downloads using new extensions and publisher accounts to bypass cleanup efforts.
- November 11, 2025 (npm, 169,538 packages): This campaign was designed to self-replicate every seven seconds. While some packages abused the TEA protocol, most appeared designed to overwhelm detection and exploit ecosystem trust at scale.
- November 24, 2025 (npm, 49 packages): The hijacking campaign surged a second time with a new name and slight tweaks to evade detection; the attackers also introduced the use of Bun to deploy the payload.
- December 1, 2025 (OpenVSX and Microsoft VSCode, 24 packages): In this third wave, the threat actors artificially inflated download counts of the packages to increase discoverability.
The Open Source Malware Supply Chain
Modern open source malware is modular, resilient, and designed to bypass both static and human inspection.
- Multi-stage payloads: Droppers download encrypted payloads from C2 servers or embed secondary stages locally.
- Obfuscation layers: Increasing use of eval(), encoded scripts, or disguised binaries within legitimate file trees.
- Legitimate infrastructure for C2: Slack, GitHub, Dropbox, and even logging services (like Better Stack) are co-opted for command-and-control traffic.
- Local project propagation: Recent attacks weaponize developer machines to infect all other projects they find and push infected versions upstream.
- Multi-process behavior: Telemetry from Sonatype’s behavioral analysis indicates a rise in “multi-process modular malware,” particularly in npm and PyPI.
- Install-time execution: The latest malicious packages run during installation, dropping payloads before builds.
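Some of the obfuscation traits listed above leave textual traces that a simple scan can surface for human review. The following sketch checks JavaScript source text for a few indicators. The patterns, their thresholds, and the indicator names are assumptions chosen for illustration, and legitimate code can trigger them, so a hit is a prompt to look closer rather than a verdict.

```python
import re

# Illustrative patterns only; legitimate code can also match these.
INDICATORS = {
    "eval_call": re.compile(r"\beval\s*\("),
    "function_constructor": re.compile(r"\bnew\s+Function\s*\("),
    "long_base64_literal": re.compile(r"[A-Za-z0-9+/]{200,}={0,2}"),
    "hex_escape_run": re.compile(r"(?:\\x[0-9a-fA-F]{2}){8,}"),
}


def obfuscation_indicators(source_text: str) -> list:
    """Return the sorted names of indicators found in the given source text."""
    return sorted(
        name for name, pattern in INDICATORS.items() if pattern.search(source_text)
    )
```

Static checks like this are easy to evade, which is part of why the behaviors in this list are effective. They are best treated as one inexpensive signal alongside behavioral analysis.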
The throughline shows malware is adopting the same modular architecture that makes open source so powerful. In 2025, software supply chain attacks mirrored the software supply chain itself. The risk is not theoretical. It’s structural.
This phenomenon is especially visible in ML and DevOps contexts. MLOps is still a newer, less mature discipline, and it has not yet absorbed many of the supply chain lessons that became standard practice in traditional software development. Combined with intense pressure for rapid experimentation and deployment, teams often default to convenience-driven workflows that bypass normal governance.
In practice, that shows up as ungoverned “shadow downloads” that pull artifacts directly from wherever they are easiest to access. Examples include precompiled Python wheels and CUDA libraries fetched from unofficial sources, Hugging Face models loaded directly through package installs or runtime calls, and internal scripts or agents that silently retrieve dependencies from places like GitHub or Pastebin.
This mirrors the “Complacency and Contamination” model from the 10th Annual State of the Software Supply Chain report. Shadow downloads are the modern form of contamination, created when enforcement gaps intersect with developer convenience and automation.
HOW SHADOW DOWNLOADS COMPOUND OPEN SOURCE MALWARE RISK
- Invisible: Shadow artifacts often never appear in SBOMs or inventory systems.
- Unscanned: Because they bypass governed repositories, these artifacts frequently evade security scanning and policy enforcement altogether.
- Unattributable: With no verified origin or provenance, organizations have no reliable way to trust, trace, or audit what they’ve pulled in.
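The three properties above suggest two simple checks: compare what was actually fetched against what the inventory lists, and compare where it was fetched from against governed sources. The sketch below assumes fetch records are available as (name, version) pairs and URLs, for example from proxy or build logs. The function names, hosts, and record shapes are hypothetical.

```python
from urllib.parse import urlparse


def missing_from_inventory(observed_artifacts, sbom_components):
    """Return (name, version) pairs that were fetched but are absent from the SBOM."""
    return sorted(set(observed_artifacts) - set(sbom_components))


def ungoverned_fetches(fetch_urls, approved_hosts):
    """Return URLs whose host is not on the list of governed repositories."""
    return [url for url in fetch_urls if urlparse(url).hostname not in approved_hosts]
```

The first function addresses the "invisible" gap and the second addresses the "unscanned" gap. Neither establishes provenance, so the "unattributable" problem still requires verified origins for whatever these checks surface.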
Emerging Software Supply Chain Risks
As AI becomes core to modern pipelines, attackers are following the trend, embedding malicious payloads into container images, AI models, and helper binaries distributed through trusted platforms.
Malicious AI Models in Hugging Face
Although many quarantined models observed to date are not overtly nefarious, the underlying pattern reveals a structural weakness in model registries: model artifacts are being treated like data and scanned as single items, but in reality, most behave more like code and should be treated much the same way.
Sonatype’s research into picklescan vulnerabilities underscored why this is uniquely dangerous in ML: widely used serialization formats can execute code during deserialization, turning a routine “load model” step into an execution path.
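To illustrate the mechanism, the sketch below uses Python's standard `pickletools` module to list the callables a pickle stream references without loading it. It is a simplified illustration and not how picklescan or any Sonatype tool works. The module watchlist is an assumption, and the string tracking for `STACK_GLOBAL` ignores memo lookups, so it can miss references in more elaborate streams.

```python
import pickle
import pickletools

# Assumed watchlist of modules whose callables warrant review before loading.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "socket", "sys"}
STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def referenced_globals(data: bytes) -> list:
    """List (module, name) pairs referenced by a pickle, without unpickling it."""
    found, strings = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            strings.append(arg)
        elif opcode.name == "GLOBAL":
            # Older protocols carry "module name" as one space-separated argument.
            module, _, name = arg.partition(" ")
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Newer protocols push module and name as the two preceding strings.
            found.append((strings[-2], strings[-1]))
    return found


def suspicious_globals(data: bytes) -> list:
    """Filter referenced globals down to those from watchlisted modules."""
    return [
        (module, name)
        for module, name in referenced_globals(data)
        if module.split(".")[0] in SUSPICIOUS_MODULES
    ]


def build_demo_pickle() -> bytes:
    """Build a harmless demo pickle that would call eval("1 + 1") if loaded."""
    class Demo:
        def __reduce__(self):
            return (eval, ("1 + 1",))
    return pickle.dumps(Demo())
```

Scanning the demo bytes reports a reference to `builtins.eval`, even though nothing has been executed. The same bytes passed to `pickle.loads` would call `eval`, which is why inspecting before loading matters.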
It’s important to note the shape of the malicious activity observed on Hugging Face: many of these repositories appear consistent with security research or proof-of-concept demonstration uploads rather than fully operational criminal campaigns. Some are transparently labeled as unsafe, and several show low download counts. That doesn’t reduce the underlying software supply chain risk, but rather highlights it. In a model registry, even a “demo” artifact can be copied, repackaged, or pulled into the wrong environment, and the consequences play out at runtime.
Two examples illustrate why this matters:
Backdoored Model Artifacts Enabling Remote Access
A cluster of models published under the same account exhibited behavior consistent with establishing a reverse shell to an external host, granting an attacker interactive access to any machine that loads the model. Even when download counts are low, the risk is disproportionate: models are frequently pulled into shared environments (developer workstations, notebooks, CI runners, GPU boxes) where credentials and tokens are plentiful.
Embedded Malicious Code in Serialized Model Files
In another case, a model artifact (a serialized file) contained embedded malicious logic that invoked common system tooling to exfiltrate local files (for example, transmitting /etc/passwd to a remote endpoint). The key point isn’t the specific file targeted but the mechanism: a “model download” can become code execution at load time if organizations treat model artifacts as inherently safe.
Model registries need the same supply chain guarantees as package registries, because the blast radius of compromised models often includes the very systems that hold the highest-value secrets.
AI Agents as Software Supply Chain Attack Multipliers
AI development assistants and autonomous agents have rushed into developer workflows, but those agents have not yet been integrated into organizations’ security models. Experiments show that agents work from a knowledge cutoff well in the past, so they will readily install whatever dependency resolves a build error without checking provenance, policy, or known-malicious indicators.
In the From Guesswork to Governance chapter, we will show that AI code assistants like Claude or ChatGPT can fetch and install malicious code automatically when prompted to fix dependency errors or install missing libraries. The developer’s intent may be harmless, but the result can be catastrophic.
Attackers are increasingly preying on this. Sonatype’s 2025 malware research continues to document deceptive naming patterns — including typosquatting and new evasion tactics that mimic legitimate dependencies to trick developers into installing malware. As organizations integrate AI coding assistants into production workflows, they must recognize that these systems are not neutral intermediaries. They are potential infection vectors.
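One way to keep an agent from acting as an unchecked intermediary is to route its install requests through a policy gate. The sketch below is a hypothetical default-deny check: the function name, the decision labels, the minimum-age rule, and the lists it consults are all assumptions for illustration, and a real deployment would source them from an organization's own policy and threat intelligence.

```python
from datetime import datetime, timedelta, timezone


def review_install_request(name, published_at, known_malicious, allowlist,
                           now=None, min_age_days=7):
    """Return a (decision, reason) pair for a package an agent wants to install."""
    now = now or datetime.now(timezone.utc)
    if name in known_malicious:
        return ("deny", "package is on the known-malicious list")
    if name in allowlist:
        return ("allow", "package is on the approved list")
    if now - published_at < timedelta(days=min_age_days):
        return ("hold", "package is newer than the minimum age; needs human review")
    # Default-deny: anything not explicitly approved waits for a person.
    return ("hold", "package is not on the approved list; needs human review")
```

The design choice here is that the agent never decides on its own: unknown packages are held for review rather than installed, which trades some speed for a checkpoint on provenance and policy.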
How Will Software Supply Chain Attacks Evolve?
The next frontier of software supply chain attacks is not limited to package managers. AI model hubs and autonomous agents are converging with open source into a single, fluid software supply chain — a mesh of interdependent ecosystems without uniform security standards.
Malware authors already understand this convergence. They are embedding persistence inside containers, pickled model files, and precompiled binaries that flow between data scientists, CI/CD systems, and runtime environments.
DEVELOPERS ARE NO LONGER AT THE PERIMETER. THEY ARE THE PERIMETER.