Nexus Indexer 2.0: Incremental Downloading

By Damian Bradicich


The Nexus Indexer has become popular and is the de facto standard for indexing Maven repositories (including the big boy, Central). As repositories grow, their artifact indexes grow with them: what starts as a file of a few hundred kilobytes can reach 20 to 30 megabytes or more over time. Because the index is the gateway to a repository's contents (for users, not for Maven itself), it is the most downloaded file, and when a 20 MB file is downloaded by thousands of people every day, the bandwidth costs get high. To combat this, we have introduced incremental index handling into the Nexus Indexer. It has two parts: building incremental indexes for consumers to download, and retrieving incremental indexes from a provider.

Building Incremental Indexes

When the daily task runs on Central to create indexes, the most recent content (in its entirety) is stored in the nexus-maven-repository-index.gz file. This file is always available as a fallback if a consumer doesn't handle incremental indexes properly, or has fallen so far behind that the provider no longer has all the incremental pieces the consumer needs. Alongside it, an incremental index is generated that contains all changes (adds, updates, and deletes) since the index was last generated. Each incremental file is very small compared to the full index, in most cases around 10 KB per day. The incremental files are listed in a nexus-maven-repository-index.properties file, along with a chain id. The chain id is used to 'reset' the incremental chain should a full index download be required for some reason.
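
To make the provider side concrete, here is a minimal Python sketch of how the properties file described above could be assembled. It is not the Nexus Indexer's actual code: the helper names and the `keep` retention policy (keep only the newest N pieces and prune older ones) are assumptions for illustration. That pruning is what eventually forces a consumer that has fallen far behind back to the full .gz file.

```python
from typing import Dict, List


def build_index_properties(chain_id: str, published: List[int],
                           keep: int) -> Dict[str, str]:
    """Build provider-side index properties (hypothetical helper).

    published: incremental piece numbers generated so far, oldest first
    keep: how many of the newest pieces to keep (assumed retention policy)
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    if not published:
        raise ValueError("no incremental pieces have been published yet")
    retained = published[-keep:]  # older pieces are pruned
    props = {
        "nexus.index.chain-id": chain_id,
        "nexus.index.last-incremental": str(retained[-1]),
    }
    # incremental-0 is the oldest piece the provider still carries.
    for position, piece in enumerate(retained):
        props[f"nexus.index.incremental-{position}"] = str(piece)
    return props


def to_properties_text(props: Dict[str, str]) -> str:
    """Render key=value lines; assumes the values need no escaping."""
    return "".join(f"{key}={value}\n" for key, value in props.items())
```

With pieces 1 through 5 published and `keep=3`, the file would list pieces 3, 4 and 5 as incremental-0 through incremental-2, and a consumer still at piece 1 would have to take the full index.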

Retrieving Incremental Indexes

If the consumer application is integrated with the Nexus Indexer (version 2.0 or later), there is nothing to worry about: the indexer downloads the incremental pieces it is missing relative to its local copy. If anything does not line up (it needs incremental pieces that the provider no longer carries, or the chain id has changed), the indexer downloads the full index file instead and resumes checking for incremental changes on its next update.
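
The pieces have to line up because each one is a delta on top of the previous state. The sketch below models that with a deliberately simplified, hypothetical change format (an operation, an artifact key, and some metadata); the real .gz pieces use the indexer's own record format, and the function name is made up. It refuses any batch that does not continue the local chain without gaps, which is exactly the case where a full download is needed.

```python
from typing import Dict, List, Sequence, Tuple

# Simplified, hypothetical model of a piece's contents; the real .gz files
# use the indexer's own record format.
Change = Tuple[str, str, str]      # (operation, artifact key, metadata)
Piece = Tuple[int, List[Change]]   # (incremental number, changes)


def apply_pieces(index: Dict[str, str], last_incremental: int,
                 pieces: Sequence[Piece]) -> int:
    """Apply downloaded pieces to `index` in place, oldest first.

    Returns the new last-incremental value. Raises ValueError (before touching
    the index) if the pieces do not continue the local chain without gaps; the
    consumer should then fall back to the full index.
    """
    ordered = sorted(pieces, key=lambda piece: piece[0])
    numbers = [number for number, _ in ordered]
    expected = list(range(last_incremental + 1,
                          last_incremental + 1 + len(ordered)))
    if numbers != expected:
        raise ValueError(f"pieces {numbers} do not follow {last_incremental}")
    for number, changes in ordered:
        for operation, key, metadata in changes:
            if operation in ("add", "update"):
                index[key] = metadata
            elif operation == "delete":
                index.pop(key, None)
            else:
                raise ValueError(f"unknown operation {operation!r}")
        last_incremental = number
    return last_incremental
```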

All of this is driven by the nexus-maven-repository-index.properties file, which contains the following properties:

  • nexus.index.chain-id: The chain id of the current incremental items. If this value differs from the one in the consumer's local properties file, the consumer should trigger a full .gz index download (and, of course, download the properties file to stay up to date).

  • nexus.index.last-incremental: The number of the most recent incremental item available; this integer is inserted into the download file name. If the consumer's local properties file has the same value, there is nothing to download.

  • nexus.index.incremental-X: These properties list each incremental item available. The first item (where X = 0) is the oldest incremental piece the provider still maintains. If the consumer's local last-incremental value is less than this, the consumer needs to download the full .gz index (and properties file) and continue from there. Otherwise, it simply needs to fetch every nexus-maven-repository-index.N.gz file from the provider, where N is greater than its local last-incremental value and less than or equal to the remote last-incremental value.
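
The rules above boil down to a small decision function. This Python sketch works on the properties as plain string dictionaries (parsing the .properties file itself is left out) and returns the file names a consumer would fetch. The function and constant names are hypothetical, and falling back to the full index whenever a value is missing or malformed is a conservative choice of this sketch, not documented indexer behavior.

```python
from typing import Dict, List, Optional

FULL_INDEX = "nexus-maven-repository-index.gz"
PIECE_PREFIX = "nexus.index.incremental-"


def plan_download(local: Optional[Dict[str, str]],
                  remote: Dict[str, str]) -> List[str]:
    """Return the index files to fetch; an empty list means up to date.

    `local` is the consumer's saved properties (None on first run) and
    `remote` is the provider's freshly downloaded properties file.
    """
    if local is None:
        return [FULL_INDEX]
    # A missing or changed chain id resets the chain: full download.
    local_chain = local.get("nexus.index.chain-id")
    if not local_chain or local_chain != remote.get("nexus.index.chain-id"):
        return [FULL_INDEX]
    try:
        local_last = int(local["nexus.index.last-incremental"])
        remote_last = int(remote["nexus.index.last-incremental"])
        oldest = int(remote[PIECE_PREFIX + "0"])
        available = {int(value) for key, value in remote.items()
                     if key.startswith(PIECE_PREFIX)}
    except (KeyError, ValueError):
        return [FULL_INDEX]  # missing or malformed values: play it safe
    if local_last == remote_last:
        return []  # same last-incremental: nothing to download
    if local_last < oldest or local_last > remote_last:
        return [FULL_INDEX]  # too far behind (or inconsistent state)
    needed = range(local_last + 1, remote_last + 1)
    if any(number not in available for number in needed):
        return [FULL_INDEX]  # a piece in between is no longer listed
    return [f"nexus-maven-repository-index.{number}.gz" for number in needed]
```

After applying the downloaded files, the consumer would store the provider's properties as its new local copy, so the next run compares against the updated chain id and last-incremental value.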

Support for Legacy Index Applications

Of course, we don't want to leave the legacy guys out in the cold, so the old timestamp-based properties are also available:

  • nexus.index.time: Timestamp of when the legacy .zip index was last created. If this timestamp differs from the one in your local properties file, you will want to download the full .zip index.

  • nexus.index.timestamp: Timestamp of when the .gz index was last created. If this timestamp differs from the one in your local properties file, you will want to download the full .gz index.
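
For legacy consumers the check is simpler still: compare a single timestamp property. A minimal sketch, assuming both properties files are already loaded into dictionaries; the helper name and the "zip"/"gz" format keys are hypothetical.

```python
from typing import Dict, Optional

# Which legacy property guards which full index format, per the list above.
LEGACY_KEYS = {
    "zip": "nexus.index.time",
    "gz": "nexus.index.timestamp",
}


def legacy_needs_download(local: Optional[Dict[str, str]],
                          remote: Dict[str, str], fmt: str) -> bool:
    """True if the full index in `fmt` ("zip" or "gz") should be re-downloaded.

    Timestamps are compared as opaque strings: any difference triggers a
    download, and no ordering between them is assumed.
    """
    key = LEGACY_KEYS[fmt]
    if local is None or key not in local:
        return True
    return local[key] != remote.get(key)
```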

So to wrap everything up, plain and simple: if your application is integrated with the nexus-indexer, you should definitely upgrade to 2.0.0 to get these bandwidth savings. The latest m2eclipse release (0.9.8) already includes this support, and it will be coming in Sonatype Nexus Repository 1.4.


Written by Damian Bradicich

Damian is a software engineer at Sonatype. He is based in Connecticut.
