When you run a repository manager, you will likely want to control which artifacts developers have access to. Maybe you also want to try to speed up your builds and reduce the time it takes to find and retrieve the artifacts needed in your build. You might be looking for an easy way to filter out junk artifacts that you don't want to involve in your build. If you are trying to do any of these things, you'll need to know how to configure routing rules in Nexus. In this blog post, I walk through routing rules and provide some answers for people interested in using routing rules to gain more control over repositories in Nexus.
A common approach when using a repository manager is to have one accessible group in Nexus. In the Nexus documentation, the initial recommendation is to create a single group called "public" and to point every developer's Maven Settings to this group using a wildcard mirrorOf setting. This single group contains all the repositories your developers need to access, whether they are proxied, hosted, or virtual repositories. If your developers need another repository, you'll create it and add it to this "global" repository group that consolidates everything into a single point of access.
As time passes, you create more and more proxy repositories, and you add these proxy repositories to this global group. After a while, you'll have this huge "public" group with many proxy repositories (and some hosted repositories). For the first part of this discussion, let's focus on the problems that arise when you start to have many proxy repositories in the same Nexus repository group.
The main problem you start to see is that you'll need to pay attention to the ordering of the repositories to keep your builds fast and "valid." The first requirement is obvious (the most used repositories should be near the top of the list), but the second is not. The root of the problem is that the proxy repositories you added to the group may have overlapping contents. If you are not careful, you might end up using some organization’s own customized release of a component instead of the canonical release, which should reside in Maven Central (or some other canonical source). How do we solve this?
To explain how to solve this, let's look first at what happens when a repository group attempts to resolve an artifact. To explain in depth, I will reference UIDs. A UID in Nexus is a unique identifier for an artifact in a repository. The repository path alone does not uniquely identify which resource is served, because the same repository path can exist in multiple repositories. The UID locates an artifact at a particular path in a particular repository, and it takes the form:
repoId:/repo/pathOne
Example of SLF4j API POM version 1.5.8 coming from central (if your Nexus uses repoId “central” for Maven central):
central:/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.pom
The request URL may or may not contain the Nexus UID for an artifact, depending on whether it references a repository or a group. For example, the Nexus UID can be derived from the following URL because it references the "central" repository:
http://nexushost/nexus/content/repositories/central/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.pom
Another URL that can be used to retrieve the same artifact may not provide the Nexus UID. For example, since the following URL references the public group, you cannot derive the Nexus UID from this URL without iterating through the repositories in the group.
http://nexushost/nexus/content/groups/public/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.pom
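The difference between the two URLs can be sketched in a few lines of Python. This is only an illustration: the function name uid_from_url is hypothetical, and the two path prefixes are assumptions copied from the example URLs above rather than a description of how Nexus parses requests internally.

```python
from urllib.parse import urlparse

# Assumed URL layout, taken from the two example URLs in this post.
REPO_PREFIX = "/nexus/content/repositories/"
GROUP_PREFIX = "/nexus/content/groups/"


def uid_from_url(url):
    """Return 'repoId:/repo/path' for a repository URL, or None for a group URL."""
    path = urlparse(url).path
    if path.startswith(REPO_PREFIX):
        # The first path segment after the prefix is the repository ID.
        repo_id, _, repo_path = path[len(REPO_PREFIX):].partition("/")
        return f"{repo_id}:/{repo_path}"
    # A group URL names only the group. Which repository serves the file is
    # unknown until the group's members have been searched.
    return None


repo_url = ("http://nexushost/nexus/content/repositories/central"
            "/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.pom")
group_url = ("http://nexushost/nexus/content/groups/public"
             "/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.pom")

print(uid_from_url(repo_url))   # central:/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.pom
print(uid_from_url(group_url))  # None
```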
The group is a pool of repositories, and the group configuration at the moment of the request affects which POM will be served to us. If a gremlin shuffled the repository order within the public group before every request, and we had the same POM in multiple repositories, we could never predict which one we got. In this way, the order of the repositories in a group affects which artifacts a particular request returns. But, back to UIDs: when Nexus is trying to resolve an artifact in a group, it is trying to decide which artifact, uniquely identified by a UID, should be sent to a downstream client.
Grouping is actually repository "layering" (analogous to what you would do with images in your favorite photo editor app: adding “layers” above the original picture with some other cutouts or text). Content in a higher layer covers the content in a lower layer, and where you have no content ("transparency"), the content below is visible. You have as many layers as there are repositories participating in the group. If a repository toward the start of the list doesn't have a matching artifact, Nexus iterates through the list until it finds a matching artifact. Simple so far?
Fine, but this is not the whole truth: there is some "special" content that complicates the simple layer of transparency: the Maven repository metadata (maven-metadata.xml files). To make Maven work as specified, we must provide full vertical dissection (or merge, as you like) of all Maven metadata in all member repositories on the same path requested.
So, let’s say you have the default Nexus configuration in place, and you have to publish all this content to your downstream developers:
/com/mycorp/bar/1.0/bar-1.0.pom
/com/mycorp/bar/1.0/bar-1.0.jar
/org/coolorg/maven-metadata.xml
/org/coolorg/foo/1.0/foo-1.0.pom (a Maven2 plugin)
/org/coolorg/foo/1.0/foo-1.0.jar
/org/bigorg/waltdisneyrator/3.0/waltdisneyrator-3.0.pom
/org/bigorg/waltdisneyrator/3.0/waltdisneyrator-3.0.jar
And you have this content distributed in your repositories as follows.
In-house hosted repository:
/com/mycorp/bar/1.0/bar-1.0.pom
/com/mycorp/bar/1.0/bar-1.0.jar
You also reference the Maven central proxy repository, which contains:
/org/coolorg/maven-metadata.xml
/org/coolorg/foo/1.0/foo-1.0.pom
/org/coolorg/foo/1.0/foo-1.0.jar
And you are referencing another proxy repository, which has some overlapping artifacts:
/org/coolorg/maven-metadata.xml
/org/coolorg/foo/1.0-coolorg/foo-1.0-coolorg.pom
/org/coolorg/foo/1.0-coolorg/foo-1.0-coolorg.jar
/org/bigorg/waltdisneyrator/3.0/waltdisneyrator-3.0.pom
/org/bigorg/waltdisneyrator/3.0/waltdisneyrator-3.0.jar
/org/bigorg/bad/2.0/bad-2.0.pom
/org/bigorg/bad/2.0/bad-2.0.jar
You put them in a group as follows:
Public repository group:
inhouse
central
bigorg
This will work at first glance, but you will potentially end up with the following problems:
Your requests for non-existing (or not yet released) in-house artifacts are "leaking" out to the central and Bigorg repositories. This may be considered "bad behavior" (you are making some extraneous requests to those public repos for internal artifacts). Apart from the security implications, this practice is inefficient.
If you failed to lock down plugin versions in your build (which is a bad thing), you risk using Bigorg's patched foo plugin instead of the "canonical" one. This way, you may drag in unsupported features, and you may encounter bugs due to changing plugin versions.
In general, Nexus distinguishes between two "types" of files in Maven repositories: files that need "To be merged" (maven-metadata.xml) and "everything else."
The "Everything else" case is simple: just iterate over the member list of the group repository in order, repeat the request for the same repository path against each member, and serve the first one found. If none of the repositories contains the path, simply return "404 Not found." This is the reason why keeping main sources of artifacts like Central close to first place in a group is good. But this still adds huge overhead to 404 requests (Nexus has to iterate over all repositories before it can answer with a 404 at all).
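The first-match iteration and the cost of a miss can be sketched as follows. This is a simplified in-memory model, not Nexus code: the CONTENTS table mirrors the example repositories listed earlier, the resolve function is a hypothetical name, and the /com/mycorp/baz path is an invented example of a not-yet-released in-house artifact. In a real group, each check against a proxy repository may also trigger a remote request.

```python
# In-memory stand-in for the three example repositories from this post.
CONTENTS = {
    "inhouse": {
        "/com/mycorp/bar/1.0/bar-1.0.pom",
        "/com/mycorp/bar/1.0/bar-1.0.jar",
    },
    "central": {
        "/org/coolorg/foo/1.0/foo-1.0.pom",
        "/org/coolorg/foo/1.0/foo-1.0.jar",
    },
    "bigorg": {
        "/org/coolorg/foo/1.0-coolorg/foo-1.0-coolorg.pom",
        "/org/coolorg/foo/1.0-coolorg/foo-1.0-coolorg.jar",
        "/org/bigorg/waltdisneyrator/3.0/waltdisneyrator-3.0.pom",
        "/org/bigorg/waltdisneyrator/3.0/waltdisneyrator-3.0.jar",
        "/org/bigorg/bad/2.0/bad-2.0.pom",
        "/org/bigorg/bad/2.0/bad-2.0.jar",
    },
}
GROUP = ["inhouse", "central", "bigorg"]  # member order of the public group


def resolve(members, path):
    """Return (uid or None, number of repositories checked)."""
    for checked, repo_id in enumerate(members, start=1):
        if path in CONTENTS[repo_id]:
            return f"{repo_id}:{path}", checked  # first member found wins
    return None, len(members)  # a 404 costs a check against every member


print(resolve(GROUP, "/org/coolorg/foo/1.0/foo-1.0.pom"))
# ('central:/org/coolorg/foo/1.0/foo-1.0.pom', 2)
print(resolve(GROUP, "/com/mycorp/baz/1.0/baz-1.0.pom"))
# (None, 3)
```

The second call shows the leak described earlier: a miss on an in-house path is only reported after central and bigorg have been asked as well.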
The "To be merged" case is similar, but even though this is a more complex process, there is no magic. Just iterate all group members for the maven-metadata.xml file requested, and merge them into one Maven Metadata file. When you merge two XML files, there is potential for information loss. Additionally, since overlapping maven-metadata.xml files are merged in the order defined in the group, you need to know how maven-metadata.xml conflicts and changes could affect your build.
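To make the merge step concrete, here is a deliberately simplified sketch. It only unions the version entries of artifact-level metadata, visiting the documents in group order. The two metadata documents are invented for illustration (they describe a hypothetical /org/coolorg/foo/maven-metadata.xml in central and bigorg), and the real Nexus merge has more fields to reconcile than this sketch covers, which is where the potential for information loss comes from.

```python
import xml.etree.ElementTree as ET


def merge_versions(metadata_docs):
    """Union the <version> entries of several maven-metadata.xml documents,
    visiting them in group order and keeping the first occurrence of each."""
    merged = []
    for doc in metadata_docs:
        root = ET.fromstring(doc)
        for node in root.findall("./versioning/versions/version"):
            if node.text not in merged:
                merged.append(node.text)
    return merged


# Hypothetical metadata as it might be served by central ...
central_md = """<metadata>
  <groupId>org.coolorg</groupId><artifactId>foo</artifactId>
  <versioning><versions><version>1.0</version></versions></versioning>
</metadata>"""

# ... and by bigorg, which carries its own patched release.
bigorg_md = """<metadata>
  <groupId>org.coolorg</groupId><artifactId>foo</artifactId>
  <versioning><versions><version>1.0-coolorg</version></versions></versioning>
</metadata>"""

print(merge_versions([central_md, bigorg_md]))  # ['1.0', '1.0-coolorg']
```

Once merged, the group advertises Bigorg's 1.0-coolorg release next to the canonical 1.0, which is how a build without locked-down plugin versions can end up picking the patched plugin.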
Routes were designed to affect the way groups work. Actually, they only affect groups. Routes are not used if a request is made directly against a plain (hosted or proxy) repository. What routes can affect is the collection of repositories to iterate over for a given request. They may shrink the collection of repositories to search, but never broaden it. Also, the ordering of the repositories in a group remains unchanged after applying routes. Think of routes as "hints" you can give the group about which repositories contain artifacts that match a certain repository path.
Route Type: Every repository route has a "type." There are three repository route types: BLOCKING, INCLUSIVE and EXCLUSIVE.
Matching Expression: Every repository route must have one regular expression associated with it. This expression is matched against the requested path.
Scope: A repository route may be "global" (applied to every group in Nexus), or group-local (applied to selected group only).
And finally, INCLUSIVE and EXCLUSIVE routes hold a list of repositories to include or exclude from processing.
When we say "a route is hit by a request," that means the following:
A client request is made against a group. Maybe this is a "public" group, which contains many repositories.
Nexus selects the list of routes it needs to test the request against. This includes all globally scoped routes, as well as routes that apply to the specific group involved in the request.
A route has a regular expression pattern that matches the repository path of the request.
Depending on the route type, the matched route will then either include or exclude repositories from a particular group request.
There is a component in Nexus responsible for returning the list of repositories involved in a particular group request. This component affects every request to a group. Its input is the actual request and the "original list" of member repositories (the list of repositories participating in the group being hit). The response of this component holds the collection of repositories to be processed.
Keep in mind that the INCLUSIVE and EXCLUSIVE routes are actually set operations of "intersection" and "subtraction": they both actually narrow the resulting set. A blocking route may be modeled as "intersection with empty set." You can use routes to disable requests for artifacts that match a certain pattern by using an Inclusive route with an empty set of matching repositories.
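The set operations can be written down directly. The sketch below is illustrative only: include_only and exclude are hypothetical helper names, and the member list reuses the example public group from earlier in this post. Lists are used instead of sets because the group's ordering must survive the operation.

```python
def include_only(original, included):
    """INCLUSIVE: intersection with the route's list, in the group's order."""
    return [repo for repo in original if repo in included]


def exclude(original, excluded):
    """EXCLUSIVE: subtraction of the route's list, in the group's order."""
    return [repo for repo in original if repo not in excluded]


group = ["inhouse", "central", "bigorg"]

print(include_only(group, {"central"}))  # ['central']
print(exclude(group, {"bigorg"}))        # ['inhouse', 'central']
print(include_only(group, set()))        # [] (behaves like a blocking route)
```

Neither helper can return a repository that was not already in the original list, which is why routes can narrow a group but never broaden it.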
The blocking route type is the simplest and most heavy-handed: if a request matches a blocking rule, the repository list to be processed is empty. Nexus will always respond with a 404, even if some group members have the requested artifact or file.
Repositories included in Route: empty collection.
Usage pattern: forbidding artifact(s) from being accessed by a group
If there is a blocking route hit, processing stops. Even if you have other routes that may match the request, the blocking route takes precedence, and it simply empties the repository collection to be processed and stops further route matching.
You use an inclusive route to give Nexus a hint: "artifact X should be in repository Y." This type of routing rule can both speed up artifact resolution and ensure that specific artifacts come from specific repositories. For example, if your applications use Hibernate and you reference two repositories that carry two different versions of it, create an inclusive routing rule that selects the repository you want Hibernate from.
Repositories Included in Route: an intersection is made between the original list and the list of repositories to be included, while keeping the ordering of the original list. Hence, ordering is kept, but the repositories are filtered, and contain only those listed in the route. A plain set intersection.
Usage pattern: ensuring the use of a "canonical source" for artifacts, and speeding up resolution of known artifacts that should come from a known repository. For example: Maven plugins are known to come from the /org/apache/maven/plugins path in the Central Repository. Hence, you may create an inclusive rule that narrows the processed repository collection to the "known to have it" repositories only.
The exclusive route is another interesting and usually confusing route type. Using this rule, you are telling Nexus where not to look for an artifact. This is useful when you have multiple repositories in your group that contain the same artifacts (more precisely, artifacts that share the same GAV or path but are potentially not the same at all, like some organization’s patched plugins).
Repositories Included in Route: Excluded repositories are removed from the list, and the ordering of the original list is kept.
Usage pattern: filtering out known "bad sources" for artifacts, and also speeding up serving of known artifacts.
To fully solve the above problems with your public group (and still be future proof), you can add the following routes:
Route type: INCLUSIVE
Route pattern: .*/com/mycorp/.*
Route include list: inhouse
Reason: you know that your in-house development will only ever be deployed to your inhouse repository. It will never be published anywhere else. With this rule, artifacts in "/com/mycorp" and below will never be requested from the Central Repository or Bigorg's repository. Faster response, and no information leaks to the outside world.
Route type: INCLUSIVE
Route pattern: .*/org/coolorg/.*
Route include list: central
Reason: We want to use only the "canonical" source for coolorg's releases. Faster response, and we are safe from Bigorg's modified plugins.
Route type: BLOCKING
Route pattern: .*/org/bigorg/bad/.*
Reason: we don't want our developers to use this dependency, either directly or pulled in as a transitive dependency of waltdisneyrator. With this route in place, our builds will fail, reminding our devs to add excludes to the POM.
And finally, a word of warning: the repository routes are a powerful feature of Nexus, but don't get too complex with your routes. They were not meant to be a meticulous, fine-grained approach to procurement. Don't "overuse" this feature to the point where you have hundreds of routes, and don't think you could recreate procurement using routes alone.
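To check the three routes proposed for the public group, here is a small model of route evaluation. It is a sketch under stated assumptions, not the Nexus implementation: the Route class and select_repositories function are hypothetical names, the patterns are written as .*/com/mycorp/.*, .*/org/coolorg/.* and .*/org/bigorg/bad/.* (an assumption about the intended regular expressions), the pattern is assumed to match the whole request path, and when several non-blocking routes match they are assumed to narrow the list one after another.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    kind: str          # "BLOCKING", "INCLUSIVE" or "EXCLUSIVE"
    pattern: str       # regular expression tested against the request path
    repos: tuple = ()  # repositories to include or exclude (unused when blocking)


def select_repositories(members, routes, path):
    """Return the group members to search for `path`, in group order."""
    # Assumption: a route is hit when its pattern matches the whole path.
    matched = [r for r in routes if re.fullmatch(r.pattern, path)]
    # A blocking hit takes precedence and empties the list.
    if any(r.kind == "BLOCKING" for r in matched):
        return []
    selected = list(members)
    for route in matched:
        if route.kind == "INCLUSIVE":
            selected = [m for m in selected if m in route.repos]
        elif route.kind == "EXCLUSIVE":
            selected = [m for m in selected if m not in route.repos]
    return selected


PUBLIC = ["inhouse", "central", "bigorg"]
ROUTES = [
    Route("INCLUSIVE", r".*/com/mycorp/.*", ("inhouse",)),
    Route("INCLUSIVE", r".*/org/coolorg/.*", ("central",)),
    Route("BLOCKING", r".*/org/bigorg/bad/.*"),
]

for request_path in (
    "/com/mycorp/bar/1.0/bar-1.0.pom",
    "/org/coolorg/foo/1.0/foo-1.0.jar",
    "/org/bigorg/bad/2.0/bad-2.0.jar",
    "/org/bigorg/waltdisneyrator/3.0/waltdisneyrator-3.0.pom",
):
    print(request_path, "->", select_repositories(PUBLIC, ROUTES, request_path))
```

In this model, in-house paths are searched only in inhouse, coolorg paths only in central, the unwanted Bigorg dependency is searched nowhere (so the group answers with a 404), and every other path still sees the full, unchanged member list.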