When you run a repository manager you will likely want to control which artifacts developers have access to. Maybe you also want to try to speed up your builds and reduce the time it takes to find and retrieve the artifacts needed in your build. You might be looking for an easy way to filter out junk artifacts that you don't want to involve in your build. If you are trying to do any of these things, you'll need to know how to configure routing rules in Nexus. In this blog post, I walk through routing rules and provide some answers for people interested in using routing rules to gain more control over repositories in Nexus.
Using a Single Nexus Group
A common approach when using a repository manager is to have one accessible group in Nexus. In the Nexus documentation, the initial recommendation is to create a single group called "public" and to point every developer's Maven Settings to this group using a wildcard mirrorOf setting. This single group contains all of the repositories your developers need to access, whether they are proxied, hosted, or virtual repositories. If your developers need another repository, you'll create it and add it to this "global" repository group that consolidates everything into a single point of access.
As time passes, you create more and more proxy repositories, and you add these proxy repositories to this global group. After a while you'll have this huge "public" group with many proxy repositories (and some hosted repositories). For the first part of this discussion, let's focus on the problems that arise when you start to have a large number of proxy repositories in the same Nexus repository group.
The main problem you start to see is that you'll need to pay attention to the ordering of the repositories to keep your builds fast and “valid”. The first requirement is obvious (the most used repositories should be near the top of the list), but the second is not. The root of the problem is that the proxy repositories you added to the group may have overlapping contents. If you are not careful, you might end up using some organization’s own customized release of a component instead of the canonical release, which should reside in Maven Central (or some other canonical source). How do we solve this?
Locating Artifacts: The Nexus UID
To explain how to solve this, let’s look first at what happens when a repository group tries to resolve an artifact. To explain in depth, I will reference UIDs. A UID in Nexus is a unique identifier for an artifact in a repository. The repository path alone does not uniquely identify which resource is served, because the same repository path can exist in multiple repositories. The UID locates an artifact at a particular path in a particular repository, and it has the form:
repoId:/repo/pathOne
Example of SLF4j API POM version 1.5.8 coming from central (if your Nexus uses repoId “central” for Maven central):
central:/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.pom
The request URL may or may not contain the Nexus UID for an artifact, depending on whether it references a repository or a group. For example, the Nexus UID can be derived from the following URL because it references the "central" repository:
http://nexushost/nexus/content/repositories/central/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.pom
Another URL that can be used to retrieve the same artifact may not provide the Nexus UID. For example, since the following URL references the public group, you cannot derive the Nexus UID from this URL without iterating through the repositories in the group.
http://nexushost/nexus/content/groups/public/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.pom
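The difference between the two URLs can be sketched in a few lines of Python. This is only an illustration, not Nexus code: the function name `uid_from_url` is made up, and it assumes the `/nexus/content/repositories/...` and `/nexus/content/groups/...` URL layout shown in the examples above.

```python
from urllib.parse import urlparse

# Assumed URL layout, taken from the example URLs in this post.
CONTENT_PREFIX = "/nexus/content/"


def uid_from_url(url):
    """Return 'repoId:/repo/path' for a repository URL, or None otherwise.

    A group URL yields None: the UID cannot be known without asking
    the group's member repositories.
    """
    path = urlparse(url).path
    if not path.startswith(CONTENT_PREFIX):
        return None
    kind, _, rest = path[len(CONTENT_PREFIX):].partition("/")
    if kind != "repositories":
        return None
    repo_id, sep, repo_path = rest.partition("/")
    if not repo_id or not sep:
        return None
    return f"{repo_id}:/{repo_path}"


repo_url = ("http://nexushost/nexus/content/repositories/central"
            "/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.pom")
group_url = ("http://nexushost/nexus/content/groups/public"
             "/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.pom")

print(uid_from_url(repo_url))   # central:/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.pom
print(uid_from_url(group_url))  # None
```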
The group is a pool of repositories, and the group configuration at the moment of the request affects which POM will be served to us. If a gremlin shuffled the repository order within the public group before every request and we had the same POM in multiple repositories, we could never predict which one we got. In this way, the order of the repositories in a group affects which artifacts are returned by a particular request. But, back to UIDs: when Nexus is trying to resolve an artifact in a group, it is trying to decide which artifact, uniquely identified by a UID, should be sent to a downstream client.
How does grouping work?
Grouping is actually repository “layering” (analogous to what you would do with images in your favorite photo editor app: adding “layers” above the original picture with some other cutouts or text). Content in a higher layer covers the content in a lower layer, and where a layer has no content (“transparency”), the content below is visible. You have as many layers as there are repositories in the group. If a repository toward the start of the list doesn't have a matching artifact, Nexus iterates through the list until it finds one. Simple so far?
Fine, but this is not the whole truth: there is some “special” content that complicates the simple layer transparency: the Maven repository metadata (maven-metadata.xml files). To make Maven able to work as specified, we have to provide full vertical dissection (or merge, as you like) of all Maven metadata in all member repositories on that same path that is requested.
So, let’s say you have the default Nexus configuration in place, and you have to publish all of this content to your downstream developers:
/com/mycorp/bar/1.0/bar-1.0.pom
/com/mycorp/bar/1.0/bar-1.0.jar
/org/coolorg/maven-metadata.xml
/org/coolorg/foo/1.0/foo-1.0.pom (a Maven2 plugin)
/org/coolorg/foo/1.0/foo-1.0.jar
/org/bigorg/waltdisneyrator/3.0/waltdisneyrator-3.0.pom
/org/bigorg/waltdisneyrator/3.0/waltdisneyrator-3.0.jar
And you have this content distributed in your repositories as follows.
In-house hosted repository:
/com/mycorp/bar/1.0/bar-1.0.pom
/com/mycorp/bar/1.0/bar-1.0.jar
You also reference the Maven central proxy repository which contains:
/org/coolorg/maven-metadata.xml
/org/coolorg/foo/1.0/foo-1.0.pom
/org/coolorg/foo/1.0/foo-1.0.jar
And you are referencing another proxy repository (Bigorg's) which has some overlapping artifacts:
/org/coolorg/maven-metadata.xml
/org/coolorg/foo/1.0-coolorg/foo-1.0-coolorg.pom
/org/coolorg/foo/1.0-coolorg/foo-1.0-coolorg.jar
/org/bigorg/waltdisneyrator/3.0/waltdisneyrator-3.0.pom
/org/bigorg/waltdisneyrator/3.0/waltdisneyrator-3.0.jar
/org/bigorg/bad/2.0/bad-2.0.pom
/org/bigorg/bad/2.0/bad-2.0.jar
You put them in a group as follows:
Public repository group
inhouse
central
bigorg
This works at first glance, but you will potentially end up with the following problems:
Requests for in-house artifacts that do not exist (or are not yet released) “leak” out to the central and Bigorg repositories. This may be considered simply “bad behavior” (you are making extraneous requests to those public repos for internal artifacts). Aside from the security implications, this practice is inefficient.
If you failed to lock down plugin versions in your build (which is a very bad thing), you risk using Bigorg’s patched foo plugin instead of the “canonical” one. This way, you may drag in unsupported features, and you may run into bugs caused by changing plugin versions.
In general, Nexus makes distinction between two “types” of files in Maven repositories: files that need “To be merged” (maven-metadata.xml) and “everything else”.
The “everything else” case is simple: iterate over the member list of the group in order, repeat the request for the same repository path against each member, and serve the first match. If none of the repositories contains the path, simply return “404 Not Found”. This is why it is good to keep the main sources of artifacts, such as Central, near the top of the group's list. Even so, requests that end in a 404 carry a large overhead, because Nexus has to try every repository before it can answer 404 at all.
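As a rough sketch of the “everything else” case, the following Python models the example group as plain sets of paths and serves the first match. The names `resolve`, `GROUP_ORDER` and `REPOSITORIES` are invented for this illustration, and real proxy repositories would go to their remote when they have no cached copy, which is not modeled here.

```python
# Group members in the order configured in the group.
GROUP_ORDER = ["inhouse", "central", "bigorg"]

# Simplified repository contents, taken from the example listing above.
REPOSITORIES = {
    "inhouse": {
        "/com/mycorp/bar/1.0/bar-1.0.pom",
        "/com/mycorp/bar/1.0/bar-1.0.jar",
    },
    "central": {
        "/org/coolorg/foo/1.0/foo-1.0.pom",
        "/org/coolorg/foo/1.0/foo-1.0.jar",
    },
    "bigorg": {
        "/org/coolorg/foo/1.0-coolorg/foo-1.0-coolorg.pom",
        "/org/coolorg/foo/1.0-coolorg/foo-1.0-coolorg.jar",
        "/org/bigorg/waltdisneyrator/3.0/waltdisneyrator-3.0.pom",
        "/org/bigorg/waltdisneyrator/3.0/waltdisneyrator-3.0.jar",
        "/org/bigorg/bad/2.0/bad-2.0.pom",
        "/org/bigorg/bad/2.0/bad-2.0.jar",
    },
}


def resolve(path, members=GROUP_ORDER, contents=REPOSITORIES):
    """Serve the first member that has the path.

    Returns (uid, tried): uid is 'repoId:/path' or None for a 404,
    tried lists the repositories that had to be asked.
    """
    tried = []
    for repo_id in members:
        tried.append(repo_id)
        if path in contents[repo_id]:
            return f"{repo_id}:{path}", tried
    return None, tried


# Found in the first member: one lookup.
print(resolve("/com/mycorp/bar/1.0/bar-1.0.pom"))
# Not released yet: every member is asked before the 404.
print(resolve("/com/mycorp/bar/2.0/bar-2.0.pom"))
```

The second call shows the leak described earlier: a request for an in-house artifact that does not exist is repeated against central and bigorg before the group can answer 404.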
The “to be merged” case is similar, and even though it is a more complex process, there is no magic. Just iterate over all group members for the requested maven-metadata.xml file, and merge the results into one Maven metadata file. When you merge two XML files, there is a potential for information loss. Additionally, since overlapping maven-metadata.xml files are merged in the order defined in the group, you need to be aware of how maven-metadata.xml conflicts and changes could affect your build.
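To make the merge more concrete, here is a deliberately simplified Python sketch that combines only the version lists of several metadata documents, in group order. It is not how Nexus implements the merge (real metadata carries more fields, such as plugin prefixes and timestamps), and the two documents below are hypothetical artifact-level metadata files for foo, not files taken from the example listing.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata for org.coolorg:foo as served by each proxy.
CENTRAL_METADATA = """<metadata>
  <groupId>org.coolorg</groupId>
  <artifactId>foo</artifactId>
  <versioning>
    <versions>
      <version>1.0</version>
    </versions>
  </versioning>
</metadata>"""

BIGORG_METADATA = """<metadata>
  <groupId>org.coolorg</groupId>
  <artifactId>foo</artifactId>
  <versioning>
    <versions>
      <version>1.0-coolorg</version>
    </versions>
  </versioning>
</metadata>"""


def merge_versions(metadata_docs):
    """Collect <version> entries from each document, in the order given,
    skipping versions that were already seen."""
    merged = []
    for doc in metadata_docs:
        root = ET.fromstring(doc)
        for node in root.findall("./versioning/versions/version"):
            version = (node.text or "").strip()
            if version and version not in merged:
                merged.append(version)
    return merged


# Documents are passed in group order: central first, then bigorg.
print(merge_versions([CENTRAL_METADATA, BIGORG_METADATA]))
```

Even in this reduced form the consequence is visible: the merged file advertises Bigorg's patched version next to the canonical one, so a build that does not pin its versions may pick it up.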
How can you affect grouping?
Routes were designed to affect the way groups work. Actually, they only affect groups. Routes are not used if a request is made directly against a plain (hosted or proxy) repository. What routes can affect is the collection of repositories to iterate over for a given request. They may shrink the collection of repositories to search, but never broaden it! Also, the ordering of the repositories in a group remains unchanged after applying routes. Think of routes as "hints" you can give the group about which repositories contain artifacts that match a certain repository path.
Route Type: Every repository route has a “type”. There are three Repository Route types: BLOCKING, INCLUSIVE and EXCLUSIVE.
Matching Expression: Every repository route must have one Regular expression associated with it. This expression is matched against the requested path.
Scope: A repository route may be “global” (applied to every group in Nexus), or group-local (applied to selected group only).
And finally, INCLUSIVE and EXCLUSIVE routes hold a list of repositories to include or exclude from processing.
When we say a “routing is hit by request”, that means the following:
A client request is made against a group, maybe this is a “public” group which contains a large number of repositories
Nexus selects the list of routes that it needs to test the request against. This includes all globally scoped routes as well as routes that apply to the specific group involved in the request.
A route has a regular expression pattern that matches the repository path of the request
Depending on the route type, the matched route will then either include or exclude repositories from a particular group request.
There is a component in Nexus that is responsible for returning the list of repositories involved in a particular group request. This component affects every request to a group. Its input is the actual request and the “original list” of member repositories (the list of repositories participating in the group being hit). The response of this component holds the collection of repositories to be processed.
Keep in mind that the INCLUSIVE and EXCLUSIVE routes are actually the set operations “intersection” and “subtraction”: they both narrow the resulting set! A blocking route may be modeled as “intersection with the empty set”. You can use routes to disable requests for artifacts that match a certain pattern by using an inclusive route with an empty set of matching repositories.
Blocking route type
The blocking route type is the simplest and the most heavy-handed one: if a request matches a blocking rule, the repository list to be processed is empty. Nexus will always respond with a 404, even if some of the group members do have the requested artifact or file.
Repositories included in Route: empty collection.
Usage pattern: forbidding artifact(s) from being accessed by a group
If there is a blocking route hit, processing stops. Even if you have other routes that may match the request, the blocking route takes precedence, and it simply empties the repository collection to be processed and stops further route matching.
Inclusive route type
You use this rule to give Nexus a hint: “artifact X should be in repository Y”. This type of routing rule can both speed up artifact resolution and make sure that specific artifacts come from specific repositories. For example, suppose your applications use Hibernate and you reference two repositories that carry two different versions of it. Create an inclusive routing rule that selects the repository you want Hibernate to come from.
Repositories Included in Route: an intersection is made between original list and the list of repositories to be included, while keeping the ordering of the original list. Hence, ordering is kept, but the repositories are filtered and they contain only those that are listed in the route. A plain set intersection.
Usage pattern: ensuring the use of a “canonical source” for artifacts, and speeding up resolution of known artifacts that should come from a known repository. For example: Maven plugins are known to come from the /org/apache/maven/plugins path in the Central Repository. Hence, you may create an inclusive rule that narrows the processed repository collection to the repositories “known to have it”.
Exclusive route type
This is another interesting and often confusing route type. Using this rule, you are telling Nexus where not to look for an artifact. This is useful when you have multiple repositories in your group that contain the same artifacts (more precisely, artifacts that share the same GAV or path but may not be identical at all, such as some organization’s patched plugins).
Repositories Included in Route: Excluded repositories are removed from the list, and the ordering of the original list is kept.
Usage pattern: filtering out known “bad sources” for artifacts, and also speeding up serving of known artifacts.
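The three route types can be put together in a short Python sketch. This is a model of the behavior described in this post, not the actual Nexus component: the `Route` class and `repositories_for_request` function are made-up names, the example routes are hypothetical, the regular expression is assumed to have to match the whole requested path, and the way several matching inclusive routes combine (their lists are unioned here) is an assumption the text does not spell out.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    kind: str                            # "BLOCKING", "INCLUSIVE" or "EXCLUSIVE"
    pattern: str                         # regex tested against the requested path
    repositories: Tuple[str, ...] = ()   # include/exclude list; unused for BLOCKING
    group: Optional[str] = None          # None means a global route


def repositories_for_request(path, group, members, routes):
    """Narrow the group's member list for one request, keeping its order."""
    # Only global routes and routes scoped to this group are tested.
    hits = [r for r in routes
            if r.group in (None, group) and re.fullmatch(r.pattern, path)]
    # A blocking hit takes precedence and empties the list.
    if any(r.kind == "BLOCKING" for r in hits):
        return []
    inclusive = [r for r in hits if r.kind == "INCLUSIVE"]
    included = {repo for r in inclusive for repo in r.repositories}
    excluded = {repo for r in hits if r.kind == "EXCLUSIVE"
                for repo in r.repositories}
    # Intersection, then subtraction; iterating over members keeps group order.
    return [m for m in members
            if (not inclusive or m in included) and m not in excluded]


MEMBERS = ["inhouse", "central", "bigorg"]
ROUTES = [
    Route("EXCLUSIVE", r".*/org/coolorg/.*", ("bigorg",)),
    Route("INCLUSIVE", r".*/org/apache/maven/plugins/.*", ("central",)),
    Route("BLOCKING", r".*/org/bigorg/bad/.*", group="public"),
]

print(repositories_for_request(
    "/org/coolorg/foo/1.0/foo-1.0.pom", "public", MEMBERS, ROUTES))
print(repositories_for_request(
    "/org/bigorg/bad/2.0/bad-2.0.pom", "public", MEMBERS, ROUTES))
```

Note that the function can only return a subset of `members`, in the same order: routes narrow the list but never broaden or reorder it.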
Solution to problems above
To solve the problems above with your public group (and stay future proof), add the following routes:
Route type: INCLUSIVE
Route pattern: .*/com/mycorp/.*
Route include list: inhouse
Reason: you know that your in-house artifacts are deployed only to your inhouse repository and will never be published anywhere else. With this rule, nothing in “/com/mycorp” and below will ever be requested from the Central Repository or Bigorg’s repository. Faster response, and no information leaks to the outside world.
Route type: INCLUSIVE
Route pattern: .*/org/coolorg/.*
Route include list: central
Reason: We want to use only the “canonical” source for coolorg’s releases. Faster response, and we are safe from Bigorg’s modified plugins.
Route type: BLOCKING
Route pattern: .*/org/bigorg/bad/.*
Reason: we don’t want this dependency to be used by our developers, either directly or pulled in as a transitive dependency of waltdisneyrator. With this route in place, builds that request it will fail, reminding our devs to add excludes to the POM.
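As a quick sanity check, the three routes can be tried against paths from the example. The Python below is only an illustration: `searched` is an invented helper, and the patterns are written as `.*/com/mycorp/.*` and so on, on the assumption that each route is meant to match any path under the given prefix and that the whole path has to match.

```python
import re

MEMBERS = ["inhouse", "central", "bigorg"]

# (route type, pattern, include list) for the three routes above.
ROUTES = [
    ("INCLUSIVE", r".*/com/mycorp/.*", ["inhouse"]),
    ("INCLUSIVE", r".*/org/coolorg/.*", ["central"]),
    ("BLOCKING", r".*/org/bigorg/bad/.*", []),
]


def searched(path):
    """Return the repositories the public group would search for this path."""
    members = list(MEMBERS)
    for kind, pattern, repos in ROUTES:
        if not re.fullmatch(pattern, path):
            continue
        if kind == "BLOCKING":
            return []          # nothing is searched, the client gets a 404
        # Inclusive route: keep only the listed repositories, in group order.
        members = [m for m in members if m in repos]
    return members


for path in [
    "/com/mycorp/bar/2.0/bar-2.0.pom",                         # unreleased in-house artifact
    "/org/coolorg/foo/1.0-coolorg/foo-1.0-coolorg.pom",        # Bigorg's patched plugin
    "/org/bigorg/bad/2.0/bad-2.0.jar",                         # blocked dependency
    "/org/bigorg/waltdisneyrator/3.0/waltdisneyrator-3.0.jar",  # no route matches
]:
    print(path, "->", searched(path))
```

The unreleased in-house artifact is looked up in inhouse only, the patched plugin path is looked up in central only (where it does not exist), the bad dependency is not looked up anywhere, and a path that matches no route is still searched across the whole group.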
And finally, a word of warning: repository routes are a very powerful feature of Nexus, but don't get too complex with your routes. They were not meant to be a meticulous, fine-grained approach to procurement. Don't "overuse" this feature to the point where you have hundreds of routes, and don't think you could recreate procurement using routes alone.
Written by Tamas Cservenak
Tamas is a former Senior Developer at Sonatype. He has over 15 years of experience developing software systems in Public Services, Telco and Publishing industries.