Interview transcript: New Nexus features + future of Maven

March 04, 2009 By Tim OBrien


For those of you who prefer the printed word over audio, this is the full transcript of last week's interview with Brian about Sonatype Nexus Repository (formerly known as "Nexus Pro") and Maven Central.

Transcript

Tim O'Brien: It's been a few months since Nexus Pro was released, and it's been about a month since Nexus Pro has been out on the market?

Brian Fox: It's been exactly a month since Nexus Pro went out. I think the Pro release has been pretty well received. The features, the staging, promotion, LDAP, etc., all seem to be hitting the way we want them to, solving the problems people were having. Our main focus for the Pro release and its features was to build the infrastructure and, first of all, to stop people from having to manually move artifacts around. With the staging and promotion that was happening before, if people wanted to stage artifacts, they had to find a place to put them, test them, then manually move them. Usually the metadata was not updated, there were hash mismatches and things like that.

The same was true with promotions. When organizations had strict policies about who could download things into their internal repositories, usually they had a hosted instance of Central and somebody would have to go manually do that, and again there were hash mismatches and things were often corrupted.

The first version of Pro was aimed at simply putting the features together to stop that from happening, and then we can build upon those down the road with extended workflows and other automated features. The feedback we've received so far is that it's exactly what they need now, but everybody wants more.

TO: Tell me about some of the features that are planned for the next version of Nexus Pro.

BF: The next version of Nexus... The core functionality has quite a few changes in it. That would be the 1.3 version. We've done a lot of architectural upgrades in every version since 1.0, mostly focusing on extending our plugin API to allow us to provide more and more functionality.

In the 1.2 Pro release, those features were all built as plugins to the Nexus core. We've extended that even further in 1.3 to allow more types of plugins, and the security model, specifically for external realms, has been significantly enhanced. This applies not only to the Pro LDAP plugin, but also to any other open source security realms that people may create.

For example, there is a Crowd realm that Justin Edelson donated and a couple of other ones that have been developed: one we developed for Apache and a couple of others that people in the community are working on. In addition to that, the core has a couple of pretty cool new features. One of them is the ability to support mirrors of repositories in an intelligent way. This has been a bit of a problem with Maven because everybody goes against the central repository, even if there may be a mirror closer to them, because they want to make sure they have the latest and greatest data all the time.

Only a very small subset of the repository changes over time. If we get a few artifacts in there, the chances are the mirror that is closest to you has the right version of the things you need. What we've done in Nexus is we've allowed you to define a canonical repository URL, just like you would do currently. Repositories are able to expose metadata describing all of their known mirrors, and that metadata is now on the central repository.

Nexus is able to leverage this and automatically populate a screen that you can use to choose the mirrors that you want to use. The way it works is you choose a mirror and we will use that mirror for retrieving the artifacts. If the artifacts are not found on that mirror, then it will automatically fall back on the canonical URL. In addition to that, we use the canonical URL to retrieve all the hashes, so you can be sure that if you get something from a mirror and the hash matches, it's the same file that was on the canonical repository. This will allow people to use local mirrors more efficiently without giving up the ability to get the latest and greatest updates from Central, and with the security of knowing you don't have any artifacts that have been corrupted in the mirroring process.
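The mirror behavior described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the Nexus implementation: `fetch_artifact`, the `Fetcher` callables and the `.sha1` sidecar naming are hypothetical, and falling back to the canonical copy when the mirror's hash does not match is an assumption (the interview only describes the fallback for artifacts that are missing from the mirror).

```python
import hashlib
from typing import Callable, Optional

# A fetcher returns the bytes stored at a path, or None when the path is missing.
Fetcher = Callable[[str], Optional[bytes]]


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def fetch_artifact(path: str, mirror: Fetcher, canonical: Fetcher) -> bytes:
    """Prefer the mirror for content, but trust only the canonical hash."""
    # The hash always comes from the canonical repository, never the mirror.
    expected = canonical(path + ".sha1")
    if expected is None:
        raise FileNotFoundError(path + ".sha1")
    expected_hex = expected.decode("ascii").strip().lower()

    data = mirror(path)
    if data is not None and sha1_hex(data) == expected_hex:
        return data  # the mirror copy is byte-identical to the canonical one

    # Missing on the mirror (or, by assumption, corrupted): use the canonical URL.
    data = canonical(path)
    if data is None or sha1_hex(data) != expected_hex:
        raise IOError("no valid copy of " + path)
    return data
```

Passing plain dictionaries' `get` methods as the two fetchers is enough to try it out without any network access.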

That's a pretty big feature in 1.3. Some other cool things that we've done, the logs are now configurable via the UI; before you had to edit the Log4j settings. So now you can enable debug mode temporarily and see things right through the online log viewer. That's a favorite of mine when I'm doing a lot of debugging.

The repository screens have all been combined into a single view with tabs. Before, we had a separate screen for browsing repositories, and that was separate from the one used to manage the repositories, and that was separate from the one in the Pro version where you could see staging repositories and stuff like that. All three screens have been merged into one, with tabs below based upon the privileges you have. We think that will make it a lot easier for people to figure out what they need. The screens in the past looked very similar but the functionality was different, and it was confusing. So we fixed that.

In addition to that, underneath the hood we've made a lot of changes to the way that groups are actually represented. In the past, repository groups, which are a feature that lets you aggregate multiple repositories together, were implemented as sort of a level above repositories. There were some problems with that because a group was only a logical router that spun through each of the repositories to find what it needed. It didn't have the ability to store data directly. That became a problem when we needed to host things like the repository metadata or the indexes.

In 1.3, underneath the covers, groups are actually implemented as a first class repository now. That means down the road we will be able to change the UI and expose the ability to have groups of groups and do a lot of other cool things that we have planned for 1.4. Right now, that's architecturally changed in the core, but the UI for the most part looks the same today.

We have also made quite a bit of progress on the indexer itself. There were a lot of bugs related to searching, where usually the search was giving you more results than you actually wanted. That was due to the way the indexer was tokenizing a lot of the artifacts underneath -- IDs, groups and classes were all tokenized based on dashes and things like that. We completely changed that so that all that information is there, but even more importantly we've actually changed the download format. Now it's a little bit smaller, we use a better compression technique, and it also allows the download to be Lucene version neutral. Previously the zip file was actually the Lucene 2.3 index database zipped up. The tools don't provide any way to go back and forth. If you upgraded to 2.4, you would never be able to publish the correct index for downstream users. So we created our own binary format, and the Nexus API jars are able to transparently convert from the binary format to the correct Lucene format internally. This gives us a great deal of flexibility to upgrade down the road to newer versions of Lucene, to take advantage of their features and things like that.

Also coming very soon is the ability to have incremental index downloads. The index on Central, for example, will only be updated once a day, but it will include only the things that have changed since the last index, so you won't have to download a 30 MB file every week. You'll just get a file of a couple of KB every day. That will help everybody. Nexus 1.3 will support that; we just have to enable it on the Central repository to produce it.
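As a rough illustration of the incremental idea, the Python sketch below shows how a client might decide between applying small daily chunks and re-downloading the full index. The chunk numbering and the `plan_index_update` function are assumptions made for this example; the interview does not describe the actual format.

```python
from typing import List, Optional, Tuple


def plan_index_update(last_applied: Optional[int],
                      published: List[int]) -> Tuple[str, List[int]]:
    """Choose between a full index download and incremental chunks.

    last_applied: number of the last chunk applied locally, or None when
                  there is no local index yet (hypothetical numbering).
    published:    chunk numbers currently available on the remote side.
    """
    if last_applied is None or not published:
        return ("full", [])

    needed = [n for n in sorted(published) if n > last_applied]

    # If the next chunk we need is no longer published, we cannot catch up
    # incrementally and have to fetch the full index once.
    if needed and needed[0] != last_applied + 1:
        return ("full", [])

    # An empty list here means the local index is already up to date.
    return ("incremental", needed)
```

A client that checks in daily would normally receive a single small chunk; one that has been offline longer than the remote keeps chunks would fall back to the full file.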

TO: Speaking of the index, could you talk about some of the problems that were happening and give a little update on the bandwidth being used by Central at this point?

BF: That's a good question. I actually don't know anymore, because it's been so low I haven't had to pay much attention to it. Around Thanksgiving we had some pretty serious problems with the Central repository, which is hosted off of a 100 Mb connection. Initially, on a regular basis, Apache httpd was running out of worker threads. As we kept increasing that, the load on the machine went through the roof, to the point where it was about 200 on a 5-10 minute average on the Red Hat machine there. We actually switched over to nginx, which is a highly optimized HTTP server, and that brought the load down to less than 0.5, which was amazing. I thought it was broken when we first did that, because the load was so low and it didn't seem to be doing anything. Then we started saturating the 100 Mb connection.

It took us a while to figure out that some of the tools out there were misbehaving and actually downloading the index file over and over. That file is by far the largest file in Central and the most frequently accessed one. We found some locations were downloading 30 MB every two minutes because of that broken tool. We worked with those people and got that fixed up, but we were still having regular spikes every Monday when everyone came in and downloaded the new index.

We now host the index off of Amazon S3. That leaves the rest of Central basically running somewhere around 10 Mb per second on average, as opposed to 99 Mb per second. And the S3 stuff is up there in the cloud, and it makes for even faster downloads for everybody else. It's a win-win in that situation. We were able to turn our attention back to Maven itself instead of trying to focus on what is wrong with Central.

TO: Speaking of Maven, 2.0.10 was released. I've looked at the release -- it took a number of months to get from 2.0.9 to 2.0.10. What exactly was in this latest release?

BF: Unfortunately it was 10 months, believe it or not. I didn't notice that until I went to do the release and update the website. Once I updated the website, I noticed 2.0.9. There was a lot of work done this summer and we went through a release candidate process. We had somewhere around 10-15 release candidates. We decided that some of the changes we made to fix bugs in there were important: they made things more deterministic in the build and in forked lifecycles. But we felt that they might be a little risky to introduce into the 2.0 stream, because we were really focused on stabilizing that in the 2.0.8, 2.0.9 and 2.0.10 releases.

We decided to make a new branch, and that became the new 2.1.0 milestone 1 release. It took a little bit of time to sort that out; we ported the bug fixes back into the 2.0 branch and pulled the features out of there. It took a little bit of time to get that process going again with 2.0.10. Basically, 2.0.10 just has a pile of bug fixes in it and not really any interesting features. But that's the point: we can make this thing more stable as we go forward. Hopefully the 2.0.10 release will be the last release of the 2.0.x line. There is always the possibility that we may fix a few regression bugs, but we don't really want to put any energy there anymore.

The 2.1.0 milestone 1 release turned out to be very stable because it went through around 20+ release candidates before it finally was released. The problem is that, in talking to customers, we found out that many people were either afraid to use that milestone release or simply not allowed to, just because it had an M1 at the end of it. All of us, Maven developers and other people who have used it, know that it is very stable, maybe even more so than the 2.0.x line. The plan now is to get all the bugs fixed in there that we feel are really important. We've pushed all the features that we originally planned for further milestones out to 2.2, and hopefully we'll start staging release candidates of 2.1.0 within the next couple of days to a week, tops. If we can get a 2.1.0 release out, and 2.0.10 stays stable, then hopefully we won't need to do a 2.0.11.

TO: Talk to somebody who might not be following you very closely. What's the difference between 2.0.10 and 2.1.0?

BF: 2.1.0 has a new feature in it that Oleg had actually coded, and it took us a while to remember it and get it integrated. That's the ability to encrypt your passwords in your settings files. That's actually a pretty commonly requested feature. People don't like putting their password in a text file, particularly if they are using a repository with their corporate password, for obvious reasons. The one significant feature in 2.1.0, in my eyes, is the ability to encrypt that password.

The other significant feature, and I guess I wouldn't call it a feature, is basically some bug fixes I mentioned that we tried to put in 2.0.10, which had to do with forked lifecycles and making sure that properties were correctly interpolated. The major use case that came forward for that was Clover users, who needed to instrument the jars and needed to update certain paths. But the way 2.0.x dealt with that didn't really work right. That will be fixed in 2.1.0.

TO: The main feature that will be most visible is the password encryption. I think forked lifecycles, having had to write about them, are pretty confusing. It's almost like time travel there.

BF: Yeah, I think so. And that's really a bug fix that sort of oscillated back and forth in various versions of 2.0: we fixed it one way and it broke other cases, then we put it back and it broke other cases. That's why we decided to bump it to 2.1 and fix it right. I think the security and encryption of the passwords will probably be the headlining feature.

We're also working on getting in the patch contributed by Don Brown to have parallel downloads of artifacts. That can actually improve download performance pretty significantly. By my own testing, it is faster even if you're using a repository manager. That's also going to be in the 2.1.0 release.

TO: That was from Don Brown's patch. It's about a year old, right? Didn't he fork Maven and try to do some things on his own?

BF: Yeah, he did. He forked it, applied this patch and sent it back to us. The reason it didn't make 2.0.10 at that time was that we were waiting on integration tests and other unit tests we never really got. John is working on it now to try to get some specific tests on that to make sure it works. The main thing we're adding is the ability to basically turn it off. The original patch was simply on, and if we released it and bugs turned up, there would be nothing you could do about it. I think that's what John is working on now: making the number of parallel downloads configurable, so you can put it back down to 1.
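The "turn it off" idea is easy to show in a short Python sketch. This is not Maven's code or its actual setting name: `download_all`, `fetch` and `max_parallel` are hypothetical names chosen for the illustration. With `max_parallel=1` the same code path simply runs one download at a time.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def download_all(urls: Iterable[str],
                 fetch: Callable[[str], bytes],
                 max_parallel: int = 4) -> Dict[str, bytes]:
    """Download every URL using at most max_parallel worker threads."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    url_list = list(urls)
    # A pool with a single worker degrades to plain sequential downloads,
    # which is the escape hatch if the parallel path misbehaves.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map returns results in the same order as its input.
        return dict(zip(url_list, pool.map(fetch, url_list)))
```

Keeping the sequential case on the same code path means there is only one implementation to test, and users who hit a bug can set the value to 1 without waiting for a new release.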

TO: It sounds like you made a transition from a project that cavalierly released maintenance versions to one that is thinking about a million-user installed base, or even more than that; I'm not sure of the numbers.

BF: Yeah, that sort of happened after I started doing the 2.0.x releases. It became apparent to me that half the bugs in any given version were regressions from the previous version or the ones before. It was a little bit ridiculous and embarrassing, and that is when we really started focusing on release candidates and on bringing the release candidates out to the user base, not just keeping them inside the Maven development community. There simply were not enough use cases there for us to ensure we didn't break something.

When we first started doing that, we found all kinds of stuff in releases that had gone through many iterations internally and that everybody thought were fine. As soon as we went to the user community we found out quickly it wasn't so fine. I think the stability of 2.0.9 has shown us that the process is actually worth maintaining, because it's been almost 10 months since the 2.0.9 release and it basically just works. Hopefully the 2.0.10 release will build upon that and continue to be stable going forward. We'll do the same with 2.1.0 and with the 3.0.x line that Jason is releasing.

The goal is a release every two weeks. It's been more like every month. We're almost ready for the Alpha 3 release. Once we get through some number of alphas, we should hopefully have a stable product before we call it final.

TO: Just a brief list, give 4-5 things we can expect in the 3.0 trunk.

BF: Is it possible to sum up 3.0 in this short list?

TO: I mean, it seems like it's probably at least a few months off -- it's not a year. What is the time frame, what is the plan?

BF: I think it's a couple of months before we start having public betas. I think maybe 6 months before we can realistically think it's final. But all the pieces are there, so we're just sort of working through them as we go forward.

The major change -- there are several in the 3.0 line. One of them is that it's basically set up for embedding now. In the 2.0.x line, there was an embedder that basically stopped at 2.0.4. This is actually what things like Hudson use to embed their Maven functionality. The problem with that was it was sort of added to Maven after the fact. In Maven 3.0, basically, the command line client is a client of the embedder. The core of Maven is an embeddable component with a command line wrapper around it. This is what NetBeans and Eclipse use to get their Maven functionality. That's a pretty significant change, to focus on making it embeddable.

The dependency resolution has been completely redone using the new Mercury stuff, which allows parallel downloads, and it uses the SAT4J solver, which I think came from OSGi land and Eclipse. The goal there is to be able to have more deterministic dependency resolution, including good range support. Ranges don't really work so well on the Maven 2.0 branches.

TO: I know. I think a blog post was done some time ago about that. It had a big picture, and all it did was sort of make me remember some very difficult math classes in college. What does the SAT4J thing actually do?

BF: It's trying to solve differential equations, I guess. There are many different possible combinations that can result in a solution when you're talking about ranges and orders of dependencies and conflicts and things like that. SAT4J attempts to resolve that down to a single answer all the time. The same answer every time is really the goal.
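To make the "same answer every time" goal concrete, here is a small Python sketch. It is not SAT4J (a Boolean satisfiability library) and not Mercury's algorithm; it is a brute-force search that swaps in a fixed preference order so that the order in which constraints are declared cannot change the result. The `Catalog` structure and `resolve` function are hypothetical, and as a simplification every artifact in the catalog is assigned a version.

```python
from itertools import product
from typing import Dict, Optional, Tuple

Version = Tuple[int, ...]
Range = Tuple[Version, Version]  # [low, high): inclusive low, exclusive high
# catalog[name][version] -> ranges that this version requires of other artifacts
Catalog = Dict[str, Dict[Version, Dict[str, Range]]]


def resolve(catalog: Catalog, root: Dict[str, Range]) -> Optional[Dict[str, Version]]:
    """Return one consistent version per artifact, or None if none exists."""
    names = sorted(catalog)  # fixed artifact order, independent of declaration order
    choices = [sorted(catalog[n], reverse=True) for n in names]  # newest first
    for combo in product(*choices):
        picked = dict(zip(names, combo))
        # Collect the root ranges plus the ranges of every picked version.
        requirements = list(root.items())
        for name, version in picked.items():
            requirements.extend(catalog[name][version].items())
        if all(dep in picked and low <= picked[dep] < high
               for dep, (low, high) in requirements):
            return picked  # first hit in the fixed order is always the same
    return None
```

A real solver avoids enumerating every combination, but the contract is the same: conflicting ranges either produce one well-defined set of versions or a clear failure, never a result that depends on how the dependencies happened to be listed.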

TO: It's just trying to make the process deterministic so that nobody runs into weird problems where the order of the dependencies actually determined which versions were resolved?

BF: That's right. It's a complicated problem space to deal with. It also has the ability to download things in parallel, just like the patch that we have in Maven 2.1, but basically the way Mercury works is that it goes out and figures out all the things it needs to download. It makes the decisions about the artifacts it needs before it starts downloading them. It's then able to hand off to a Mercury client that has a lot of Jetty code in it, which goes out and downloads all the different things from the different repositories in one shot. The old code would download things one at a time and make decisions on the fly. Sometimes it downloaded POMs and jars that it ultimately did not use. It was inefficient.

The POM inheritance and interpolation module, the project builder we call it internally, has been completely rewritten by Shane. The focus on that was, first of all, to fix a lot of the problems that we had and figure out the rules, because it was never well documented how the inheritance and the interpolation were supposed to work. It's more of a rule-based approach, and it's also set up to allow the injection of the model from any type of data format, not just a POM XML file. That lets things like NMaven, for example, leverage the core Maven functionality even though they may not have a POM. That would also be used by Tycho, which allows Eclipse application builds within Maven from the Eclipse metadata and not a POM. That's been completely rewritten.

Off the top of my head those are the three major changes to Maven 3.

TO: The changes are big... sort of talking about the future of Maven. There was an old Star Trek show once where there was a problem with the warp drive and they ended up in the far reaches of the universe with totally new forms of matter -- that makes me think about that. What's down the road in the Maven project without a project object model?

BF: Yeah, in theory you should be able to write the project metadata into Maven and it will just work.

TO: Also, just last night I saw Charles Nutter, the creator of JRuby, twittering about some sort of Ruby-Maven bridge he made, and Jason chimed in and said we should make that a part of Nexus. Lots of good, interesting stuff happening.

Thanks for taking the time to talk to me and we'll check in in another few months.

Written by Tim OBrien

Tim is a Software Architect with experience in all aspects of software development from project inception to developing scaleable production architectures for large-scale systems during critical, high-risk events such as Black Friday. He has helped many organizations ranging from small startups to ...
