First, I'm happy to announce that the Nexus book is now as open source as our Maven book. About a month ago we decided to free this content so that anyone can view the source or modify the book as they see fit. All of our books are covered by a Creative Commons license, and the source is available from GitHub.
This post aims to make visible the decisions that go into refactoring a simple project into a multimodule project. While the specifics of this project relate to DocBook compilation and site publishing, the basic principles of refactoring a multimodule Maven project apply to almost every Maven project you will encounter.
Background
The Nexus book is a multimodule project, which consists of a parent project and two submodules: content and examples. The content project depends on the examples project. The examples project is configured to spit out a ZIP assembly that contains all the examples, and the content project is configured to generate an HTML version of the book, a PDF version of the book, and the site that you see here.
The lifecycle of the content project is very busy, and the project is responsible for generating multiple artifacts. For example, if you look at the pom.xml in the content project, you'll see that:
- The generate-pdf goal from the Docbkx plugin is bound to the compile phase,
- The generate-html goal from the Docbkx plugin is bound to the compile phase,
- A simple templating goal is bound to the site phase of the site lifecycle.
So this content project produces three outputs: an HTML book, a PDF book, and a fully rendered site uploaded to the Sonatype web server. There's a lot going on within this one project. If I notice a typo in a chapter and want to check my fix, I have to sit through a lengthy rendering process while the build works through both the PDF and HTML renderings of the book. Instead of focusing on a particular part of the overall publishing workflow, I'm forced to run the entire process every time I make a small change. The build isn't modular.
A Set of Audacious Goals
I'm looking at this particular problem because I'm ready to start moving this book toward a more automated build that handles some of the tasks involved in writing it. I'd like to add the following features to the build for this book:
- Rendering an Eclipse book
- Validating book IDs and section structure in the DocBook XML
- Injecting examples from the examples project directly into code examples in the book
- Injecting the output of example projects directly into screen listings in the book
- Automatically checking for example listing overflows (lines longer than a certain number of characters)
- Adding a watermark to a specific build of the book
- Adding a well-designed cover to the book (prepending and appending PDFs to the generated PDF)
- Running the PDF through automatic spellchecking as part of the build process
- Creating plugin documentation tables using information already embedded in a Mojo
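The listing-overflow check from the list above is straightforward to prototype. The following is a minimal sketch, not the book's actual build code: it assumes listings use unprefixed DocBook element names, treats `programlisting` and `screen` as the verbatim elements to inspect, counts a tab as one character, and uses a hypothetical class name, `ListingWidthCheck`. Logic like this could later be wrapped in a custom Mojo and bound to a validation phase.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical checker: flags verbatim listing lines wider than a maximum column count. */
final class ListingWidthCheck {

    // DocBook elements whose text is rendered verbatim (monospaced, not wrapped).
    private static final String[] VERBATIM = {"programlisting", "screen"};

    static List<String> findOverflows(String docbookXml, int maxColumns) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(docbookXml)));
        List<String> problems = new ArrayList<>();
        for (String tag : VERBATIM) {
            NodeList listings = doc.getElementsByTagName(tag);
            for (int i = 0; i < listings.getLength(); i++) {
                Element listing = (Element) listings.item(i);
                // getTextContent() also includes text inside nested inline elements.
                String[] lines = listing.getTextContent().split("\n", -1);
                for (int n = 0; n < lines.length; n++) {
                    int width = lines[n].length(); // a tab counts as one character here
                    if (width > maxColumns) {
                        problems.add(tag + " #" + (i + 1) + ", line " + (n + 1)
                                + ": " + width + " > " + maxColumns + " characters");
                    }
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<chapter><programlisting>short\n"
                + "this line is far too long for the page</programlisting></chapter>";
        findOverflows(xml, 20).forEach(System.out::println);
    }
}
```

Returning a list of problems, rather than failing on the first one, lets a build report every overflowing line in a single run.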
Right, so this list could go on indefinitely. The point is that I'm trying to find ways to add more automation to the book: validation tests that ensure we never produce a PDF with a code example that trails off the edge of the page, and that all of our sections have the appropriate identifiers. While I know that much of this could be done with XML-related technologies, I'm more interested in using this as a chance to demonstrate the power of Maven as a foundation for complex, custom builds. I want this book and other related books to be a test platform not just for writing a book, but for writing a book that uses Maven and the structure of the repository to facilitate the development of complex, example-driven content.
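The identifier check mentioned above can be sketched the same way. This hypothetical `SectionIdCheck` flags `chapter`, `section`, and `appendix` elements that lack an `id` (DocBook 4) or `xml:id` (DocBook 5) attribute, and reports identifiers used more than once. Which elements must carry an ID is an assumption here; a real rule set would follow the book's own conventions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical validator: every structural element needs a unique identifier. */
final class SectionIdCheck {

    // Assumed rule: these elements must be addressable by cross-references.
    private static final String[] STRUCTURAL = {"chapter", "section", "appendix"};

    static List<String> findProblems(String docbookXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(docbookXml)));
        List<String> problems = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // IDs must be unique across the whole document
        for (String tag : STRUCTURAL) {
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);
                // DocBook 4 uses "id", DocBook 5 uses "xml:id"; accept either.
                // getAttribute returns "" when the attribute is absent.
                String id = el.getAttribute("id");
                if (id.isEmpty()) {
                    id = el.getAttribute("xml:id");
                }
                if (id.isEmpty()) {
                    problems.add(tag + " #" + (i + 1) + " has no id");
                } else if (!seen.add(id)) {
                    problems.add("duplicate id: " + id);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><chapter id=\"intro\"><section><title>Setup</title></section>"
                + "</chapter></book>";
        findProblems(xml).forEach(System.out::println);
    }
}
```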
An Overburdened Lifecycle
Given the audacious list of improvements in the previous section, what is the best way to add something like pre-render DocBook validation or pre-compile example injection to the current project layout? I could rearrange the work happening in the content project and simply squeeze more goals into its lifecycle. That would have the unwanted side effect of making the content project take even longer to compile and render a book, and it would also mean the content project generates even more artifacts. Any solution that is going to form the foundation for a larger, more extensible approach to building this content is going to require more than one lifecycle, because the content project's lifecycle is already too crowded to take on additional goals.
While a cardinal rule of Maven is that one project produces a single artifact, this rule is often stretched a bit. For example, you can have a single project produce, install, and deploy multiple attached artifacts generated using the assembly plugin. I also consider a deployed site a separate artifact, especially when, as in the book project, the site is the primary artifact to be generated. In other words, the site generated from this build isn't just a supporting, ancillary site that describes some code; it is the primary artifact of the book. Whatever solution emerges from this refactoring should move site creation into its own project.
If a single lifecycle is becoming "busy" and your project is responsible for creating more than one artifact or output, the solution is to refactor it into modules so that each module is responsible for producing one artifact, and the modules use the Maven repository as the medium for exchanging dependencies.
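What makes the repository a workable exchange medium is that every artifact lives at a path derived purely from its coordinates, so a module that installs an artifact and a module that depends on it agree on where it is. The sketch below computes that relative path in Maven's default repository layout. It does not model the timestamped file names that snapshot deployments use in remote repositories, and the `com.example.books` groupId in the usage is hypothetical.

```java
/** Sketch of Maven's default repository layout for a single artifact. */
final class RepositoryLayout {

    /**
     * groupId (dots become slashes) / artifactId / version /
     * artifactId-version[-classifier].extension. The classifier may be null.
     */
    static String pathOf(String groupId, String artifactId, String version,
                         String classifier, String extension) {
        StringBuilder file = new StringBuilder(artifactId).append('-').append(version);
        if (classifier != null && !classifier.isEmpty()) {
            file.append('-').append(classifier);
        }
        file.append('.').append(extension);
        return groupId.replace('.', '/') + '/' + artifactId + '/' + version + '/' + file;
    }

    public static void main(String[] args) {
        // Hypothetical coordinates for the examples ZIP; the real groupId and version may differ.
        System.out.println(pathOf("com.example.books", "nxbook-examples", "1.0", null, "zip"));
        // prints com/example/books/nxbook-examples/1.0/nxbook-examples-1.0.zip
    }
}
```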
The Refactored Solution
The refactored solution turns two submodules into six interdependent submodules.
- nxbook-examples - Responsible for generating the examples ZIP
- nxbook-content - Contains the DocBook XML
- nxbook-html, nxbook-pdf, nxbook-eclipse - Contain the stylesheets and format-specific media required to render the different book formats
- nxbook-site - Finally, depends on the other modules that create concrete artifacts like PDFs and ZIPs, and adds a simple site
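Maven's reactor works out a build order from the dependencies between modules, so the modules do not have to be listed in dependency order in the parent POM. The following is a simplified illustration using Kahn's topological sort, not Maven's own implementation. The edges come from the module descriptions above; the exact dependencies of nxbook-site are an assumption. The relative order of independent modules (nxbook-pdf, nxbook-html, nxbook-eclipse) is arbitrary here and may differ in a real reactor.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified reactor-style ordering: each module comes after everything it depends on. */
final class BuildOrder {

    /** Kahn's algorithm. Every dependency must itself be a key in the map. */
    static List<String> order(Map<String, List<String>> dependsOn) {
        Map<String, Integer> pending = new LinkedHashMap<>();   // unbuilt dependencies per module
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (String dep : e.getValue()) {
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((name, count) -> { if (count == 0) ready.add(name); });
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String name = ready.poll();
            result.add(name);
            // Building this module unblocks one dependency of each of its dependents.
            for (String next : dependents.getOrDefault(name, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (result.size() != dependsOn.size()) {
            throw new IllegalStateException("cycle (or unknown dependency) between modules");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> modules = new LinkedHashMap<>();
        // Assumed site dependencies; the post says only "PDFs and ZIPs" and similar artifacts.
        modules.put("nxbook-site", List.of("nxbook-html", "nxbook-pdf", "nxbook-examples"));
        modules.put("nxbook-pdf", List.of("nxbook-content"));
        modules.put("nxbook-html", List.of("nxbook-content"));
        modules.put("nxbook-eclipse", List.of("nxbook-content"));
        modules.put("nxbook-content", List.of("nxbook-examples"));
        modules.put("nxbook-examples", List.of());
        System.out.println(order(modules));
    }
}
```

A cycle leaves some modules permanently blocked, which is why the sketch fails loudly instead of returning a partial order; Maven likewise refuses to build a reactor with cyclic module dependencies.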
We've created several new modules, but in doing so we've gained six separate lifecycles to use for customizing this build. Tasks like example injection and specialized XML validation can happen in the nxbook-content module, in a lifecycle that covers the creation, testing, and packaging of nothing but content. The tasks that prepend a nice cover to the PDF book and then apply a watermark to the entire PDF document can be hooked into the lifecycle of the nxbook-pdf project alone. If you need to debug a problem with the PDF rendering, all you need to do is hack away at the nxbook-pdf project.
Each project declares dependencies on the output generated by other nxbook projects. nxbook-examples installs a ZIP file in the repository, which nxbook-content then uses to populate code samples. nxbook-pdf isn't concerned with example injection or XML validation; it assumes that the JAR artifact it retrieves from the repository contains valid DocBook XML. The end result of this refactoring is that we have room to expand the process to encompass the goals set out earlier. We also have a cleaner, more easily understood build that consists of smaller components, each with a limited focus and a digestible POM.
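The hand-off between nxbook-examples and nxbook-content can be pictured as two steps: read the examples ZIP, then replace placeholders in the DocBook source with the XML-escaped contents of the matching files. This is a hypothetical sketch; the `[[example:path]]` marker syntax and the `ExampleInjector` class are invented for illustration, and the real build may use a different mechanism entirely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical example injector: fills listing placeholders from an examples ZIP. */
final class ExampleInjector {

    // Hypothetical marker syntax, e.g. [[example:simple/pom.xml]].
    private static final Pattern MARKER = Pattern.compile("\\[\\[example:([^\\]]+)\\]\\]");

    /** Reads every file entry of a ZIP into memory, keyed by its path inside the archive. */
    static Map<String, String> readZip(byte[] zipBytes) throws IOException {
        Map<String, String> files = new HashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (!e.isDirectory()) {
                    // readAllBytes() stops at the end of the current entry.
                    files.put(e.getName(), new String(zip.readAllBytes(), StandardCharsets.UTF_8));
                }
            }
        }
        return files;
    }

    /** Escapes the characters that would break DocBook element content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Replaces each marker with the escaped contents of the matching example file. */
    static String inject(String docbook, Map<String, String> examples) {
        Matcher m = MARKER.matcher(docbook);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = examples.get(m.group(1));
            if (body == null) {
                throw new IllegalArgumentException("no example named " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in example code from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(escapeXml(body)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny examples ZIP in memory to stand in for the installed artifact.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            zip.putNextEntry(new ZipEntry("simple/pom.xml"));
            zip.write("<project/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        String chapter = "<programlisting>[[example:simple/pom.xml]]</programlisting>";
        System.out.println(inject(chapter, readZip(buf.toByteArray())));
        // prints <programlisting>&lt;project/&gt;</programlisting>
    }
}
```

Failing on an unknown example name, rather than leaving the marker in place, surfaces a renamed or deleted example at build time instead of in the rendered book.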
Tim O'Brien is a Software Architect with experience in all aspects of software development, from project inception to developing scalable production architectures for large-scale systems during critical, high-risk events such as Black Friday. He has helped many organizations ranging from small startups to ...