Travel Audience and Sonatype
Repository management on top of Google Cloud.

At travel audience (TA), we design, implement, run and maintain many pieces of software. They come in different forms and adopt different technologies and deployment strategies, depending on their needs. For instance, we have apps that consume, process and generate millions of messages per second, where minimalism is key to achieving low latency, high throughput and high availability.
These apps are written in compiled languages such as Go and are usually packaged as containers, which makes it easier to scale the number of instances horizontally at any given time, e.g. based on user demand. On the other end of the spectrum, we also have apps that process terabytes of data per day, sometimes in near real time, sometimes over a span of a few hours or days. These apps are usually packaged as Python, Java or Scala archives that are scheduled as jobs on a batch and/or stream processing engine.
To shed a little more light on our platform, most of our workloads run atop Google Cloud Platform (GCP):
- Containers are orchestrated with Kubernetes Engine (GKE), Google's fully managed Kubernetes service.
- Batch and streaming pipelines are provided by Dataproc and Dataflow.
- Data is stored in Cloud Storage, Cloud SQL, Bigtable and BigQuery.
- etc.
Given how heterogeneous our apps and deployment environments are, we quickly realized we would need to store, catalog and distribute the artifacts that, when stitched together, form the fabric of TA.
“We narrowed down our trials to Sonatype Nexus and JFrog Artifactory. We decided to go with Nexus because the OSS version seemed to deliver most of what we were looking for.”
ANDRE ROCHA FERREIRA
DevOps Engineer, travel audience
Deploying Sonatype Nexus Repository to Integrate With Google Cloud
At this point, our Ops team had already identified certain limitations we would have to overcome:
- Caching Docker containers in Sonatype Nexus is complex, therefore we decided to drop it for the time being.
- Integration with Google Cloud Identity and Access Management (Cloud IAM) is nonexistent, so we would have to design and implement it.
- Integration with Google Cloud Storage for backup storage and retrieval is nonexistent, so we would have to design and implement it.
- There’s no official support for Kubernetes, the container orchestration solution in place at TA, so we would have to work on it.
After a thoughtful design process, this is what we came up with:
- Developer tools such as Maven and Docker, as well as GKE, reach Sonatype Nexus Repository through a Google Cloud Load-Balancer (GCLB) instance.
- The GCLB instance directs incoming traffic to an instance of nexus-proxy.
- nexus-proxy checks the request headers for user identity and then for GCP Organization membership using the Cloud IAM APIs.
- If the user is authorized, nexus-proxy redirects incoming traffic to Nexus.
- nexus-backup periodically uploads backups to Cloud Storage.
- Sonatype Nexus caches packages from Maven Central and PyPI.
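The authorization step performed by nexus-proxy can be sketched roughly as follows. This is a minimal illustration, not the actual nexus-proxy code: the header name `X-Authenticated-User`, the `Decision` type and the `is_org_member` callable (standing in for a Cloud IAM membership lookup) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# Hypothetical header name; the real proxy may carry the identity differently.
IDENTITY_HEADER = "X-Authenticated-User"


@dataclass(frozen=True)
class Decision:
    allowed: bool
    status: int  # HTTP status the proxy would answer with if not forwarding
    reason: str


def authorize(
    headers: Mapping[str, str],
    is_org_member: Callable[[str], bool],
) -> Decision:
    """Decide whether a request may be forwarded to Nexus.

    `is_org_member` is a stand-in for the organization membership check
    (assumed to be backed by the Cloud IAM APIs in a real deployment).
    Header names are assumed to be already normalized by the caller.
    """
    # Step 1: the request must carry a user identity.
    user = headers.get(IDENTITY_HEADER, "").strip()
    if not user:
        return Decision(False, 401, "no user identity in request headers")

    # Step 2: the user must belong to the organization.
    if not is_org_member(user):
        return Decision(False, 403, "user is not a member of the organization")

    # Step 3: authorized, so the request can go on to Nexus.
    return Decision(True, 200, "forward to Nexus")
```

Passing the membership check in as a callable keeps the decision logic testable without network access; a real deployment would plug in a function that queries the identity provider.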
Sonatype Nexus Authentication With Cloud IAM
We knew beforehand that we would need to authenticate Sonatype Nexus against Cloud IAM, but we wanted to make this authentication optional so that the tool could also be used by others in simpler scenarios, e.g. without Cloud IAM.
To deliver this, we designed and implemented nexus-proxy.
Interestingly enough, we later found out that this choice made it much easier to fix other issues unrelated to authentication. For example, Sonatype Nexus can’t expose a private Docker registry with the same set-up used to expose the other artifact repositories, which would have defeated our decision to use HTTPS for everything. nexus-proxy also replaced Nginx as the reverse proxy behind GCLB.
Sonatype Nexus Backup
Sonatype Nexus can be configured to back up its internal database on a regular basis. However, this process does not take blob stores into account. Furthermore, the backups are persisted on the local disk alone, meaning that if the disk is lost, the backups are lost too.
So, we came up with a tool, nexus-backup (a container made up of a bunch of scripts), to execute the backup procedure and then upload the result to Cloud Storage.
- Sonatype Nexus’ blobstores are stored in a Google Persistent Disk (PD) accessible by nexus-backup.
- A file used to trigger backups is also stored in the same PD.
- A task configured in Sonatype Nexus periodically dumps a backup of the Sonatype Nexus database to the same PD.
- Another task, configured in Sonatype Nexus, signals that a backup is occurring by touching the trigger file.
- nexus-backup watches the trigger file and, whenever the trigger file is touched, starts the blobstore backup procedure.
- nexus-backup fetches the Sonatype Nexus database dump and blobstore backup files from the PD.
- nexus-backup uploads the backup to a pre-configured Cloud Storage bucket.
- A recovery procedure may be conducted by copying a backup to a PD attached to Sonatype Nexus. Nexus will pick the backup and restore it automatically upon restart.
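The trigger-file mechanism can be illustrated with a small polling sketch. The text describes nexus-backup as a set of scripts, so this Python version is only an approximation of the idea: the function names, the tarball format and the `upload` callable (standing in for the Cloud Storage upload) are assumptions for illustration.

```python
import tarfile
import time
from pathlib import Path
from typing import Callable, Iterable, Optional


def trigger_mtime(trigger: Path) -> Optional[float]:
    """Return the trigger file's modification time, or None if it is missing."""
    try:
        return trigger.stat().st_mtime
    except FileNotFoundError:
        return None


def make_archive(sources: Iterable[Path], dest: Path) -> Path:
    """Bundle the database dump and blob store paths into one tarball."""
    with tarfile.open(dest, "w:gz") as tar:
        for src in sources:
            tar.add(src, arcname=src.name)
    return dest


def run_once(
    trigger: Path,
    last_seen: Optional[float],
    sources: Iterable[Path],
    archive_path: Path,
    upload: Callable[[Path], None],
) -> Optional[float]:
    """One polling step: back up only if the trigger was touched since last_seen.

    Returns the modification time that has now been handled.
    """
    current = trigger_mtime(trigger)
    if current is None or current == last_seen:
        return last_seen  # nothing new to do
    upload(make_archive(sources, archive_path))
    return current


def watch(trigger, sources, archive_path, upload, interval: float = 30.0) -> None:
    """Poll the trigger file forever; `interval` is an arbitrary example value."""
    last_seen = trigger_mtime(trigger)
    while True:
        last_seen = run_once(trigger, last_seen, sources, archive_path, upload)
        time.sleep(interval)
```

Here `archive_path` is assumed to live outside the directories being archived, and `upload` would wrap whatever client writes to the pre-configured bucket.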
Note that whenever a lock file has been present for more than 12 hours (probably indicating a failed backup), it is removed so that further backups can happen.
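The stale-lock rule might look like the sketch below. It assumes the lock's age is measured from the file's modification time, which the text does not specify, and the function and constant names are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional

STALE_AFTER_SECONDS = 12 * 60 * 60  # the 12-hour threshold from the text


def remove_stale_lock(lock: Path, now: Optional[float] = None) -> bool:
    """Delete the lock file if it is older than the threshold.

    Returns True when a stale lock was removed, False otherwise.
    Age is taken from the modification time (an assumption of this sketch).
    """
    now = time.time() if now is None else now
    try:
        age = now - lock.stat().st_mtime
    except FileNotFoundError:
        return False  # no lock, nothing to clean up
    if age <= STALE_AFTER_SECONDS:
        return False  # a backup may still be legitimately running
    lock.unlink(missing_ok=True)  # requires Python 3.8+
    return True
```

Running such a check before each backup attempt would keep a crashed run from blocking later ones indefinitely.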
Nexus Lifecycle Management and Usage
We rely on Kubernetes (GKE) to deploy and manage the lifecycle of our Nexus set-up. We have open-sourced detailed instructions on how we do it, including disaster-recovery, and how our developers use it.
Also, we are currently working on a Helm chart for it, which should be made available soon.
This story was provided by the DevOps team at travel audience.