At travel audience (TA), we design, implement, run and maintain a bunch of software pieces. These appear in different forms, and adopt different technologies and deployment strategies depending on their target needs. For instance, we have apps that consume, process and generate millions of messages per second, where minimalism is key to achieving low latency, high throughput and high availability.
These apps are written in compiled languages such as Go and are usually packed as containers, so as to make it easier to scale (horizontally) the number of instances needed at a certain time, e.g. based on user demand. On the other end of the spectrum, we also have apps that process terabytes of data per day, sometimes in near-realtime, sometimes within a span of a few hours or days. These apps are usually packed as Python, Java or Scala archives that are scheduled as jobs into a batch and/or stream processing engine.
And — shedding a little more light on our platform — most of our workloads run atop Google Cloud Platform (GCP):
Given how heterogeneous our apps and deployment environments are, we quickly realized we would need to be able to store, catalog and distribute the artifacts that, when stitched together, form the fabric of TA.
At this point, our Ops team had already identified certain limitations we would have to overcome:
After a thoughtful design process, this is what we came up with:
nexus-proxy checks the request headers for user identity and then for GCP Organization membership using the Cloud IAM APIs.
nexus-proxy redirects incoming traffic to Nexus.
nexus-backup periodically uploads backups to Cloud Storage.
We knew beforehand that we would need to authenticate Nexus against Cloud IAM, but we wanted to make this authentication optional so that the tool we would design and implement could be used by others in simpler scenarios, e.g. without Cloud IAM.
Interestingly enough, we would later find out that this option made it much easier to fix other issues unrelated to authentication. For example, Nexus can’t expose the Docker private registry with the same set-up used to expose the other artifact repositories, which would have defeated our decision to use HTTPS for everything. nexus-proxy also replaced Nginx as the reverse proxy behind GCLB.
Nexus can be configured to back up its internal database on a regular basis. However, this process does not take blob stores into account. Furthermore, the backups are persisted on the local disk alone, meaning that if the disk is lost, the backups are lost too.
So, we came up with a tool, nexus-backup (a container made up of a bunch of scripts), to execute the backup procedure and then upload the result to Cloud Storage.
nexus-backup watches the trigger file and, whenever the trigger file is touched, starts the blobstore backup procedure.
nexus-backup fetches the Nexus database dump and blobstore backup files from the PD.
nexus-backup uploads the backup to a pre-configured Cloud Storage bucket.
Worthy of note: whenever a lock file has been present for more than 12 hours (which probably means a failed backup), the lock file is removed so that further backups can happen.
We rely on Kubernetes (GKE) to deploy and manage the lifecycle of our Nexus set-up. We have open-sourced detailed instructions on how we do it, including disaster-recovery, and how our developers use it.
Also, we are currently working on a Helm chart for it, which should be made available soon.
This story was provided by the DevOps team at travel audience.