The Cloud Is Running Toward BSD-Style Licenses. Are You?


The New York Times had a great article this weekend that explored the disconnect in the industry. In "Power, Pollution and the Internet," James Glanz writes: "[the] foundation of the information industry is sharply at odds with its image of sleek efficiency and environmental friendliness." This article is interesting in that it calls out the industry for creating an unsustainable power drain based on some awful environmental choices. From the article: "Of all the things the Internet was expected to become, it is safe to say that a seed for the proliferation of backup diesel generators was not one of them."

This piece made me stop and think about trends over the last decade. While the New York Times is focused on environmental costs, I'm more interested in how this shift to Infrastructure-as-a-Service and deployment on cloud-based infrastructure affects open source licenses. The trend might not be readily apparent if you don't know what to pay attention to. Here's an attempt to make sense of licensing trends...

Note: This article explores a trend toward BSD-style licenses. If you are interested in tracking your own application's exposure to various OSS licenses, please take a look at Sonatype Insight. Using Insight, you can keep track of your application's exposure to GPL, AGPL, and other licenses which may present problems when you have to worry about external or internal distribution. Make licensing a part of your Application Lifecycle with Insight.
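For a rough sense of what tracking license exposure means, here is a minimal sketch in Python. It is not how Sonatype Insight works; the license groupings, the function names (`classify`, `license_exposure`), and the sample dependency list are all assumptions made up for illustration. It simply sorts a hand-written inventory of dependencies into copyleft, permissive, and unknown buckets.

```python
from collections import defaultdict

# Hypothetical grouping of SPDX-style license identifiers. A real policy
# would come from your legal team, not from this table.
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only"}
PERMISSIVE = {"BSD-2-Clause", "BSD-3-Clause", "MIT", "Apache-2.0", "PostgreSQL"}


def classify(license_id):
    """Return 'copyleft', 'permissive', or 'unknown' for a license id."""
    if license_id in COPYLEFT:
        return "copyleft"
    if license_id in PERMISSIVE:
        return "permissive"
    return "unknown"


def license_exposure(dependencies):
    """Group dependency names by license family.

    `dependencies` maps a dependency name to its declared license id.
    """
    report = defaultdict(list)
    for name, license_id in sorted(dependencies.items()):
        report[classify(license_id)].append(name)
    return dict(report)


if __name__ == "__main__":
    # Illustrative inventory only; the names and license ids are assumptions.
    deps = {
        "mysql-server": "GPL-2.0-only",
        "postgresql": "PostgreSQL",
        "some-internal-lib": "Proprietary",
    }
    for family, names in license_exposure(deps).items():
        print(family, names)
```

Anything that lands in the copyleft or unknown bucket is what you would want a human to review before distributing the application, internally or externally.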

Taking Databases As an Example

Done any serious web development over the past decade? You've likely encountered MySQL. MySQL's popularity exploded as the industry was looking for a capable, general purpose database that could provide an alternative to Oracle. Oracle is prohibitively expensive for a large portion of the market, and if you are running a cash-strapped startup, you likely won't be eager to fork over the minimum six-figure price of entry you'll need to run Oracle.

For a decade, MySQL competed on both cost and capability. You can certainly scale it if you either know what you are doing or are comfortable spending money on Percona's professional services. It has some scalability issues, but, for the most part, you can either shard or start offloading some of your data to NoSQL once you reach its limits. MySQL was the capable database of the 00s, and it rose to popularity before people started moving to hosted infrastructure (what people now tend to call cloud infrastructure).

Enter Postgresql (and the Cloud)

Well, something happened one or two years ago: many large, high-profile web sites moved to Postgresql. Now, Postgresql has always had a reputation for being a database with strong opinions. Database administrators, performance nuts, and people focused on scalability have always gravitated toward Postgresql. The Postgresql community is somewhat "conservative," and there is a small group of core committers that favors stability over creativity. MySQL, on the other hand, has always had a reputation for being something of a mess. Reliable colleagues tell me that the MySQL codebase is full of shipwrecks and broken dreams. If you've ever had to deal with some of the more finicky parts of MySQL tuning, you'll understand that while there may be science to MySQL tuning, it is well hidden underneath a deep layer of poor documentation and guesswork.

The commonly accepted reason for the shift to Postgresql is performance and scalability. While I don't disagree that Postgresql is easier to tune and scale than MySQL, I question this justification as political rather than practical. It is simply the justification you'd expect to resonate with a technical audience, but I don't think it is the real reason for the shift. Here's why.

Cloud-Based Infrastructures Seek BSD-Style Licenses

I was at a Postgresql event last week in Chicago, and it was interesting. Postgresql is experiencing a renewal of interest. More and more people are coming to the database, and I wanted to know why. It's not like I've seen several compelling pieces outlining reasons to stop, drop, and move to Postgresql immediately. Instead, it seems like a slow shift that has happened over multiple years. While MySQL was the default for startup developers in 2007 and 2008, Postgresql is that default now.

I asked around and got the following guesses:

  • People have realized MySQL's limitations - I don't buy this one. I think MySQL poses some tricky scalability issues, but I don't think most users create systems large enough to experience them. Other than one or two individuals, I don't know anyone who has had a MySQL scalability issue they couldn't either fix or work around given the resources.

  • Oracle - I heard a lot of conspiracy theories about Oracle and MySQL. Lots of people put this forward as a reason for the huge shift to Postgresql. I don't buy it. Oracle is out there chasing after huge contracts. I don't think the Oracle people lose a bit of sleep over MySQL, and (beyond some structural changes to the OSS project) I don't think they are taking it away.

  • Avoiding NoSQL - This was an RDBMS conference, so I took this with a grain of salt. Many people mentioned that Postgresql reduced the need to bring in technologies like MongoDB or Hadoop. I don't buy that. I think it was just wishful thinking from DBAs who don't want to integrate with NoSQL. I've also never spoken to anyone who said, "We're on Postgresql so we don't need to use Hadoop." It just has never happened, and I don't see them as being in the same class.

  • A cloud-friendly license - Now this I buy. This explains the trend. I think it would be over-simplistic to say that Heroku is behind a shift to Postgresql (but I do think it is a contributing factor). Companies that offer on-demand, PaaS-style services have an incentive to standardize on BSD-style licenses (like the one that covers Postgresql) because they are distributing software.

It's the Licensing, Stupid

If you look at the language of the GPL, and especially some of the purposeful FUD that pre-acquisition MySQL AB was throwing around, "distribution" of any kind was enough to cover your entire codebase under the GPL. I remember looking at the MySQL AB website in 2004 and wondering if it was possible to make the explanation of the GPL license for MySQL any more confusing. At the time, the common wisdom was that MySQL was crafting the licensing explanation so that it could give companies with any doubt the incentive to purchase (even if it stretched the definition of the GPL).

And here's the issue. I don't want to single out Oracle; I think they are a fine company, so don't get me wrong. But I think people are leery of distributing GPL projects with a single, strong copyright holder within the cloud on behalf of paying customers. Even though the license isn't as toxic as the AGPL, it is still unclear what constitutes distribution. And here's the central trend that I think we can call out: as more and more of us rely on third-party platforms (like Heroku) to download, distribute, and install software, these platforms are increasingly running toward licenses that don't entangle them in a web of obligations.

Or, to summarize: no one likes distributing GPL software, even in the cloud, especially when the copyright is owned by a big corporation with an interest in license compliance.

So the next time someone tells you they moved to Postgresql because it is faster and more scalable, ask yourself whether this is the real reason for the switch, or whether that person is just caught up in a larger movement away from copyleft licenses for cloud-based PaaS systems. Was it an original idea, or were they influenced by early adopters of PaaS moving to Postgresql because that's the only option provided?

Clarification: I can already see people bombarding me with this question: "What about Linux? That's GPL." My answer is nuanced: "I think people are leery of distributing GPL projects with a single, strong copyright holder within the cloud on behalf of paying customers." The Debian project or the CentOS project will not go after you for internal distribution.


Written by Tim OBrien

Tim is a Software Architect with experience in all aspects of software development from project inception to developing scalable production architectures for large-scale systems during critical, high-risk events such as Black Friday. He has helped many organizations ranging from small startups to ...
