How do most people find new dependencies... Google.
By Tim OBrien
Our 2012 developer survey (PDF) asked: "How do you find artifacts for your projects?" This question might mean different things to different developers, but in general what we're trying to find out is what process people use when they are creating a new project and they need to figure out what to put in a pom.xml for a dependency. This is usually something that happens at the beginning of the software lifecycle, and it is a time to update versions or evaluate alternative libraries. Some developers might just be looking for the latest version of Spring or Hibernate, and other developers might have a larger mandate to identify new alternative dependencies.
How do people find artifacts? How do they find the proper GAV coordinates for a dependency?
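For readers who haven't met the term, GAV stands for groupId, artifactId, and version, the three values that go into a dependency entry in a pom.xml. Here is a minimal sketch of how the colon-separated shorthand maps onto that entry. The helper names (Gav, parse_gav, to_pom_dependency) are made up for illustration, and the log4j coordinate is just an example value, not a recommendation.

```python
from typing import NamedTuple


class Gav(NamedTuple):
    """Maven coordinates: groupId, artifactId, version."""
    group_id: str
    artifact_id: str
    version: str


def parse_gav(coordinate: str) -> Gav:
    """Split a 'groupId:artifactId:version' string into its three parts."""
    parts = coordinate.strip().split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected groupId:artifactId:version, got {coordinate!r}"
        )
    return Gav(*parts)


def to_pom_dependency(gav: Gav) -> str:
    """Render the coordinates as a <dependency> element for a pom.xml."""
    return (
        "<dependency>\n"
        f"  <groupId>{gav.group_id}</groupId>\n"
        f"  <artifactId>{gav.artifact_id}</artifactId>\n"
        f"  <version>{gav.version}</version>\n"
        "</dependency>"
    )


if __name__ == "__main__":
    # Example coordinate only; substitute whatever your search turns up.
    print(to_pom_dependency(parse_gav("log4j:log4j:1.2.17")))
```

The hard part, of course, isn't formatting the coordinate. It's knowing which three values deserve to be there in the first place.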
The overwhelming answer is: "Search the web for artifacts". Another translation: "We just Google it, and figure out what the latest version of a library is." Or, "We just Google it and figure out what the latest greatest thing is and then we try to see what other people are using."
That first, most popular answer isn't ideal for a few reasons: SEO, the popularity game of search, and the fact that evaluating a hundred options in open source is really just a big subjective game. There's no objective "score" for projects and no relative ranking. Here's an example:
SEO is driving your dependency list?
When you Google something what you are really doing is participating in something of a shell game. Companies have perfected the art of making sure that particular pages show up first or second in response to particular keywords. So, let's say you are looking for a new version of Log4J or an entirely new logging library. The first problem you will face is "what do I search for?" Say, "Java logging library". So, I just did that, and the first result is http://www.java-logging.com/. This appears to be a sponsored site from a company that sells a tool named SmartInspect, and I'm sure this is a great tool, don't get me wrong, but is it what you were looking for?
Is it directly related to what artifact coordinate you want to put in your project's build file? I'd argue that this is a distraction from your original goal. You were looking for Java logging libraries, you found a list, but it's next to an attempt to upsell, and there's no real guidance, there's no community activity that will help you identify where the "tribe" is.
I wanted answers all I have are these lists...
Searching for "Java logging library" yields a helpful but distracting list with an agenda. What if we searched for "Open Source Java logging library"? The first result is from a site called http://java-source.net, and this site is interesting. It's a directory of software projects with a somewhat arbitrary list of categories. The site is advertising-driven, but it doesn't appear that the advertisements are driving the content in any way, so it is an incremental improvement over the last query, but there is still a big problem.
Like the last list, this site has no guidance. I see qflog and log4j next to commons logging. There is a little description, but no signal as to what libraries are actually being used. Which of these libraries has a million users versus three users? Which of these libraries is covered under GPLv3 or ASL? Which of these libraries has had active development within the last three months? Are there security vulnerabilities?
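If a directory did expose that data, the four questions above would reduce to a simple filter. The sketch below is purely hypothetical: the Candidate fields, the thresholds, and the sample entries (lib-a through lib-d) are all invented for illustration, and no site I found offers anything like this.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Candidate:
    """One library in a directory listing (all fields are hypothetical)."""
    name: str
    users: int                  # how many projects depend on it
    license: str                # e.g. "ASL-2.0" or "GPL-3.0"
    last_commit: date           # most recent development activity
    known_vulnerabilities: int  # count of open security issues


def shortlist(candidates, allowed_licenses, today,
              min_users=1000, max_idle_days=90):
    """Keep candidates that pass all four questions, most-used first."""
    cutoff = today - timedelta(days=max_idle_days)
    passing = [
        c for c in candidates
        if c.users >= min_users
        and c.license in allowed_licenses
        and c.last_commit >= cutoff
        and c.known_vulnerabilities == 0
    ]
    return sorted(passing, key=lambda c: c.users, reverse=True)


if __name__ == "__main__":
    # Made-up entries standing in for a directory page.
    today = date(2012, 6, 1)
    candidates = [
        Candidate("lib-a", 1_000_000, "ASL-2.0", date(2012, 5, 20), 0),
        Candidate("lib-b", 3, "ASL-2.0", date(2012, 5, 30), 0),
        Candidate("lib-c", 50_000, "GPL-3.0", date(2012, 4, 1), 0),
        Candidate("lib-d", 80_000, "ASL-2.0", date(2011, 1, 15), 0),
    ]
    for c in shortlist(candidates, {"ASL-2.0"}, today):
        print(c.name)
```

The code is trivial. The inputs are the problem: none of those fields can be filled in from a search results page.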
Time to do some research...
Maybe you took an hour to Google more and tried to figure out where the center of development is these days. Now it is time to do some research. Fire up a browser and go look at GitHub. GitHub is often the first stop these days: you load up a project and get a sense of activity from the number of watchers, the number of forks, and the general level of activity. If the project is at Eclipse or Apache you might go check out a Git, Subversion, or CVS repository. You'll be looking for mailing list activity and SCM activity. If you've been doing this long enough you'll be scouring the internet for a list of people you've learned to trust (and people you've learned to avoid) when it comes to open source.
After you are convinced of the relative merits of each library, you'll start looking at the API and evaluating the code. You'll start scouring the internet for community evidence that this library is serious. Do people use this logging library? If they do, are there any blog rants from the last year about how awful it is? My own personal Google check is a test to see if anyone has grown so frustrated with a library as to be driven to cursing on a blog or on Twitter. (It's a valuable test.)
There are some warning signs here, but there's no simple set of rules to help you evaluate options. You learn from experience. Does the underlying open source project have organizational "issues"? If it is a small project, is it sustainable? Is it at a reliable forge like Apache or Eclipse? Is there a history of wacky license changes, etc.? I could write a blog post every day for the next year going into some of the quirky factors that you have to take into account when evaluating open source, and even when you have experience making these judgements you can still get burned by making a bad decision. For example, four years ago I started using a promising Java web development framework that appeared to be gaining critical mass. It wasn't. I just wasn't looking deeply enough at the community; if I had dug deeper I would have found a history of complaints that would have shifted my own opinion.
Talking to Colleagues
The second most popular answer is really where we tend to make our decisions. If you need a new logging library, often the best place to start is with colleagues. We all know developers who like to operate on the "bleeding edge" of technology, and these people serve as "canaries in the coal mine". Nine times out of ten, I'll be able to ping my network and ask if anyone has used a particular library - if someone you know was burned by a new version of Hibernate, you'll hold off on making the jump.
In the absence of an objective source of data we tend to follow each other as developers. Projects at companies (like startups) that can deal with exposure to risk tend to use newer OSS projects, and companies (like banks) that can afford little exposure to risk learn from the experiences of others. There's a dark network of interactions between developers that define "the center" for a particular category of open source, and querying a LinkedIn list of Java developers is probably more reliable than the process I outlined in this post.
Googling for Dependencies is like Divination
Something has to change. Now that even .NET has a central repository in the form of the NuGet Gallery, this is a problem that needs to be solved, because Googling for new dependencies is a gamble. We need some reliable, authoritative source of real data that will help drive these decisions.
In summary, I think we're still at a very early stage of creating a good ecosystem for OSS artifact evaluation. One of the goals we have at Sonatype is to start building the tools that will deliver you real data. Nexus Professional 2.0 is the first step. We think that giving you popularity data, license data, and security data directly in your repository manager is a first step toward a more comprehensive approach to evaluating the quality of the artifacts you depend on.
Written by Tim OBrien
Tim is a Software Architect with experience in all aspects of software development from project inception to developing scaleable production architectures for large-scale systems during critical, high-risk events such as Black Friday. He has helped many organizations ranging from small startups to Fortune 100 companies take a more strategic approach to adopting and evaluating technology and managing the risks associated with change.