Wednesday, December 12, 2007

Beginner's guide to OpenJDK contributing

**DEPRECATED**


There is now an official OpenJDK Developer guide which incidentally appeared two days after this article was noticed on the java.net frontpage.
:)

You should go there for up-to-date information instead. Thanks to Brad Whetmore for useful feedback though, and for getting this article noticed by mentioning it on his blog.

**DEPRECATED**




This is an expanded version of a presentation I held at Javaforum Stockholm in December 2007. It is intended as a quick "getting started" guide for those who wish to participate in the OpenJDK project. All the information was already out there, but it was spread over many different sites, and I thought it needed summing up in one place so beginners could see step by step what they need to do to get a patch accepted. A warning - as the OpenJDK project changes quickly, the information here may be out of date. Comments, feedback and errata are much appreciated. The article is published under a Creative Commons licence.

Contents

  • What is OpenJDK
  • Why contribute
  • Who is in charge
  • Organization - Groups and Projects
  • Getting the code
  • Preparing for the build
  • Building
  • Code overview
  • Requesting to participate
  • Getting started with a bug
  • Testing changes locally
  • Automated tests - Unit/regression testing
  • Submitting your patch
  • Making a good patch
  • What I would like to see
  • Links

What is OpenJDK

OpenJDK is the project to open source the development of the Java platform and virtual machine. Java has always been free to download and use, and the source has been available under the Java Research Licence. Now it is also available under a Free Software Foundation approved licence - the GNU General Public Licence version 2 with the Classpath exception. This is not a separate project from Sun's "ordinary" Java. It is not "ok kids, go play with this and don't bother us anymore" abandonware - the engineers at Sun will use the organization and tools described here exactly like external contributors do, and this source will be the base for the release of Java 7 and future releases, as well as possible alternative distributions.

Note that the OpenJDK project is only for collaboration around the implementation of the Java platform. The specification is still decided by the Java Community Process, which incidentally you really should join: membership is free for individuals, and about once a year you get to vote on who will be in the expert groups that decide the future of the platform.

Why contribute

Why should you waste valuable time contributing to open source when you don't get paid for it? Here are my reasons for doing it:
  • Increase your competence as a programmer. Perhaps you have a few hours a week at work dedicated to increasing your competence? Gaining a better understanding of the JVM and seeing how common algorithms are implemented is a pretty good way to do that.
  • You increase the value of your chosen platform. You make it more likely that there are good jobs for you in the future. Considering how much stuff has been written in Java over the last 10 years, I don't think there will be any lack of jobs for competent Java programmers for as long as I live, but perhaps you want some new development in the future, not just maintenance programming.
  • Make new contacts. When I was at JavaOne this year people at Sun who work in the management/JMX areas recognized my name when I came up and talked to them because I had submitted a couple of patches in that area.
  • It is an excellent merit. You have code that you are legally allowed to show to potential employers. This is not just a little open source project that you and your closest friends use - if you submit patches to Java (or Linux), your code will be used in critical applications by tens - make that HUNDREDS of millions of end users daily.
  • If you have a recurring bug that you have had to write an annoying workaround for - here is your chance to get rid of it permanently, not just for yourself but for all programmers around the world.
  • You will contribute to a freer and better society. Ok, I realize we are getting dangerously close to starry-eyed idealism territory here, but knowledge is power in today's world and computers are our primary knowledge-management tool. Therefore I think it is unwise to allow all control of our computers to fall to any single entity, be it a state or a few large companies.

Who is in charge

Currently there is an "interim governance board" which I believe was selected by Sun, with help from the open source community. They will write a constitution and decide the final form the project will take. The members are:

Organization - Groups and Projects

OpenJDK has two main concepts - Groups and Projects. In theory the governance board could change this in the future, but I doubt that will happen. Groups and Projects are described on the official OpenJDK site, but here is a brief summary:

Groups

A group is a collection of people with a common interest. There are three levels of participation: Participant, Member and Moderator. A Participant is anyone who has subscribed to the group mailing list. A Member is someone who "has demonstrated a history of significant contributions to a Group, has been granted Membership by that Group, and has signed the SCA" (more on the SCA later). A Moderator looks after the mailing list, counts votes etc.

Groups usually have web pages describing them and mailing lists for discussions, but they DON'T have code repositories. A group can choose to sponsor a project. The initial groups were more or less the engineering teams at Sun - Java2D, Security, Hotspot, etc. Any group Member can suggest new groups. An example of this is Dalibor Topič's Porters group suggestion, which was approved just a few weeks ago.

Projects

Projects exist to create some form of "artifact" (code, documentation). Projects, unlike groups, do not have member lists; they are open to all who want to contribute. Projects often have code repositories, and are often limited in time. A new project is formed if a group decides it wants to sponsor it. An example of this is the JDK7 project proposed by Mark Reinhold.

Getting the code

The Java source code was previously available as a read-only Subversion repository, but the project has now moved to Mercurial (hg). Mercurial is a distributed version control system. If you can read the repository, you can create a local clone - indeed, that is what you must do to work with it. Doing experimental forks of the code is therefore trivial, and it is hoped that increased local influence creates happier participants and a decreased risk of permanent "political" forks. If you haven't worked with Mercurial before, the greatest difference from svn/cvs is that you always have a local clone of the repository to work against, and that downloading patches and applying them are two separate steps (unlike the svn/cvs "update" command).

Once you have downloaded it, "hg help" is your first command. There is a plugin in Netbeans (and in Eclipse too I'm sure) but the original program is run from the command line so I recommend you try it out that way first, to get a feel for all that is possible. A short list of useful commands -

  • hg clone ${localpath_or_url} - create a local copy of a repository
  • hg status - how do your work files diff against your local repository?
  • hg commit - save work file changes to local repository
  • hg incoming - what has changed in parent repository (the one you cloned from)
  • hg pull - download remote patches. Again, this step does not apply them.
  • hg update - apply downloaded patches
  • hg push - send your changes to the parent repository (if you have write permission there)
  • hg merge - merge changes, best done with a tool (kdiff3, meld, Netbeans, Eclipse)

All of these commands take lots of different parameters of course, but rather than me repeating them here, there is a really good free book available on Mercurial at red-bean.com.

Mercurial extensions

One of the advantages of Mercurial is that it is easy to script extensions using Python. Sun has chosen to use the "forest" extension for OpenJDK, presumably so that subteams can administer themselves more easily. If you have already installed Mercurial, you can download the extension with it, using the following command:

hg clone http://hg.akoha.org/hgforest/ hgforest

Next you should edit your Mercurial configuration file. In your home directory, create a file called .hgrc and add the following lines to it:

#Mercurial configuration
[ui]
username = your_name_and/or_mail_here
[extensions]
forest=/home/your_uid/hgforest/forest.py

The username defined there will identify you as the author in the commit metadata. At the public web repository of the code your mail address will be obfuscated, but your username string is not automatically obfuscated in the patches you submit. If you are sensitive about spam/privacy issues you may want to get a separate mail account. It is not improbable that some spamming scumbag clones the repository one day and combs through it for addresses.

Once the extension is installed and configured, some commands get forest equivalents (fclone, fpush), so to finally download the code you do:

hg fclone http://hg.openjdk.java.net/jdk7/jdk7

Preparing for the build

You should begin by reading the README-builds.html page included in the source; these days it is very informative and helpful. You have to install the usual development libs. What sets OpenJDK apart from many other projects is that you have to download a bootstrap JDK - I think it is JDK6 now - and also the "binary plugs", which contain proprietary code that Sun doesn't own and did not get permission to release as open source. You do not need the binary plugs if you choose to download the IcedTea version of the OpenJDK code. The IcedTea project was created by Red Hat and aims to replace all non-open source code so that you can build a completely free JDK from scratch. Red Hat has signed the contributor agreement (the SCA mentioned above), so their fixes will hopefully make it back into the core repository. When that has happened you won't need the binary plugs or a separate download from IcedTea anymore.

I have only built OpenJDK under Linux, but if you are planning to build under Windows there are a few things to consider. First, the compile process is oriented towards a Unix-like environment, so Windows style file paths, environment variables etc. do not work. Therefore you must download and install Cygwin. According to the README-builds file you MUST use Visual Studio .NET 2003 Professional; the 2005 or later versions do not work, at least not out of the box. Luckily, Tim Bell and others are looking at getting the "free-as-in-beer" Visual Studio Express for C++ to work (and in the future hopefully also "free-as-in-freedom" tools).

In theory only tested and perfectly functioning code is in the repository, but in reality people have mistakenly checked in non-compiling/non-working code often enough that it might be an idea to do as Elliotte Rusty Harold suggests - download a source snapshot first and try to get that to build, and then go on and try the latest repository code.

Building

There is a file in the project source code you checked out -

jdk/make/jdk_generic_profile.sh

which sets all environment variables needed for the build. Look through it and edit it if needed, and when everything looks ok try running the following commands:

bash
. jdk/make/jdk_generic_profile.sh
make dev-sanity
make dev


The "dev" version of the build is a bit smaller. If you really, really want to build everything, including an installer, replace the last two lines with:
make sanity
make

Once you have built the whole project, you can save time by building just the subprojects you are working on: go into the corresponding "make" subdirectory and run the "make" command from there. Builds of subprojects depend on tools created by the global make, so unfortunately you must build everything the first time.

Code overview

For easier maintenance the source has been split up into separate projects - the langtools, jaxp and jaxws projects mainly consist of Java code, hotspot is native code, and corba is mixed.

The sources for the core Java classes you use daily are usually in
jdk/src/share/classes/java(x)
Some of these classes are just interfaces or abstract classes. The implementations are usually in
jdk/src/share/classes/sun
(These classes in the sun package are sometimes used by applications, but you are strongly advised never to do that. It ties your users to Sun's JVM versions, and these classes are not a public API and can therefore change without any warning between JVM releases.)
Operating system/platform specific Java code can be found in
jdk/src/{platform}/classes
JNI created files usually end up in
jdk/src/share/native/
but these are usually just header files. All the fun stuff tends to happen in
hotspot/src/share/vm

TODO - expand this part of the article.

Requesting to participate

A lot of the following information is taken from the OpenJDK contribution page. First of all, print out the Sun Contributor Agreement (SCA) and read it through carefully. Basically you agree to "dual ownership" of the code: both parties can do whatever they want with it without asking the other party first. More details are in the FAQ.

If you decide these terms are acceptable, sign it and mail it back to the provided address, or scan it and fax it to them. If you have signed this for any earlier Sun open source projects (Glassfish, Netbeans, OpenSolaris...) you are already covered and don't have to do it again.

Getting started with a bug

First of all, find a suitable bug or Request For Enhancement (RFE) in the bug database. If you know of a bug but can't find it in the database, please start by submitting the bug! Note that even though a bug has been accepted into the database, the submitter's synopsis can be misleading or completely incorrect. If there is an evaluation you are on firmer ground, but I have known these to be out of date or incorrect too. So don't trust everything you read: check the code and the specs, and think carefully about how things really work and how they are intended to work. Sometimes bugs are even fixed but not closed in the database. When you have a good bug, search through the mailing lists to see if you can find any discussion about it - perhaps work on it has already started. If there is no discussion, announce your intention to start. Send a mail to the appropriate mailing list with the subject "{Bugid: Synopsis}". For instance:
162111: Incorrect Descriptor handling in ModelMBean classes

The body of the mail is usually just something like "My name is... and I'd like to get started on this bug". Wait a few days for comments. Perhaps Sun engineers have already started working on this bug, or perhaps they will now re-evaluate it and close it for some reason. In general, changing any public APIs (adding new methods for instance) is VERY DIFFICULT. You have to provide a convincing argument for why the change is needed; if the bug synopsis doesn't make a convincing case, you could try adding some arguments to the body of your "starting intention" mail. Also, changing incorrect behaviour that applications already depend on is pretty much impossible - backwards compatibility has been holy so far. Still, just because Sun won't merge your patch into the core OpenJDK repository doesn't mean that others won't find it useful...

Testing changes locally

Before submitting a patch you probably want to try out your changes. If you have compiled with the default build target and everything worked, you will have a newly compiled JDK located at build/{platform}/. One way is to just point the $PATH and $JAVA_HOME variables there and then try running all your favourite Java applications. On the other hand, perhaps you want to isolate your changes so you can test just the classes you changed together with an already installed JDK that you know works fine. You do this with the -Xbootclasspath option. For instance -
java -Xbootclasspath/p:jarname.jar
The "/p" part of the option is important: it prepends classes. This means that if the JVM finds any of the core Java classes in the jar files or directories you specified (jarname.jar in my example above), these class versions will be used instead of the internal ones in your installed JDK. You can specify several jar files or directories as parameters.
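
To check that a patched class is really the one being picked up, it can help to ask the running JVM where a class file comes from. The following is only a minimal sketch and not part of the OpenJDK tooling: the class name WhichClass and its locate method are made up for this illustration, and it assumes that a class found in a prepended jar is reported with that jar's URL.

```java
import java.net.URL;

/** Hypothetical helper (not part of the JDK): reports where a class file is found. */
public class WhichClass {

    /** Returns the URL of the .class file backing the given class, as text. */
    public static String locate(Class<?> type) {
        // Build an absolute resource name such as "/java/lang/String.class".
        String resourceName = "/" + type.getName().replace('.', '/') + ".class";
        URL location = type.getResource(resourceName);
        return location == null ? "(not found)" : location.toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Default to java.lang.String when no class name is given.
        String className = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(className + " -> " + locate(Class.forName(className)));
    }
}
```

Started with the same -Xbootclasspath/p option and the name of a class you changed as its argument, it should print a location inside your jar if your version of the class is the one in use.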

Automated tests - Unit/regression testing

If you want your patch accepted you must also provide test classes. Most of you are probably familiar with JUnit/TestNG or similar test frameworks. Sun for a long time used an internal testing framework called JavaTest with the "jtreg" extension to run many of the JDK tests. This framework has now been open sourced. You can define test suites in java files, shell script files or html files. The tests executed by jtreg can test not only ordinary Java files, but also applets and shell scripts. The Java classes you test do not need to implement any special interface; you can define the test to just run the main method of the class. If the process started by the test returns normally (the main method or shell script finished), the test passed. If the JVM terminates with an uncaught exception, or the JVM or shell script returns an error code (-1 usually) as it terminates, then the test failed. You can run tests with

jtreg -jdk:${jdk_path} ${testdir_or_testfile_path}

I'd like to write more about how this works, but documentation is a bit sparse, and when I last submitted a patch it was still JUnit that was asked of external submitters. Hopefully this situation will improve.
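
As a rough illustration of the pass/fail convention described above, a jtreg test can be an ordinary class with a main method and a comment block of tags at the top. This is only a sketch: the class name ParseIntTest and the behaviour it checks are chosen for the example and are not taken from the JDK test suite.

```java
/*
 * @test
 * @summary Example only: Integer.parseInt accepts digits and rejects garbage
 * @run main ParseIntTest
 */
public class ParseIntTest {

    public static void main(String[] args) throws Exception {
        // Returning normally from main means the test passed.
        if (Integer.parseInt("42") != 42) {
            // Any uncaught exception makes the test fail.
            throw new Exception("parseInt(\"42\") did not return 42");
        }

        try {
            Integer.parseInt("not a number");
        } catch (NumberFormatException expected) {
            return; // correct behaviour, so the test passes
        }
        throw new Exception("parseInt accepted a non-numeric string");
    }
}
```

Saved as ParseIntTest.java, it would be run with the jtreg command shown above, pointed at the file or at the directory containing it.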

It is difficult to give any rules for how much testing you should do... if you are just changing documentation or how GUIs are drawn, you can't do any tests of course. As a rule of thumb, perhaps you should consider spending at least as much time writing tests as you did fixing the code.

Submitting your patch

Now we are finally getting to the fun part! You submit your patch by sending a mail to the appropriate mailing list, with a subject of the following pattern:

[PATCH] 162111: Incorrect Descriptor handling in MBean classes

The mail should contain the following:
  1. A discussion of the change: A rationale (if the bug synopsis didn't have one good enough). Briefly what you have changed and why. If you had any alternative solutions that you rejected, you may want to write a little about why.
  2. A diff in the "unified" format (-u). Remember to write which version of the source you created the diff against!
  3. Your test classes.
Within a couple of weeks a Sun engineer (or someone with the same responsibility in a project) should reply and say if the patch will be accepted, or if it needs further work.

Making a good patch

Here are a few rules of thumb to maximize your chances of getting your patch accepted:
  • Follow The Java Code Conventions. Your code comments should describe WHY you do something if that needs explaining, not HOW you do it. Writing how you do it in comments is unnecessary duplication; that should be immediately obvious when looking at the code, or something is wrong with it. Clear and descriptive variable names, yadda yadda. You know the rest...
  • Idiomatic Java code that is readable even to junior programmers is preferable. Of course you shouldn't avoid writing smart and concise code, but don't show off for the sake of showing off, don't abuse "clever" tricks to fit the code into a single cryptic unmaintainable line, and so on.
  • Write lots of good tests.
  • Most important rule last - only change what is necessary, nothing else! This is a mistake I made when working on my first patch... Your patch will be carefully evaluated by core engineering teams, and the smaller the diff is, the easier it is for them. Never run "fix imports", "autoindent all code" or similar in your IDE, and never start cleaning up cryptic variable names elsewhere in the class, etc., even if you think the original code looks like complete crap. The patch will probably be rejected immediately if you do. Stick to fixing your original bug. If you think a cleanup is absolutely necessary for maintainability, do it later as a separate patch that doesn't change any functionality.
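
To illustrate the first rule above, about comments that explain WHY rather than HOW, here is a small made-up example. The class, method and constant are invented for illustration and do not come from the JDK.

```java
/** Invented example class; not part of the JDK. */
public class RetryDelay {

    private static final long MAX_DELAY_MILLIS = 30000L;

    public static long next(long currentMillis) {
        // A HOW comment would add nothing here: "multiply by two and take the minimum".
        // A WHY comment is worth keeping:
        // Back off exponentially so a struggling server is not flooded with
        // retries, but cap the delay so a client never waits more than 30 s.
        return Math.min(currentMillis * 2, MAX_DELAY_MILLIS);
    }

    public static void main(String[] args) {
        System.out.println(next(1000L));   // a short delay is doubled
        System.out.println(next(20000L));  // a long delay is capped
    }
}
```
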
If the patch was not accepted into OpenJDK trunk by Sun, don't be discouraged. You probably learned a lot while doing it. Other projects or repositories may still be interested in your patch.

Another rule of thumb I like is - try to be nice, and if you need to criticise, try to make it constructive criticism. In my experience Sun engineers have almost always been polite and helpful, but as the community grows some highly opinionated (and also very skilled) open source contributors may not hesitate to let you know in no uncertain terms if they think your code is poor. If that should happen, see it as an exercise in maturity. Thank them (silently if you prefer) for what you may have learned from the exchange, and disregard any possible personal attacks. Consider that current/future employers may read the exchange one day. The Internet has a very long memory, and your online reputation matters.

What I would like to see

For the OpenJDK to succeed, Sun must continue their work to lower the barriers to entry. But a living and vibrant community is not something that Sun can create top-down with a wave of a magic wand; it is something that all of us who are interested in seeing OpenJDK succeed must try to help create. I'd like to see the following:
  • More participants! We want YOU for OpenJDK! There must be hundreds of thousands of active Java programmers on the planet. If only a few percent contributed a couple of good patches each....
  • More information from Sun employees about what they are working on. Previously I have seen new contributors who were disappointed to have their first patch rejected because an internal rewrite or planned rewrite fixed the bug. Luckily this situation has already improved now that Sun's engineers use Mercurial and the same mailing lists for discussions as everyone else.
  • Cleaning up of old and inaccurate information online. The Starter bug list page for instance should be updated, I think half of the bugs on it are closed by now. I would also like it to be expanded so it covers not just individual bugs but whole areas with a low barrier to entry where beginner contributions would be welcome - something like the Linux Kernel Newbies community. Perhaps an OpenJDK Newbies Group could be created?

Links


On the Open Road - series of articles about OpenJDK. Rusty's article beat me to the punch by a week or so and made this article somewhat redundant, but it was good to see that I was on the right track.
Kelly O'Hair has loads of information on his two blogs about Mercurial, and building the JDK.
Ted Neward blogs about building the OpenJDK on Windows.
Volker Simonis has two very in-depth articles about HotSpot development on Linux with Netbeans.

1 comment:

Anonymous said...

Very nice job of taking so many different aspects of the OpenJDK and presenting them in a well thought-out and cohesive manner.

Couple comments.

"In theory only tested and perfectly functioning code is in the repository, but in reality people have mistakenly checked in non-compiling/non-working code often enough that it might be an idea..."

I have a draft blog entry of what it means to be a Sun/OpenJDK gatekeeper, but haven't found the time to finish it. It's not an easy job to describe (or do!), but basically a gatekeeper is the one who does sanity checks of developer code before it hits the MASTER repositories. We maintain a number of subrepositories, and at various intervals, integrate the changes into the master, which is the main code base you mention:

http://hg.openjdk.java.net/jdk7/jdk7

That repository should never be broken. If it is, a gatekeeper has not fully done his/her job. It's pretty rare, but does happen as deadlines draw near. Release Engineering (RE) does do nightly builds, so any breakage is usually discovered no more than 24 hours later. And yes, hell does break loose around here. When the integration schedule is tight, say near a promotion, the promotion could slip as a result, and that's "Not a Good Thing(TM)".

That said, gatekeepers don't have the resources to test every type of build, but obvious dumb stuff should not be leaking through.

So changes you might see in the group repositories like:

http://hg.openjdk.java.net/jdk7/jsn
http://hg.openjdk.java.net/jdk7/tl

are the bleeding edge changes, and are more likely to be broken than the master listed above. Just don't break the gate, please! Any breakage takes me away from my day job.

"Follow The Java Code Conventions."

The Java Code Conventions here are somewhat out of date. It's a good reference, but just know they're changing a bit. For example, the tabs vs. spaces issue was finally standardized. There was a lot of teeth gnashing over that one last year. And definitely don't let your IDE automatically reformat everything.

"It is difficult to give any rules for how much testing you should do..."

In general, you should be running all of the regression tests for that functional area, and probably anything which directly depends on it. For example, any security changes should be tested against:

*/security
javax/crypto
com/sun/crypto
javax/xml/crypto
javax/smartcardio
com/sun/security
com/sun/net
lib/security

Thanks for an interesting article.

Brad

P.S. Maybe I should have taken the time and finished that article instead! ;)