We judge code quality along two different dimensions.
In the short term, time pressures often tempt us to trade future maintainability for faster initial development. If not quickly corrected, this tradeoff is almost always bad in the long term, as maintaining poor quality code requires huge amounts of time and effort.
Having a Culture of Quality is important for making the necessary investments in high quality code to ensure that we can continue to GSD in the future.
Just as under-investing in quality can lead to less GSD in the future, over-investing in quality can lead to less GSD in the present. Perfection is the enemy of GSD; we must find a balance that allows us to both make progress now as well as not limit us in the future.
Reviews are perhaps the most important method we use to maintain a culture of quality. We generically refer to reviews of both code and designs as "code reviews".
Code reviews have multiple benefits:
Code reviews are a good example of a keystone habit. Knowing that others will be looking at their code encourages developers to invest the time and effort to create high quality code, even before it is reviewed.
In general, all designs and all code should be reviewed, and all reviews should be completed before the code is deployed. Thus, it is important for all reviewers to complete their reviews in a timely manner. A good standard is that no code review should be waiting for your feedback for longer than 2 business days. An even better goal to strive for is "code review zero", where you have successfully completed all of your open code reviews!
In addition to ensuring that the code itself is high quality, we also use code reviews to communicate how we write code to each other. Each developer is assigned a rotating reviewer who should be added to all reviews. This additional reviewer is focusing on learning from the review instead of providing constructive feedback on what may be an unfamiliar system. (Although offering constructive feedback is still encouraged, of course!)
Because of the importance of completing code reviews quickly, developers should incorporate code review time into their daily routine. At the same time, they should not consume so much time as to feel burdensome. If you feel overwhelmed with the number of reviews that you need to complete, please speak up! As teams grow and change, we often need to adjust who is reviewing the code for each project. Conversely, also feel free to add yourself as a default reviewer to any project that you have an interest in following.
Whenever we make modifications to a system, we should be leaving it in a better place than it was before. The easiest time to let quality slip is when making a minor change to an otherwise stable project. Unless we make an effort to continuously improve code whenever we touch it, these small reductions in quality will slowly change well-written software into an unmaintainable mess.
The principle of continuous improvement also lets us think about quality as a process, not a state. We may sometimes need to make compromises on quality in order to satisfy other requirements -- code debt is something that needs to be managed, not avoided entirely.
In practice, our philosophy of continuous improvement takes several forms:
Just like the other rules and processes described in this guide, our best practices and standards for developing software are intended to provide a useful default and describe good habits to cultivate. They will not apply universally -- don't be afraid to break the rules, although this should hopefully not be necessary very often.
One simple way of making code more maintainable is to make it easier to visually parse by both the original author and other members of the team. At the same time, making standards easy to follow is essential to preventing them from being overly burdensome. Thus, our standards are mostly encoded as automatic formatting rules, although we also have a somewhat poorly maintained Scala Style Guide.
We are strong believers in the power of tests to validate the behavior of an application both now and in the future. Tests form a concrete specification for how software should behave. Our code should have both unit tests to verify the behavior of small pieces of code as well as integration tests to ensure that different projects are operating with each other according to our expectations.
One of the key measurements of maintainability is whether another developer can quickly and easily recreate an application without modifying any code at all. If the code is inaccessible, this is clearly impossible. All code should be checked into our source control. Generally speaking, commits should small and oriented around a single logical change. This aids in documenting, reverting, merging, and otherwise understanding what each changeset does.
All production builds should happen on our build server. This ensures that builds complete cleanly and are not relying on any local state on a developer's personal machine. We do make a small exception for builds that will never be used outside of an individual developer's work. (For example, when working on multiple projects simultaneously, it may be necessary to publish an upstream project to a local repository for the downstream project to consume.) However, all builds that reach our shared developer repository and potentially beyond to production should be built on the build server.
All production applications should run on one of our standard deployment environments:
Sticking to a few standard deployment environments reduces the overhead of maintaining a large amount of infrastructure. Our common deployment tools also ensure that we can track failures, roll back changes when necessary, and successfully re-deploy services on demand.
We believe that having healthy security restrictions not only protects our systems from others but also protects them from ourselves. All developers have the responsibility to help maintain our production environment, including shutting down and deleting resources when they are no longer needed. However, developers should only be able to make destructive changes intentionally. Ideally, no developer should be able to accidentally shut down production services or delete production data.
To protect against accidental modifications to production systems, destructive changes to production systems should be checked into our SCM as scripts and reviewed according to our usual process. The changes should not actually be made until after the review has been completed. You can find a more detailed explanation of the steps involved on the wiki.
Our permissions are generally controlled via Amazon's Identity and Access Management.