Codebase Refactoring (with help from Go) Russ Cox rsc@golang.org * Abstract Go should add the ability to create alternate equivalent names for types, in order to enable gradual code repair during codebase refactoring. This article explains the need for that ability and the implications of not having it for today’s large Go codebases. This article also examines some potential solutions, including the alias feature proposed during the development of (but not included in) Go 1.8. However, this article is _not_ a proposal of any specific solution. Instead, it is intended as the start of a discussion by the Go community about what solution should be included in Go 1.9. This article is an extended version of a talk given at GothamGo in New York on November 18, 2016. * Introduction Go’s goal is to make it easy to build software that scales. There are two kinds of scale that we care about. One kind of scale is the size of the systems that you can build with Go, meaning how easy it is to use large numbers of computers, process large amounts of data, and so on. That’s an important focus for Go but not for this article. Instead, this article focuses on another kind of scale, the size of Go programs, meaning how easy it is to work in large codebases with large numbers of engineers making large numbers of changes independently. One such codebase is [[http://m.cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/pdf][Google’s single repository]] that nearly all engineers work in on a daily basis. As of January 2015, that repository was seeing 40,000 commits per day across 9 million source files and 2 billion lines of code. Of course, there is more in the repository than just Go code. Another large codebase is the set of all the open source Go code that people have made available on GitHub and other code hosting sites. You might think of this as `go` `get`’s codebase. In contrast to Google’s codebase, `go` `get`’s codebase is completely decentralized, so it’s more difficult to get exact numbers. In November 2016, there were 140,000 packages known to godoc.org and over 160,000 [[https://github.com/search?utf8=%E2%9C%93&q=language%3AGo&type=Repositories&ref=searchresults][GitHub repos written in Go]]. Supporting software development at this scale was in our minds from the very beginning of Go. We paid a lot of attention to implementing imports efficiently. We made sure that it was difficult to import code but forget to use it, to avoid code bloat. We made sure that there weren’t unnecessary dependencies between packages, both to simplify programs and to make it easier to test and refactor them. For more detail about these considerations, see Rob Pike’s 2012 article “[[/talks/2012/splash.article][Go at Google: Language Design in the Service of Software Engineering]].” Over the past few years we’ve come to realize that there’s more that can and should be done to make it easier to refactor whole codebases, especially at the broad package structure level, to help Go scale to ever-larger programs. * Codebase refactoring Most programs start with one package. As you add code, occasionally you recognize a coherent section of code that could stand on its own, so you move that section into its own package. Codebase refactoring is the process of rethinking and revising decisions about both the grouping of code into packages and the relationships between those packages. There are a few reasons you might want to change the way a codebase is organized into packages. The first reason is to split a package into more manageable pieces for users. For example, most users of [[/pkg/regexp/][package regexp]] don’t need access to the regular expression parser, although [[https://pkg.go.dev/github.com/google/codesearch/regexp][advanced uses may]], so the parser is exported in [[/pkg/regexp/syntax][a separate regexp/syntax package]]. The second reason is to [[/blog/package-names][improve naming]]. For example, early versions of Go had an `io.ByteBuffer`, but we decided `bytes.Buffer` was a better name and package bytes a better place for the code. The third reason is to lighten dependencies. For example, we moved `os.EOF` to `io.EOF` so that code not using the operating system can avoid importing the fairly heavyweight [[/pkg/os][package os]]. The fourth reason is to change the dependency graph so that one package can import another. For example, as part of the preparation for Go 1, we looked at the explicit dependencies between packages and how they constrained the APIs. Then we changed the dependency graph to make the APIs better. Before Go 1, the `os.FileInfo` struct contained these fields: type FileInfo struct { Dev uint64 // device number Ino uint64 // inode number ... Atime_ns int64 // access time; ns since epoch Mtime_ns int64 // modified time; ns since epoch Ctime_ns int64 // change time; ns since epoch Name string // name of file } Notice the times `Atime_ns`, `Mtime_ns`, `Ctime_ns` have type int64, an `_ns` suffix, and are commented as “nanoseconds since epoch.” These fields would clearly be nicer using [[/pkg/time/#Time][`time.Time`]], but mistakes in the design of the package structure of the codebase prevented that. To be able to use `time.Time` here, we refactored the codebase. This graph shows eight packages from the standard library before Go 1, with an arrow from P to Q indicating that P imports Q. .html refactor/import1.html Nearly every package has to consider errors, so nearly every package, including package time, imported package os for `os.Error`. To avoid cycles, anything that imports package os cannot itself be used by package os. As a result, operating system APIs could not use `time.Time`. This kind of problem convinced us that `os.Error` and its constructor `os.NewError` were so fundamental that they should be moved out of package os. In the end, we moved `os.Error` into the language as [[/ref/spec/#Errors][`error`]] and `os.NewError` into the new [[/pkg/errors][package errors]] as `errors.New`. After this and other refactoring, the import graph in Go 1 looked like: .html refactor/import2.html Package io and package time had few enough dependencies to be used by package os, and the Go 1 definition of [[/pkg/os/#FileInfo][`os.FileInfo`]] does use `time.Time`. (As a side note, our first idea was to move `os.Error` and `os.NewError` to a new package named error (singular) as `error.Value` and `error.New`. Feedback from Roger Peppe and others in the Go community helped us see that making the error type predefined in the language would allow its use even in low-level contexts like the specification of [[/ref/spec#Run_time_panics][run-time panics]]. Since the type was named `error`, the package became errors (plural) and the constructor `errors.New`. Andrew Gerrand’s 2015 talk “[[/talks/2015/how-go-was-made.slide#37][How Go was Made]]” has more detail.) * Gradual code repair The benefits of a codebase refactoring apply throughout the codebase. Unfortunately, so do the costs: often a large number of repairs must be made as a result of the refactoring. As codebases grow, it becomes infeasible to do all the repairs at one time. The repairs must be done gradually, and the programming language must make that possible. As a simple example, when we moved `io.ByteBuffer` to `bytes.Buffer` in 2009, the [[https://go.googlesource.com/go/+/d3a412a5abf1ee8815b2e70a18ee092154af7672][initial commit]] moved two files, adjusted three makefiles, and repaired 43 other Go source files. The repairs outweighed the actual API change by a factor of twenty, and the entire codebase was only 250 files. As codebases grow, so does the repair multiplier. Similar changes in large Go codebases, such as Docker, and Juju, and Kubernetes, can have repair multipliers ranging from 10X to 100X. Inside Google we’ve seen repair multipliers well over 1000X. The conventional wisdom is that when making a codebase-wide API change, the API change and the associated code repairs should be committed together in one big commit: .html refactor/atomic.html The argument in favor of this approach, which we will call “atomic code repair,” is that it is conceptually simple: by updating the API and the code repairs in the same commit, the codebase transitions in one step from the old API to the new API, without ever breaking the codebase. The atomic step avoids the need to plan for a transition during which both old and new API must coexist. In large codebases, however, the conceptual simplicity is quickly outweighed by a practical complexity: the one big commit can be very big. Big commits are hard to prepare, hard to review, and are fundamentally racing against other work in the tree. It’s easy to start doing a conversion, prepare your one big commit, finally get it submitted, and only then find out that another developer added a use of the old API while you were working. There were no merge conflicts, so you missed that use, and despite all your effort the one big commit broke the codebase. As codebases get larger, atomic code repairs become more difficult and more likely to break the codebase inadvertently. In our experience, an approach that scales better is to plan for a transition period during which the code repair proceeds gradually, across as many commits as needed: .html refactor/gradual.html Typically this means the overall process runs in three stages. First, introduce the new API. The old and new API must be _interchangeable_, meaning that it must be possible to convert individual uses from the old to the new API without changing the overall behavior of the program, and uses of the old and new APIs must be able to coexist in a single program. Second, across as many commits as you need, convert all the uses of the old API to the new API. Third, remove the old API. “Gradual code repair” is usually more work than the atomic code repair, but the work itself is easier: you don’t have to get everything right in one try. Also, the individual commits are much smaller, making them easier to review and submit and, if needed, roll back. Maybe most important of all, a gradual code repair works in situations when one big commit would be impossible, for example when code that needs repairs is spread across multiple repositories. The `bytes.Buffer` change looks like an atomic code repair, but it wasn’t. Even though the commit updated 43 source files, the commit message says, “left io.ByteBuffer stub around for now, for protocol compiler.” That stub was in a new file named `io/xxx.go` that read: // This file defines the type io.ByteBuffer // so that the protocol compiler's output // still works. Once the protocol compiler // gets fixed, this goes away. package io import "bytes" type ByteBuffer struct { bytes.Buffer; } Back then, just like today, Go was developed in a separate source repository from the rest of Google’s source code. The protocol compiler in Google’s main repository was responsible for generating Go source files from protocol buffer definitions; the generated code used `io.ByteBuffer`. This stub was enough to keep the generated code working until the protocol compiler could be updated. Then [[https://go.googlesource.com/go/+/832e72beff62e4fe4897699e9b40a2b228e8503b][a later commit]] removed `xxx.go`. Even though there were many fixes included in the original commit, this change was still a gradual code repair, not an atomic one, because the old API was only removed in a separate stage after the existing code was converted. In this specific case the gradual repair did succeed, but the old and new API were not completely interchangeable: if there had been a function taking an `*io.ByteBuffer` argument and code calling that function with an `*io.ByteBuffer`, those two pieces of code could not have been updated independently: code that passed an `*io.ByteBuffer` to a function expecting a `*bytes.Buffer`, or vice versa, would not compile. Again, a gradual code repair consists of three stages: .html refactor/template.html These stages apply to a gradual code repair for any API change. In the specific case of codebase refactoring—moving an API from one package to another, changing its full name in the process—making the old and new API interchangeable means making the old and new names interchangeable, so that code using the old name has exactly the same behavior as if it used the new name. Let’s look at examples of how Go makes that possible (or not). ** Constants Let’s start with a simple example of moving a constant. Package io defines the [[/pkg/io/#Seeker][Seeker interface]], but the named constants that developers prefer to use when invoking the `Seek` method came from package os. Go 1.7 moved the constants to package io and gave them more idiomatic names; for example, `os.SEEK_SET` is now available as `io.SeekStart`. For a constant, one name is interchangeable with another when the definitions use the same type and value: package io const SeekStart int = 0 package os const SEEK_SET int = 0 Due to [[/doc/go1compat][Go 1 compatibility]], we’re blocked in stage 2 of this gradual code change. We can’t delete the old constants, but making the new ones available in package io allows developers to avoid importing package os in code that does not actually depend on operating system functionality. This is also an example of a gradual code repair being done across many repositories. Go 1.7 introduced the new API, and now it’s up to everyone with Go code to update their code as they see fit. There’s no rush, no forced breakage of existing code. ** Functions Now let’s look at moving a function from one package to another. As mentioned above, in 2011 we replaced `os.Error` with the predefined type `error` and moved the constructor `os.NewError` to a new package as [[/pkg/errors/#New][`errors.New`]]. For a function, one name is interchangeable with another when the definitions use the same signature and implementation. In this case, we can define the old function as a wrapper calling the new function: package errors func New(msg string) error { ... } package os func NewError(msg string) os.Error { return errors.New(msg) } Since Go does not allow comparing functions for equality, there is no way to tell these two functions apart. The old and new API are interchangeable, so we can proceed to stages 2 and 3. (We are ignoring a small detail here: the original `os.NewError` returned an `os.Error`, not an `error`, and two functions with different signatures _are_ distinguishable. To really make these functions indistinguishable, we would also need to make `os.Error` and `error` indistinguishable. We will return to that detail in the discussion of types below.) ** Variables Now let’s look at moving a variable from one package to another. We are discussing exported package-level API, so the variable in question must be an exported global variable. Such variables are almost always set at init time and then only intended to be read from, never written again, to avoid races between reading and writing goroutines. For exported global variables that follow this pattern, one name is nearly interchangeable with another when the two have the same type and value. The simplest way to arrange that is to initialize one from the other: package io var EOF = ... package os var EOF = io.EOF In this example, io.EOF and os.EOF are the same value. The variable values are completely interchangeable. There is one small problem. Although the variable values are interchangeable, the variable addresses are not. In this example, `&io.EOF` and `&os.EOF` are different pointers. However, it is rare to export a read-only variable from a package and expect clients to take its address: it would be better for clients if the package exported a variable set to the address instead, and then the pattern works. ** Types Finally let’s look at moving a type from one package to another. This is much harder to do in Go today, as the following three examples demonstrate. *** Go’s os.Error Consider once more the conversion from `os.Error` to `error`. There’s no way in Go to make two names of types interchangeable. The closest we can come in Go is to give `os.Error` and `error` the same underlying definition: package os type Error error Even with this definition, and even though these are interface types, Go still considers these two types [[/ref/spec#Type_identity][different]], so that a function returning an os.Error is not the same as a function returning an error. Consider the [[/pkg/io/#Reader][`io.Reader`]] interface: package io type Reader interface { Read(b []byte) (n int, err error) } If `io.Reader` is defined using `error`, as above, then a `Read` method returning `os.Error` will not satisfy the interface. If there’s no way to make two names for a type interchangeable, that raises two questions. First, how do we enable a gradual code repair for a moved or renamed type? Second, what did we do for `os.Error` in 2011? To answer the second question, we can look at the source control history. It turns out that to aid the conversion, we [[https://go.googlesource.com/go/+/47f4bf763dcb120d3b005974fec848eefe0858f0][added a temporary hack to the compiler]] to make code written using `os.Error` be interpreted as if it had written `error` instead. *** Kubernetes This problem with moving types is not limited to fundamental changes like `os.Error`, nor is it limited to the Go repository. Here’s a change from the [[https://kubernetes.io/][Kubernetes project]]. Kubernetes has a package util, and at some point the developers decided to split out that package’s `IntOrString` type into its own [[https://pkg.go.dev/k8s.io/kubernetes/pkg/util/intstr][package intstr]]. Applying the pattern for a gradual code repair, the first stage is to establish a way for the two types to be interchangeable. We can’t do that, because the `IntOrString` type is used in struct fields, and code can’t assign to that field unless the value being assigned has the correct type: package util type IntOrString intstr.IntOrString // Not good enough for: // IngressBackend describes ... type IngressBackend struct { ServiceName string `json:"serviceName"` ServicePort intstr.IntOrString `json:"servicePort"` } If this use were the only problem, then you could imagine writing a getter and setter using the old type and doing a gradual code repair to change all existing code to use the getter and setter, then modifying the field to use the new type and doing a gradual code repair to change all existing code to access the field directly using the new type, then finally deleting the getter and setter that mention the old type. That required two gradual code repairs instead of one, and there are many uses of the type other than this one struct field. In practice, the only option here is an atomic code repair, or else breaking all code using `IntOrString`. *** Docker As another example, here’s a change from the [[https://www.docker.com/][Docker project]]. Docker has a package utils, and at some point the developers decided to split out that package’s `JSONError` type into a separate [[https://pkg.go.dev/github.com/docker/docker/pkg/jsonmessage#JSONError][jsonmessage package]]. Again we have the problem that the old and new types are not interchangeable, but it shows up in a different way, namely [[/ref/spec#Type_assertions][type assertions]]: package utils type JSONError jsonmessage.JSONError // Not good enough for: jsonError, ok := err.(*jsonmessage.JSONError) if !ok { jsonError = &jsonmessage.JSONError{ Message: err.Error(), } } If the error `err` not already a `JSONError`, this code wraps it in one, but during a gradual repair, this code handles `utils.JSONError` and `jsonmessage.JSONError` differently. The two types are not interchangeable. (A [[/ref/spec#Type_switches][type switch]] would expose the same problem.) If this line were the only problem, then you could imagine adding a type assertion for `*utils.JSONError`, then doing a gradual code repair to remove other uses of `utils.JSONError`, and finally removing the additional type guard just before removing the old type. But this line is not the only problem. The type is also used elsewhere in the API and has all the problems of the Kubernetes example. In practice, again the only option here is an atomic code repair or else breaking all code using `JSONError`. * Solutions? We’ve now seen examples of how we can and cannot move constants, functions, variables, and types from one package to another. The patterns for establishing interchangeable old and new API are: const OldAPI = NewPackage.API func OldAPI() { NewPackage.API() } var OldAPI = NewPackage.API type OldAPI ... ??? modify compiler or ... ??? For constants and functions, the setup for a gradual code repair is trivial. For variables, the trivial setup is incomplete but only in ways that are not likely to arise often in practice. For types, there is no way to set up a gradual code repair in essentially any real example. The most common option is to force an atomic code repair, or else to break all code using the moved type and leave clients to fix their code at the next update. In the case of moving os.Error, we resorted to modifying the compiler. None of these options is reasonable. Developers should be able to do refactorings that involve moving a type from one package to another without needing an atomic code repair, without resorting to intermediate code and multiple rounds of repair, without forcing all client packages to update their own code immediately, and without even thinking about modifying the compiler. But how? What should these refactorings look like tomorrow? We don’t know. The goal of this article is to define the problem well enough to discuss the possible answers. ** Aliases As explained above, the fundamental problem with moving types is that while Go provides ways to create an alternate name for a constant or a function or (most of the time) a variable, there is no way to create an alternate name for a type. For Go 1.8 we experimented with introducing first-class support for these alternate names, called [[/design/16339-alias-decls][_aliases_]]. A new declaration syntax, the alias form, would have provided a uniform way to create an alternate name for any kind of identifier: const OldAPI => NewPackage.API func OldAPI => NewPackage.API var OldAPI => NewPackage.API type OldAPI => NewPackage.API Instead of four different mechanisms, the refactoring of package os we considered above would have used a single mechanism: package os const SEEK_SET => io.SeekStart func NewError => errors.New var EOF => io.EOF type Error => error During the Go 1.8 release freeze, we found two small but important unresolved technical details in the alias support (issues [[/issue/17746][17746]] and [[/issue/17784][17784]]), and we decided that it was not possible to resolve them confidently in the time remaining before the Go 1.8 release, so we held aliases back from Go 1.8. ** Versioning An obvious question is whether to rely on versioning and dependency management for code repair, instead of focusing on strategies that enable gradual code repair. Versioning and gradual code repair strategies are complementary. A versioning system’s job is to identify a compatible set of versions of all the packages needed in a program, or else to explain why no such set can be constructed. Gradual code repair creates additional compatible combinations, making it more likely that a versioning system can find a way to build a particular program. Consider again the various updates to Go’s standard library that we discussed above. Suppose that the old API corresponded in a versioning system to standard library version 5.1.3. In the usual atomic code repair approach, the new API would be introduced and the old API removed at the same time, resulting in version 6.0.0; following [[http://semver.org/][semantic versioning]], the major version number is incremented to indicate the incompatibility caused by removing the old API. Now suppose that your larger program depends on two packages, Foo and Bar. Foo still uses the old standard library API. Bar has been updated to use the new standard library API, and there have been important changes since then that your program needs: you can’t use an older version of Bar from before the standard library changes. .html refactor/version1.html There is no compatible set of libraries to build your program: you want the latest version of Bar, which requires standard library 6.0.0, but you also need Foo, which is incompatible with standard library 6.0.0. The best a versioning system can do in this case is report the failure clearly. (If you are sufficiently motivated, you might then resort to updating your own copy of Foo.) In contrast, with better support for gradual code repair, we can add the new, interchangeable API in version 5.2.0, and then remove the old API in version 6.0.0. .html refactor/version2.html The intermediate version 5.2.0 is backwards compatible with 5.1.3, indicated by the shared major version number 5. However, because the change from 5.2.0 to 6.0.0 only removed API, 5.2.0 is also, perhaps surprisingly, backwards compatible with 6.0.0. Assuming that Bar declares its requirements precisely—it is compatible with both 5.2.0 and 6.0.0—a version system can see that both Foo and Bar are compatible with 5.2.0 and use that version of the standard library to build the program. Good support for and adoption of gradual code repair reduces incompatibility, giving versioning systems a better chance to find a way to build your program. ** Type aliases To enable gradual code repair during codebase refactorings, it must be possible to create alternate names for a constant, function, variable, or type. Go already allows introducing alternate names for all constants, all functions, and nearly all variables, but no types. Put another way, the general alias form is never necessary for constants, never necessary for functions, only rarely necessary for variables, but always necessary for types. The relative importance to the specific declarations suggests that perhaps the Go 1.8 aliases were an overgeneralization, and that we should instead focus on a solution limited to types. The obvious solution is type-only aliases, for which no new operator is required. Following [[http://www.freepascal.org/docs-html/ref/refse19.html][Pascal]] (or, if you prefer, [[https://doc.rust-lang.org/book/type-aliases.html][Rust]]), a Go program could introduce a type alias using the assignment operator: type OldAPI = NewPackage.API The idea of limiting aliases to types was [[/issue/16339#issuecomment-233644777][raised during the Go 1.8 alias discussion]], but it seemed worth trying the more general approach, which we did, unsuccessfully. In retrospect, the fact that `=` and `=>` have identical meanings for constants while they have nearly identical but subtly different meanings for variables suggests that the general approach is not worth its complications. In fact, the idea of adding Pascal-style type aliases was [[/issue/16339#issuecomment-233759255][considered in the early design of Go]], but until now we didn’t have a strong use case for them. Type aliases seem like a promising approach to explore, but, at least to me, generalized aliases seemed equally promising before the discussion and experimentation during the Go 1.8 cycle. Rather than prejudge the outcome, the goal of this article is to explain the problem in detail and examine a few possible solutions, to enable a productive discussion and evaluation of ideas for next time. * Challenge Go aims to be ideal for large codebases. In large codebases, it’s important to be able to refactor codebase structure, which means moving APIs between packages and updating client code. In such large refactorings, it’s important to be able to use a gradual transition from the old API to the new API. Go does not support the specific case of gradual code repair when moving types between packages at all. It should. I hope we the Go community can fix this together in Go 1.9. Maybe type aliases are a good starting point. Maybe not. Time will tell. * Acknowledgements Thanks to the many people who helped us [[/issue/16339][think through the design questions]] that got us this far and led to the alias trial during Go 1.8 development. I look forward to the Go community helping us again when we revisit this problem for Go 1.9. If you’d like to contribute, please see [[/issue/18130][issue 18130]].