Text file
talks/2014/static-analysis.slide
1 Static analysis tools
2 for Go code comprehension and refactoring
3
4 golang nyc meetup
5 13 Nov 2014
6
7 Alan Donovan
8 Google, New York
9 adonovan@google.com
10
11 * Video
12
13 This talk was presented at the GothamGo Kickoff Meetup in New York City, November 2014.
14
15 .link http://vimeo.com/114736889 Watch the talk on Vimeo
16
17 * Introduction
18
19 Programmers say "writing code", but most of that time is spent _reading_
20
21 Actually: reading code, and making logical deductions
22
23 Machines are good at reading and logic
24
25 *Machines* *should* *be* *helping* *us* *read* *code.*
26
27 And write it! Refactoring can be tedious and error-prone.
28
29
30 * Outline
31
32 - Code comprehension tools: `oracle`, `godoc`
33 - Analysis technology
34 - Refactoring tools: `gorename`, `eg`
35
36
37 * Demo: the Go oracle
38
39 Supports interactive, editor-integrated queries:
40 - where is this thing defined?
41 - what are the methods of this type?
42 - what are the free variables of this selection?
43 - what might this expression point to?
44 - what dynamic types might this interface or `reflect.Value` contain?
45
46
47 * Demo: godoc analysis features
48
49 .link /lib/godoc/analysis/help.html godoc -analysis=type,pointer
50
51 Package view
52 - method set and _implements_ relation for every type
53 - internal call graph of every package
54
55 Source code view
56 - kind, type, definition of every identifier
57 - method set and _implements_ relation for every type
58 - peers of every channel send/receive
59 - callers of every function
60 - callees of every call site (even dynamic)
61
62
63 * How it works
64
65
66
67 * Libraries and tools
68
69 .image static-analysis/tools.svg
70
71 Mostly in `golang.org/x/tools` repo
72
73
74 * go/types: the Go type checker
75
76 Computes mappings from:
77 - identifiers to definitions
78 - constant expressions to values
79 - expressions to types
80
81 Many subtle corners:
82 - method set computation
83 - recursive interfaces
84 - shifts
85
86 Making it correct, fast, and clean was a substantial project
87
88 .link https://pkg.go.dev/golang.org/x/tools/go/types golang.org/x/tools/go/types
89
90 Author: Robert Griesemer
91
92
93
94 * go/ssa: intermediate representation (IR)
95
96 Typed abstract syntax trees (ASTs) are too varied and subtle to analyze directly
97
98 Programs are lowered into Static Single-Assignment form (SSA):
99 - simplifies dataflow analyses since _reaching_ _definitions_ are implicit
100 - invented 1991, now mainstream (gcc, llvm)
101
102 All Go programs can be expressed using only ~30 basic instructions
103
104 Simple, explicit, high-level, high source fidelity
105
106 .link https://pkg.go.dev/golang.org/x/tools/go/ssa golang.org/x/tools/go/ssa
107
108 The llgo project is using go/ssa as a front-end for LLVM
109
110
111
112 * Demo: ssadump
113
114 # Simple fibonacci function
115 # % ssadump -build=F fib.go basic
116 # % ssadump -build=FD fib.go debug info
117 # % ssadump -test -run unicode toy interpreter
118
119
120
121 * go/pointer: pointer analysis
122
123 Pointers complicate reasoning about program behaviour
124 - functions may be called dynamically
125 - a variable may be updated and accessed via different names ("aliases")
126 - runtime types are values too: `interface`, `reflect.Type`
127
128 We use *pointer* *analysis* to answer the question:
129 which variables might this pointer point to?
130
131 .link https://pkg.go.dev/golang.org/x/tools/go/pointer golang.org/x/tools/go/pointer
132
133 # comment on go's appropriateness for this analysis:
134 # (closed program---no dlopen, classloading, no generics, typesafe)
135
136
137 * Pointer analysis
138
139 Let `pts(p)` be the _points-to_ _set_ of pointer p
140 Its elements are abstract variables: globals, locals, unnamed (`new(T)`)
141
142 Overview:
143
144 - analyze each SSA statement in the whole program
145
146 - generate appropriate constraints on the points-to set of each variable
147
148 Statement Constraint
149 y = &x pts(y) ⊇ {x}
150 y = x pts(y) ⊇ pts(x) "inclusion-based"
151 *y = x ∀o ∈ pts(y). pts(o) ⊇ pts(x)
152 y = *x ∀o ∈ pts(x). pts(y) ⊇ pts(o)
153 y = &x->f \
154 y = x.(T) } not shown
155 reflection /
156
157 - solve the constraint system
158
159
160 * Pointer analysis: constraint generation
161
162 All Go operations can be expressed in these constraints
163 Function, map, slice and channel ops all simplify to struct ops
164
165 Treatment of `unsafe.Pointer` conversion is unsound
166 But we don't care! This isn't a compiler
167
168 The constraint system is:
169 - *field-sensitive*: struct fields x.f and x.g have distinct solutions
170 - *flow-insensitive*: the order of statements is ignored
171 - *context-insensitive*: each function is analyzed only once
172 - *whole-program*: Go source is required for all dependencies
173 - *inclusion-based*: i.e. Andersen's analysis
174
175 Optimization: don't make constraints for "uninteresting" types
176 such as types not related to a one-shot Oracle query
177
178 * Pointer analysis: pre-solver optimizations
179
180 Constraint system for 124KLoC program (oracle) has:
181
182 254,000 variables
183 184,000 equations
184
185 Solving phase is in O(|v|³), so pre-solver optimisation is crucial
186
187 We can bring this down to:
188
189 88,000 variables
190 65,000 equations
191
192 * Pointer Equivalence by Hash-Value Numbering (Hardekopf & Lin '07)
193
194 p = &x a = &x if ... {
195 q = p b = a p, q = &x, &y
196 r = q c = b } else {
197 s = r if ... { a = c } p, q = &y, &x
198 }
199
200 .image static-analysis/hvn.svg
201
202 Hash-Value Numbering
203
204 * Pointer analysis: solving
205
206 It's transitive closure of a graph, essentially
207
208 Lots of optimizations, for example:
209
210 _sparse_bit_vectors_, a very compact representation for points-to sets
211
212 .link https://pkg.go.dev/golang.org/x/tools/container/ints golang.org/x/tools/container/ints
213
214 Solver log is >1GB. Debugging is fun.
215
216
217 #* Sparse bit vector
218 #`golang.org/x/tools/container/ints`
219 #
220 #.image sparsebitvector.svg
221 #
222 #Very compact representation of sparse `set<int>`
223 #Doubly-linked list of (offset int, data [256]bit) blocks.
224
225
226 * Call graph
227
228 The *call* *graph* can be read directly from the solution
229
230 The `golang.org/x/tools/cmd/callgraph` tool prints raw call graphs
231
232 Demo (time permitting)
233
234
235 * Refactoring tools
236
237 * gorename: precise, type-safe identifier renaming
238
239 A refactoring tool, usable from...
240
241 - the command line
242
243 % gorename -from bytes.Buffer.Len -to Size
244 Renamed 105 occurrences in 42 files in 33 packages.
245
246 - many editors ([[https://code.google.com/p/go/source/browse/refactor/rename/rename.el?repo=tools&r=511801758bb9a0b83f9bf931342889fbedbc9b48][Emacs]], [[http://quick.as/dgof2p7][Vim]], [[https://github.com/smartystreets/sublime-gorename][Sublime]], [[https://github.com/davidrjenni/agorn][Acme]])
247
248 All renamings are reversible
249
250 Sound: either a conflict is reported, or the refactoring preserves behaviour*
251
252 *except reflection
253
254 .link https://pkg.go.dev/golang.org/x/tools/cmd/gorename golang.org/x/tools/cmd/gorename
255
256 * Demo: gorename
257
258
259 * Example-based refactoring with eg
260
261 A Go port of the Java _Refaster_ tool
262
263 A template specifies the pattern and replacement as Go expressions:
264
265 package P
266
267 import ( "errors"; "fmt" )
268
269 func before(s string) error { return fmt.Errorf("%s", s) }
270 func after(s string) error { return errors.New(s) }
271
272 Parameters are _wildcards_
273
274 % eg -t template.go <package> ...
275
276 .link https://pkg.go.dev/golang.org/x/tools/cmd/eg golang.org/x/tools/cmd/eg
277
278 * Demo: eg
279
280 * Conclusion
281
282 From the outset, Go has been renowned for its tools: `go`, `gofmt`
283
284 We have many building blocks for sophisticated source analysis tools
285
286 What should we build next?
287
View as plain text