dl.google.com: Powered by Go
10:00 26 Jul 2013
Tags: download, oscon, port, c++, google, groupcache, caching

Brad Fitzpatrick
Gopher, Google
@bradfitz
bradfitz@golang.org
http://bradfitz.com/
https://go.dev/
https://github.com/golang/groupcache/

* Overview / tl;dw:

- dl.google.com serves Google downloads
- Was written in C++
- Now in Go
- Now much better
- Extensive, idiomatic use of Go's standard library
- ... which is all open source
- composition of interfaces is fun
- _groupcache_, now Open Source, handles group-aware caching and cache-filling

* too long...

* me

- Brad Fitzpatrick
- bradfitz.com
- @bradfitz
- past: LiveJournal, memcached, OpenID, Perl stuff...
- nowadays: Go, Go, Camlistore, Go, anything & everything written in Go ...

* I love Go

- this isn't a talk about Go, sorry.
- but check it out.
- simple, powerful, fast, liberating, refreshing
- great mix of low- and high- level
- light on the page
- static binaries, easy to deploy
- not perfect, but my favorite language yet

* dl.google.com

* dl.google.com

- HTTP download server
- serves Chrome, Android SDK, Earth, much more
- Some huge, some tiny (e.g. WebGL white/blacklist JSON)
- behind an edge cache; still high traffic
- lots of datacenters, lots of bandwidth

* Why port?

* reason 0

    $ apt-get update

.image oscon-dl/slow.png

- embarrassing
- Google can't serve a 1,238 byte file?
- Hanging?
- 207 B/s?!

* Yeah, embarrassing, for years...

.image oscon-dl/crbug.png

* ... which led to:

- complaining on corp G+. Me: "We suck. This sucks."
- primary SRE owning it: "Yup, it sucks. And is unmaintained."
- "I'll rewrite it for you!"
- "Hah."
- "No, serious. That's kinda our job. But I get to do it in Go."
- (Go team's loan-out-a-Gopher program...)

* How hard can this be?

* dl.google.com: few tricks

each "payload" (~URL) described by a protobuf:

- paths/patterns for its URL(s)
- go-live reveal date
- ACLs (geo, network, user, user type, ...)
- dynamic zip files
- custom HTTP headers
- custom caching

* dl.google.com: how it was

.image oscon-dl/before.png

* Aside: Why good code goes bad

* Why good code goes bad

- Premise: people don't suck
- Premise: code was once beautiful
- code tends towards complexity (gets worse)
- environment changes
- scale changes

* code complexity

- without regular love, code grows warts over time
- localized fixes and additions are easy & quick, but globally crappy
- features, hacks and workarounds added without docs or tests
- maintainers come & go,
- ... or just go.

* changing environment

- Google's infrastructure (hardware & software), like anybody's, is always changing
- properties of networks, storage
- design assumptions no longer make sense
- scale changes (design for 10x growth, rethink at 100x)
- new internal services (beta or non-existent then, dependable now)
- once-modern home-grown invented wheels might now look archaic

* so why did it suck?

.image oscon-dl/slow.png

- stalling its single-threaded event loop, blocking when it shouldn't
- maxed out at one CPU, but couldn't even use a fraction of a single CPU.

* but why?

- code was too complicated
- future maintainers slowly violated unwritten rules
- or knowingly violated them, assuming it couldn't be too bad?
- C++ single-threaded event-based callback spaghetti
- hard to know when/where code was running, or what "blocking" meant

* Old code

- served from local disk
- single-threaded event loop
- used sendfile(2) "for performance"
- tried to be clever and steal the fd from the "SelectServer" sometimes to manually call sendfile
- while also trying to do HTTP chunking,
- ... and HTTP range requests,
- ... and dynamic zip files,
- lots of duplicated copy/paste code paths
- many wrong/incomplete in different ways

* Mitigation solution?

- more complexity!
- ad hoc addition of more threads
- ... not really defined which threads did what,
- ... or what the ownership or locking rules were,
- no surprise: random crashes

* Summary of 5-year-old code in 2012

- incomplete docs, tests
- stalling event loop
- ad-hoc threads...
- ... stalling event loops
- ... races
- ... crashes
- copy/paste code
- ... incomplete code
- two processes in the container
- ... different languages

* Environment changes

- Remember: on start, we had to copy all payloads to local disk
- in 2007, using local disk wasn't restricted
- in 2007, sum(payload size) was much smaller
- in 2012, containers get tiny % of local disk spindle time
- ... why aren't you using the cluster file systems like everybody else?
- ... cluster file systems own disk time on your machine, not you.
- in 2007, it started up quickly.
- in 2012, it started in 12-24 hours (!!!)
- ... hope we don't crash! (oh, whoops)

* Copying N bytes from A to B in event loop environments (node.js, this C++, etc)

- Can *A* read?
- Read up to _n_ bytes from A.
- What'd we get? _rn_
- _n_ -= _rn_
- Store those.
- Note we want to write to *B* now.
- Can *B* write?
- Try to write _rn_ bytes to *B*. Got _wn_.
- buffered -= _wn_
- while (blah blah blah) { ... blah blah blah ... }

* Thought that sucked? Try to mix in other state / logic, and then write it in C++.

*

.image oscon-dl/cpp-write.png

*

.image oscon-dl/cpp-writeerr.png

*

.image oscon-dl/cpp-toggle.png

* Or in JavaScript...

- [[https://github.com/nodejitsu/node-http-proxy/blob/master/lib/node-http-proxy/http-proxy.js]]
- Or Python gevent, Twisted, ...
- Or Perl AnyEvent, etc.
- Unreadable, discontiguous code.

* Copying N bytes from A to B in Go:

.code oscon-dl/copy.go /START OMIT/,/END OMIT/

- dst is an _io.Writer_ (an interface type)
- src is an _io.Reader_ (an interface type)
- synchronous (blocks)
- Go runtime deals with making blocking efficient
- goroutines, epoll, user-space scheduler, ...
- easier to reason about
- fewer, easier, compatible APIs
- concurrency is a _language_ (not _library_) feature
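
The .code file isn't inlined in this text version; a minimal sketch of the same idea with the standard library (the function name is illustrative):

    // copyBytes copies from src to dst until EOF or error.
    // Only this goroutine blocks; the OS thread stays busy.
    func copyBytes(dst io.Writer, src io.Reader) error {
        _, err := io.Copy(dst, src)
        return err
    }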

* Where to start?

- baby steps, not changing everything at once
- only port the `payload_server`, not the `payload_fetcher`
- read lots of old design docs
- read lots of C++ code
- port all command-line flags
- serve from local disk
- try to run integration tests
- while (fail) { debug, port, swear, ...}

* Notable stages

- pass integration tests
- run in a lightly-loaded datacenter
- audit mode
- ... mirror traffic to old & new servers; compare responses.
- drop all SWIG dependencies on C++ libraries
- ... use IP-to-geo lookup service, not static file + library

* Notable stages

- fetch blobs directly from blobstore, falling back to local disk on any errors,
- relying entirely on blobstore, but `payload_fetcher` still running
- disable `payload_fetcher` entirely; fast start-up time.

* Using Go's Standard Library

* Using Go's Standard Library

- dl.google.com mostly just uses the standard library

* Go's Standard Library

- net/http
- io
- [[/pkg/net/http/#ServeContent][http.ServeContent]]

* Hello World

.play oscon-dl/server-hello.go
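
The .play file isn't shown in this text version; a minimal equivalent (greeting text assumed):

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintf(w, "Hello, OSCON!\n")
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }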

* File Server

.play oscon-dl/server-fs.go
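
Again, a minimal equivalent sketch (serving the current directory is an assumption):

    package main

    import (
        "log"
        "net/http"
    )

    func main() {
        // http.FileServer gives us index pages, Content-Type
        // detection, If-Modified-Since, and Range handling.
        log.Fatal(http.ListenAndServe(":8080", http.FileServer(http.Dir("."))))
    }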

* http.ServeContent

.image oscon-dl/servecontent.png

* io.Reader, io.Seeker

.image oscon-dl/readseeker.png
.image oscon-dl/reader.png
.image oscon-dl/seeker.png
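
For the text version, the screenshots show these definitions from package io:

    type ReadSeeker interface {
        Reader
        Seeker
    }

    type Reader interface {
        Read(p []byte) (n int, err error)
    }

    type Seeker interface {
        Seek(offset int64, whence int) (int64, error)
    }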

* http.ServeContent

    $ curl -H "Range: bytes=5-" http://localhost:8080

.play oscon-dl/server-content.go
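
A minimal equivalent of the .play file (the content string and file name are assumptions):

    package main

    import (
        "log"
        "net/http"
        "strings"
        "time"
    )

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            // strings.Reader implements io.ReadSeeker, so
            // ServeContent can satisfy the Range request above.
            content := strings.NewReader("I am some content.\n")
            http.ServeContent(w, r, "foo.txt", time.Now(), content)
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }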

* groupcache

* groupcache

- memcached alternative / replacement
- [[http://github.com/golang/groupcache]]
- _library_ that is both a client & server
- connects to its peers
- coordinated cache filling (no thundering herds on miss)
- replication of hot items

* Using groupcache

Declare who you are and who your peers are.

.code oscon-dl/groupcache.go /STARTINIT/,/ENDINIT/

This peer interface is pluggable. (e.g. inside Google it's automatic.)
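
With the open-source HTTP transport, peer setup looks roughly like this (addresses made up):

    me := "http://10.0.0.1"
    peers := groupcache.NewHTTPPool(me)
    // Update the peer list as machines come and go.
    peers.Set("http://10.0.0.1", "http://10.0.0.2", "http://10.0.0.3")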

* Using groupcache

Declare a group. (group of keys, shared between group of peers)

.code oscon-dl/groupcache.go /STARTGROUP/,/ENDGROUP/

- group name "thumbnail" must be globally unique
- 64 MB max per-node memory usage
- Sink is an interface with SetString, SetBytes, SetProto
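
A sketch in the style of the groupcache README (generateThumbnail is a hypothetical cache-filler):

    var thumbNails = groupcache.NewGroup("thumbnail", 64<<20, groupcache.GetterFunc(
        func(ctx groupcache.Context, key string, dest groupcache.Sink) error {
            fileName := key
            // Called only on cache misses this node must fill.
            return dest.SetBytes(generateThumbnail(fileName))
        }))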

* Using groupcache

Request keys

.code oscon-dl/groupcache.go /STARTUSE/,/ENDUSE/

- might come from local memory cache
- might come from peer's memory cache
- might be computed locally
- might be computed remotely
- of all threads on all machines, only one thumbnail is made, then fanned out in-process and across-network to all waiters
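
Requesting a key, again in the README's style (ctx, w, r, and modTime assumed from surrounding code):

    var data []byte
    err := thumbNails.Get(ctx, "big-file.jpg",
        groupcache.AllocatingByteSliceSink(&data))
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    http.ServeContent(w, r, "big-file.jpg", modTime, bytes.NewReader(data))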

* dl.google.com and groupcache

- Keys are "<blobref>-<chunk_offset>"
- Chunks are 2MB
- Chunks cached from local memory (for self-owned and hot items),
- Chunks cached remotely, or
- Chunks fetched from Google storage systems
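
Illustrative only (blobRef and off are hypothetical variables):

    // Key of the 2 MB-aligned chunk containing byte offset off.
    chunkOff := off - off%(2<<20)
    key := fmt.Sprintf("%s-%d", blobRef, chunkOff)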

* dl.google.com interface composition

.code oscon-dl/sizereaderat.go /START_1/,/END_1/
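
The composed interface is small; a sketch of its shape:

    // A SizeReaderAt is an io.ReaderAt with a known size.
    type SizeReaderAt interface {
        Size() int64
        io.ReaderAt
    }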

* io.SectionReader

.image oscon-dl/sectionreader.png
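
For the text version: io.SectionReader adapts an io.ReaderAt into a ReadSeeker over a byte range.

    // ra is any io.ReaderAt; rs implements Read, Seek, and ReadAt.
    rs := io.NewSectionReader(ra, 0, size)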

* chunk-aligned ReaderAt

.code oscon-dl/chunkaligned.go /START_DOC/,/END_DOC/

- Caller can do ReadAt calls of any size and any offset
- `r` only sees ReadAt calls on 2MB offset boundaries, of size 2MB (unless final chunk)
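
A sketch of the adapter's shape, inferred from the bullets above (the exact name lives in chunkaligned.go):

    // NewChunkAlignedReaderAt returns a SizeReaderAt that passes
    // only chunkSize-aligned, chunkSize-long ReadAt calls to r.
    func NewChunkAlignedReaderAt(r SizeReaderAt, chunkSize int) SizeReaderAt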

* Composing all this

- http.ServeContent wants a ReadSeeker
- io.SectionReader(ReaderAt + size) -> ReadSeeker
- Download server payloads are a type "content" with Size and ReadAt, implemented with calls to groupcache.
- Wrapped in a chunk-aligned ReaderAt
- ... parts concatenated with MultiReaderAt

.play oscon-dl/server-compose.go /START/,/END/
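
A sketch of the whole stack under those assumptions (content and NewChunkAlignedReaderAt as above; a zero modtime makes ServeContent omit Last-Modified):

    func servePayload(w http.ResponseWriter, r *http.Request, c content) {
        ra := NewChunkAlignedReaderAt(c, 2<<20) // c: Size + ReadAt via groupcache
        rs := io.NewSectionReader(ra, 0, c.Size())
        http.ServeContent(w, r, "payload", time.Time{}, rs)
    }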

* Things we get for free from net/http

- Last-Modified
- ETag
- Range requests (w/ its paranoia)
- HTTP/1.1 chunking, etc.
- ... old server tried to do all this itself
- ... incorrectly
- ... incompletely
- ... in a dozen different copies

* Overall simplification

- deleted C++ payload_server & Python payload_fetcher
- 39 files (14,032 lines) deleted
- one binary now (just Go `payload_server`, no `payload_fetcher`)
- starts immediately, no huge start-up delay
- server is just "business logic" now, not HTTP logic

* From this...

.image oscon-dl/before.png

* ... to this.

.image oscon-dl/after.png
* And from pages and pages of this...

.image oscon-dl/cpp-writeerr.png

* ... to this

.image oscon-dl/after-code.png

* So how does it compare to C++?

- less than half the code
- more testable, tests
- same CPU usage for same bandwidth
- ... but can do much more bandwidth
- ... and more than one CPU
- less memory (!)
- no disk
- starts up instantly (not 24 hours)
- doesn't crash
- handles hot download spikes

* Could we have just rewritten it in new C++?

- Sure.
- But why?

* Could I have just fixed the bugs in the C++ version?

- Sure, if I could find them.
- Then have to own it ("You touched it last...")
- And I already maintain an HTTP server library. Don't want to maintain a bad one too.
- The Go version is much more maintainable. (and 3+ other people now maintain it)

* How much of dl.google.com is closed-source?

- Very little.
- ... ACL policies
- ... RPCs to Google storage services.
- Most is open source:
- ... code.google.com/p/google-api-go-client/storage/v1beta1
- ... net/http and rest of Go standard library
- ... `groupcache`, now open source ([[https://github.com/golang/groupcache][github.com/golang/groupcache]])
437