The Design of the Go Assembler
Gophercon
12 July 2016
Rob Pike
Gophercon
12 July 2016
Rob Pike
Video is here.
2Why Learn Assembler Language?
The most important single thing to realize about assembler language is that it enables the programmer to use all System/360 machine functions as if he were coding in System/360 machine language.
— A Programmer's Introduction to IBM System/360 Assembler Language, 1970, page 4
3Once it was all you needed, then high-level languages like FORTRAN and COBOL came along.
But still needed today:
math/big
)Also, perhaps most important: It's how we talk about the machine.
Knowing assembly, even a little, means understanding computers better.
4Some examples...
51 PRINT NOGEN 2 STOCK1 START 0 3 BEGIN BALR 11,0 4 USING *,11 5 MVC NEWOH,OLDOH 6 AP NEWOH,RECPT 7 AP NEWOH,ISSUE 8 EOJ 11 OLDOH DC PL4'9' 12 RECPT DC PL4'4' 13 ISSUE DC PL4'6' 14 NEWOH DS PL4 15 END BEGIN
# TO ENTER A JOB REQUEST REQUIRING NO VAC AREA: COUNT 02/EXEC NOVAC INHINT AD FAKEPRET # LOC(MPAC +6) - LOC(QPRET) TS NEWPRIO # PRIORITY OF NEW JOB + NOVAC C(FIXLOC) EXTEND INDEX Q # Q WILL BE UNDISTURBED THROUGHOUT. DCA 0 # 2CADR OF JOB ENTERED. DXCH NEWLOC CAF EXECBANK XCH FBANK TS EXECTEM1 TCF NOVAC2 # ENTER EXECUTIVE BANK.
TITLE COUNT A=1 ;Define a name for an accumulator. START: MOVSI A,-100 ;initialize loop counter. ;A contains -100,,0 LOOP: HRRZM A,TABLE(A) ;Use right half of A to index. AOBJN A,LOOP ;Add 1 to both halves (-77,,1 -76,,2 etc.) ;Jump if still negative. .VALUE ;Halt program. TABLE: BLOCK 100 ;Assemble space to fill up. END START ;End the assembly.
(From the MIT PDP-10 Info file)
8/ a3 -- pdp-11 assembler pass 1 assem: jsr pc,readop jsr pc,checkeos br ealoop tst ifflg beq 3f cmp r4,$200 blos assem cmpb (r4),$21 /if bne 2f inc ifflg 2: cmpb (r4),$22 /endif bne assem dec ifflg br assem
(From Unix v6 as/as13.s
)
strtolower public link a6,#0 ;Set up stack frame movea 8(a6),a0 ;A0 = src, from stack movea 12(a6),a1 ;A1 = dst, from stack loop move.b (a0)+,d0 ;Load D0 from (src) cmpi #'A',d0 ;If D0 < 'A', blo copy ;skip cmpi #'Z',d0 ;If D0 > 'Z', bhi copy ;skip addi #'a'-'A',d0 ;D0 = lowercase(D0) copy move.b d0,(a1)+ ;Store D0 to (dst) bne loop ;Repeat while D0 <> NUL unlk a6 ;Restore stack frame rts ;Return end
(From Wikipedia)
10ident slice V6 0 ; initialize S A4 S0 ; initialize *x A5 S1 ; initialize *y A3 S2 ; initialize i loop S0 A3 JSZ exit ; if S0 == 0 goto exit VL A3 ; set vector length V11 ,A4,1 ; load slice of x[i], stride 1 V12 ,A5,1 ; load slice of y[i], stride 1 V13 V11 *F V12 ; slice of x[i] * y[i] V6 V6 +F V13 ; partial sum A14 VL ; get vector length of this iteration A4 A4 + A14 ; *x = *x + VL A5 A5 + A14 ; *y = *y + VL A3 A3 - A14 ; i = i - VL J loop exit
(From Robert Griesemer's PhD thesis)
11Columnar layout with function and variable declarations, labels, instructions.
Instructions:
subroutine header label: instruction operand... ; comment ...
Operands:
register literal constant address register indirection (register as address) ...
There are exceptions such as Cray (A5
A5+A14
) but they aren't conceptually different.
CPUs are all pretty much the same.
12We can use the common structure of all assemblers (CPUs, really) to construct a common grammar for all architectures.
This realization took some time.
The seeds were planted long ago.
13
Around 1986, Ken Thompson wrote a C compiler for the National 32000 (Sequent SMP).
Compiler generated pseudo-code, linker did instruction assignment.
The "assembler" was just a way to write that pseudo-code textually.
MOVW $0, var
might become (hypothetical example)
XORW R1, R1 STORE R1, var
Note assembler emits the MOVW
; the linker generates XORW
and STORE
.
We call this instruction selection.
Or consider RET
, which becomes RET
or JMP
LR
or JMP
(R31)
or ...
The assembler is just a way to hand-write the output the compiler produces.
(Compiler does not feed assembler, unlike in many other systems.)
Assembler for each architecture was a separate C program with a Yacc grammar,
adapted and partially rewritten for every architecture.
8a
, 6a
, va
etc. corresponding to 8c
, 6c
vc
, etc.
(One-letter codes: 8
for 386, 6
for AMD64, v
for MIPS, etc.)
All very similar up front but different in detail.
The earliest Go implementations used this design, adding Go compilers 8g
, 6g
but using the Plan 9 assemblers unchanged.
The separation of (compiler/assembler)⇒linker allowed the Go linker to do more, including helping boot the runtime.
16
Goal: Move to a pure Go implementation.
Preparation started in Go 1.3
New library that (in part) does instruction selection: "liblink"
(as of 1.5, "obj"
).
Call it from the compiler.
Thus the first part of the old linker is now in the compiler.
The compiler now emits (mostly) real instructions, not pseudo-instructions.
Result: Slower compiler, but faster build.
Instruction selection for library code done once, not every time you link a program.
Assemblers also use obj
.
For both compiler and assembler, the input is unchanged.
In fact the whole process is the same, just arranged differently.
More prep in Go 1.4, then in Go 1.5, all tooling moved to Go.
Compiler and linker machine-translated from C to Go.
The old liblink
became a new suite of libraries, obj/...
:
cmd/internal/obj
(portable part)cmd/internal/obj/x86
(architecture-specific part)cmd/internal/obj/arm
(architecture-specific part)Previous presentations about this work:
The many compilers (6g
, 8g
etc.) were replaced with a single tool: compile
.
GOOS
and GOARCH
(only!) specify the target operating system and architecture.
GOOS=darwin GOARCH=arm go tool compile prog.go
Same for the linker: 6l
, 8l
, etc. become go
tool
link
.
How can a single binary handle all these architectures?
Only one input language, only one output generator (the obj
library).
The target is configured when the tool starts.
Unlike the old compilers, which shared much code, the old assemblers were all different programs.
(Although they were very similar inside, they shared almost no code.)
Proposal: Write a single go
tool
asm
from scratch in Go, replacing all the old assemblers.
GOOS
and GOARCH
tell you what the target is.
But assembly language isn't Go. Every machine has a different assembly language.
Well, not really! Not quite universal across machines, but ...
22Look at the generated assembly for this simple program:
package add func add(a, b int) int { return a + b }
For each architecture, with some noise edited out:
23TEXT add(SB), $0-12 MOVL a+4(FP), BX ADDL b+8(FP), BX MOVL BX, 12(FP) RET
TEXT add(SB), $0-24 MOVQ b+16(FP), AX MOVQ a+8(FP), CX ADDQ CX, AX MOVQ AX, 24(FP) RET
TEXT add(SB), $-4-12 MOVW a(FP), R0 MOVW b+4(FP), R1 ADD R1, R0 MOVW R0, 8(FP) RET
TEXT add(SB), $-8-24 MOVD a(FP), R0 MOVD b+8(FP), R1 ADD R1, R0 MOVD R0, 16(FP) RET
TEXT add(SB), $0-24 MOVD a(FP), R1 MOVD b+8(FP), R2 ADD R2, R1, R1 MOVD R1, 16(FP) RET
TEXT add(SB), $-8-24 MOVV a(FP), R1 MOVV b+8(FP), R2 ADDVU R2, R1 MOVV R1, 16(FP) RET
TEXT add(SB), $0-24 MOVD a(FP), R2 MOVD b+8(FP), R3 ADD R3, R2 MOVD R2, 16(FP) RET
They all look the same. (Partly by design, partly because they are the same.)
The only significant variation is the names of instructions and registers.
Many details hidden, such as what RET
is. (It's a pseudo-instruction.)
(Offsets are determined by size of int
, among other things.)
The fortuitous syntax originated in Ken's National 32000 assembler.
With common syntax and the obj
library, can build a single assembler for all CPUs.
Not the same assembly notation as the manufacturers'.
Can be offputting to outsiders.
On the other hand, this approach uses the same notation on all machines.
New architectures can arrive without creating or learning new notation.
A tradeoff worth making.
32The apotheosis of assemblers.
New program, entirely in Go.
Common lexer and parser across all architectures.
Each instruction parsed into an instruction description.
That becomes a data structure passed to the new obj
library.
The core of the assembler has very little per-machine information.
Instead, tables are constructed at run time, flavored by $GOARCH
.
An internal package, cmd/asm/internal/arch
, creates these tables on the fly.
Machine details are loaded from obj
.
import ( "cmd/internal/obj" "cmd/internal/obj/x86" ) func archX86(linkArch *obj.LinkArch) *Arch { register := make(map[string]int16) // Create maps for easy lookup of instruction names etc. for i, s := range x86.Register { register[s] = int16(i + x86.REG_AL) } instructions := make(map[string]obj.As) for i, s := range obj.Anames { instructions[s] = x86.As(i) } return &Arch{ Instructions: instructions, Register: register, ... } }
Parser just does string matching to find the instruction.
34
Given an assembly run with GOOS=386
, the instruction
ADDW AX, BX
is parsed into in a data structure schematically like:
&obj.Prog{ As: arch.Instructions["ADDW"], From: obj.Addr{Reg: arch.Register["AX"]}, To: obj.Addr{Reg: arch.Register["BX"]}, ... }
That gets passed to the obj
library for encoding as a 386 instruction.
This is a purely mechanical process devoid of semantics.
35Assembler does some validation:
[R2,R5,R8,g]
only legal on ARM)
But all semantic checking is done by the obj
library.
If it can be turned into real instructions, it's legal!
36New assembler was tested against the old (C-written) ones.
A/B testing at the bit level: Same input must give same output.
Also reworked some parts of obj
packages for better diagnostics and debugging.
Did 386
first, then amd64
, arm
, and ppc
. Each was easier than the last.
No hardware manuals were opened during this process.
37
One Go program replaces many C/Yacc programs, so it's easier to maintain.
As a Go program it can have proper tests.
Dependent on obj
, so correctness and completeness are relatively simple to guarantee.
New assembler almost 100% compatible with previous ones.
Incompatibilities were mostly inconsistencies.
Portability is easy now.
A new instruction set just needs connecting it up with the obj
library,
plus a minor amount of architecture-specific tuning and validation.
Several architectures have been added since the assembler was created,
most by the open source community.
To a large extent, the assembler is now table-driven.
Can we generate those tables?
The disassemblers (used by go
tool
pprof
) are created by machine processing of PDFs.
The architecture definition is machine-readable, so use it!
Plan to go the other way:
Read in a PDF, write out obj
library definitions and bind to assembler.
Why write by hand when you can automate?
Hope to have this working soon; basics are already in place.
Result: a largely machine-generated assembler.
39Assembly language is essentially the same everywhere.
Use that to build a true common assembly language.
Customize it on the fly using dynamically loaded tables.
And one day: create those tables automatically.
A portable solution to a especially non-portable problem.
40Gophercon
12 July 2016