Go Data Structures: Interfaces [中译]
原文:Go Data Structures: Interfaces
作者: Russ Cox
以下为正文部分:
Go's interfaces—static, checked at compile time, dynamic when asked for—are, for me, the most exciting part of Go from a language design point of view. If I could export one feature of Go into other languages, it would be interfaces.
Go 的interface
(静态,编译时检查,使用时动态)是 Go 从语言设计上最令我激动的部分。如果我能选一个 Go 语言的特性导出到其他语言,那我一定会选择interface
。
This post is my take on the implementation of interface values in the “gc” compilers: 6g, 8g, and 5g. Over at Airs, Ian Lance Taylor has written two posts about the implementation of interface values in
gccgo
. The implementations are more alike than different: the biggest difference is that this post has pictures.
本文是我对 gc 编译器 6g、8g 和 5g[^1]中接口值的实现的看法: 。在 Airs 上 Ian Lance Taylor 已经写了两篇关于 gccgo 中如何实现 interface values 的文章。本文和这两篇文章没什么太大的不同,最大的区别是本文有图片。
Before looking at the implementation, let's get a sense of what it must support.
在看具体实现之前,让我们先一起看看它支持的特性。
Usage (使用方法)
Go's interfaces let you use duck typing like you would in a purely dynamic language like Python but still have the compiler catch obvious mistakes like passing an
int
where an object with aRead
method was expected, or like calling theRead
method with the wrong number of arguments. To use interfaces, first define the interface type (say,ReadCloser
):type ReadCloser interface { Read(b []byte) (n int, err os.Error) Close() }
Go 的interface
能让你像在 Python 这样的纯动态语言中一样使用鸭子类型,但同时又可以通过编译器捕获类似于给一个对象的Read
方法传了一个int
类型,或者在对用Read
方法时传入了错误的个数的参数这样明显的错误。为了使用interface
,首先需要定义一个interface
类型(命名为 ReadCloser):
and then define your new function as taking a
ReadCloser
. For example, this function callsRead
repeatedly to get all the data that was requested and then callsClose
:func ReadAndClose(r ReadCloser, buf []byte) (n int, err os.Error) { for len(buf) > 0 && err == nil { var nr int nr, err = r.Read(buf) n += nr buf = buf[nr:] } r.Close() return }
然后定义一个新的函数,接收ReadCloser
参数。例如,这个函数重复调用Read
来获取所有请求的数据,最后调用Close
:
The code that calls
ReadAndClose
can pass a value of any type as long as it hasRead
andClose
methods with the right signatures. And, unlike in languages like Python, if you pass a value with the wrong type, you get an error at compile time, not run time.
使用代码调用ReadAndClose
方法时可以传递任何类型的参数,只要该参数类型有Read
和Close
方法且有正确的函数签名。同时,不像 Python 之类的语言,如果你传的值类型是错误的,你会在编译时就得到报错,而不是运行时才报错。
Interfaces aren't restricted to static checking, though. You can check dynamically whether a particular interface value has an additional method. For example:
type Stringer interface { String() string } func ToString(any interface{}) string { if v, ok := any.(Stringer); ok { return v.String() } switch v := any.(type) { case int: return strconv.Itoa(v) case float: return strconv.Ftoa(v, 'g', -1) } return "???" }
The value
any
has static typeinterface{}
, meaning no guarantee of any methods at all: it could contain any type. The “comma ok” assignment inside theif
statement asks whether it is possible to convertany
to an interface value of typeStringer
, which has the methodString
. If so, the body of that statement calls the method to obtain a string to return. Otherwise, theswitch
picks off a few basic types before giving up. This is basically a stripped down version of what the fmt package does. (Theif
could be replaced by addingcase Stringer:
at the top of theswitch
, but I used a separate statement to draw attention to the check.)
但是,interfaces
并不局限于静态检查。你可以动态检查一个特定的interface
值是否有额外的方法。例如:
参数any
的静态类型是interface{}
,这表示该类型根本没有任何必须具备的方法: 即它可以是任何类型。在上述代码片段中: if
表达式内逗号右侧的ok
表示是否能将any
转换具有String
方法类型为Stringer
的interface
值。如果可以,该表达式的 body 部分执行String
方法得到并return
一个 string
类型的值;如果不行,在放弃判断前,switch
表达式会选择几个基本类型做判断。这相当于一个fmt package
的精简版。(上述代码里的if
可以改成在switch
顶部增加一个case Stringer
,这里分开写的目的是为了引起注意。)
As a simple example, let's consider a 64-bit integer type with a
String
method that prints the value in binary and a trivialGet
method:type Binary uint64 func (i Binary) String() string { return strconv.Uitob64(i.Get(), 2) } func (i Binary) Get() uint64 { return uint64(i) }
A value of type
Binary
can be passed toToString
, which will format it using theString
method, even though the program never says thatBinary
intends to implementStringer
. There's no need: the runtime can see thatBinary
has aString
method, so it implementsStringer
, even if the author ofBinary
has never heard ofStringer
.These examples show that even though all the implicit conversions are checked at compile time, explicit interface-to-interface conversions can inquire about method sets at run time. “Effective Go” has more details about and examples of how interface values can be used.
作为一个例子,让我们考虑一个具有String
方法来二进制打印其值,及一个Get
方法的 64 位整型:
一个Binary
类型的值可以被传入ToString
方法(这个方法将会调用入参的String
方法),即使程序从来没有显示表达过Binary
实现了Stringer
。这种显示声明没有必要:因为 runtime 可以看到Binary
有一个String
方法,所以Binary
实现了Stringer
接口,哪怕Binary
类的作者从来没听过Stringer
这个接口。
Interface Values (接口值)
Languages with methods typically fall into one of two camps: prepare tables for all the method calls statically (as in C++ and Java), or do a method lookup at each call (as in Smalltalk and its many imitators, JavaScript and Python included) and add fancy caching to make that call efficient. Go sits halfway between the two: it has method tables but computes them at run time. I don't know whether Go is the first language to use this technique, but it's certainly not a common one. (I'd be interested to hear about earlier examples; leave a comment below.)
具有方法的语言通常分为两个阵营:为所有方法调用准备静态表(如:C++和 Java)、或者在每个调用中执行方法查找(比如 Smalltalk 和它的许多模仿者,包括 JavaScript 和 Python) ,并添加花哨的缓存来提高调用的效率。
Go 处于两者之间:它有方法表(method tables),但在运行时计算它们。我不知道 Go 是不是第一个使用这种技术的语言,但 Go 显然不是一种常见的语言。(我很有兴趣听到更早的例子;请在下面留言。)
As a warmup, a value of type
Binary
is just a 64-bit integer made up of two 32-bit words (like in the last post, we'll assume a 32-bit machine; this time memory grows down instead of to the right):Interface values are represented as a two-word pair giving a pointer to information about the type stored in the interface and a pointer to the associated data. Assigning
b
to an interface value of typeStringer
sets both words of the interface value.(The pointers contained in the interface value are gray to emphasize that they are implicit, not directly exposed to Go programs.)
作为热身,一个Binary
类型是由两个 32 位字[^2]组成的 64bit 整型(正如上一篇博客,,我们假设使用 32 位机器;这次内存不是向右增长,而是向下):
接口值被表示为一个双字,其中一个指针指向接口中存的类型,另一个指针指向关联的数据。将b
分配给类型为Stringer
的接口值会设置接口值的双字值。
(接口值内灰色的指针主要是为了强调这里的隐性关系,这些指针并没有被直接暴露在 Go 程序中。)
The first word in the interface value points at what I call an interface table or itable (pronounced i-table; in the runtime sources, the C implementation name is
Itab
). The itable begins with some metadata about the types involved and then becomes a list of function pointers. Note that the itable corresponds to the interface type, not the dynamic type. In terms of our example, the itable forStringer
holding typeBinary
lists the methods used to satisfyStringer
, which is justString
:Binary
's other methods (Get
) make no appearance in the itable.
接口值内的第一个字,指向一个interface table
或称为itable
(发音为 i-table; 在运行时源码内,C 的实现叫做 Itab)。itable
内的开始部分存储这一些类型相关的元数据(metadata),剩下的部分存储着一组函数指针。注意 itable 相当于接口类型,而不是动态类型。根据我们的示例,Stringer
保存Binary
类型的 itable 列出了用于满足Stringer
接口的方法,这里的方法只有String
: Binary
的其他方法(如Get
)并没有出现在 itable 中。
The second word in the interface value points at the actual data, in this case a copy of
b
. The assignmentvar s Stringer = b
makes a copy ofb
rather than point atb
for the same reason thatvar c uint64 = b
makes a copy: ifb
later changes,s
andc
are supposed to have the original value, not the new one. Values stored in interfaces might be arbitrarily large, but only one word is dedicated to holding the value in the interface structure, so the assignment allocates a chunk of memory on the heap and records the pointer in the one-word slot. (There's an obvious optimization when the value does fit in the slot; we'll get to that later.)
接口值中的第二个字指向实际数据,在本例中是 b
的副本。赋值语句var s Stringer = b
会复制 b
而不是指向 b
,就像 var c uint64 = b
复制 b
一样: 如果 b
后来改变了,s
和c
应该有原来的值,而不是新的值。存储在接口中的值可能是任意大的,但是只有一个字长专门用于保存接口结构中的值,因此赋值分配了堆上的一块内存,并将指针记录在一个字槽中。(当值大小正好适合字槽大小时,会有一个明显的优化; 我们稍后将讨论这个问题。)
To check whether an interface value holds a particular type, as in the type switch above, the Go compiler generates code equivalent to the C expression
s.tab->type
to obtain the type pointer and check it against the desired type. If the types match, the value can be copied by by dereferencings.data
.
为了检查接口值是否包含特定类型,如上面的 type switch,Go 编译器生成与 c 表达式 s.tab->type
等效的代码,以获取类型指针并检查它与所需的类型。如果类型匹配,可以通过解引用 s.data
来复制该值。
To call
s.String()
, the Go compiler generates code that does the equivalent of the C expressions.tab->fun[0](s.data)
: it calls the appropriate function pointer from the itable, passing the interface value's data word as the function's first (in this example, only) argument. You can see this code if you run8g -S x.go
(details at the bottom of this post). Note that the function in the itable is being passed the 32-bit pointer from the second word of the interface value, not the 64-bit value it points at. In general, the interface call site doesn't know the meaning of this word nor how much data it points at. Instead, the interface code arranges that the function pointers in the itable expect the 32-bit representation stored in the interface values. Thus the function pointer in this example is(*Binary).String
notBinary.String
.The example we're considering is an interface with just one method. An interface with more methods would have more entries in the fun list at the bottom of the itable.
为了调用 s.String()
,Go 编译器生成与 c 表达式 s.tab->fun[0](s.data)
等效的代码: 它从 itable 中调用适当的函数指针,将接口值的数据字作为函数的第一个参数传递(仅在这个例子中)。如果您运行8g -S x.go
,就可以看到这段代码(详见本文底部)。注意,该函数是从接口值的第二个字传递 32 位指针,而不是它指向的 64 位值。一般来说,接口调用点不知道这个字的意思,也不知道它指向多少数据。相反,接口代码排列了 itable 中的函数指针,除了存储在接口值中的 32 位表示。因此,这个例子中的函数指针是(*Binary).String
而不是Binary.String
。
我们正在考虑的例子是一个只有一个方法的接口。一个包含更多方法的接口会在 fun
列表的底部有更多的条目。
Computing the Itable (动态生成 Itable)
Now we know what the itables look like, but where do they come from? Go's dynamic type conversions mean that it isn't reasonable for the compiler or linker to precompute all possible itables: there are too many (interface type, concrete type) pairs, and most won't be needed. Instead, the compiler generates a type description structure for each concrete type like
Binary
orint
orfunc(map[int]string)
. Among other metadata, the type description structure contains a list of the methods implemented by that type. Similarly, the compiler generates a (different) type description structure for each interface type likeStringer
; it too contains a method list. The interface runtime computes the itable by looking for each method listed in the interface type's method table in the concrete type's method table. The runtime caches the itable after generating it, so that this correspondence need only be computed once.
现在我们知道了 itables 的样子,但是它们是从哪里来的呢?Go 的动态类型转换意味着编译器或链接器预先计算所有可能的可能对象是不合理的: 有太多组(接口类型、具体类型),而且大多数不需要。相反,编译器为每个具体类型生成类型描述结构,如Binary
、 int
或 func (map[int]string)
。在其他元数据中,类型描述结构包含由该类型实现的方法的列表。类似地,编译器为每个接口类型(如 Stringer
)生成(不同的)类型描述结构; 它也包含一个方法列表。接口通过运行时查找具体类型方法表中,与接口类型方法表中对应的方法来动态生成 itable。运行时在生成 itable 后缓存它,因此只需计算一次这种对应关系。
In our simple example, the method table for
Stringer
has one method, while the table forBinary
has two methods. In general there might be ni methods for the interface type and nt methods for the concrete type. The obvious search to find the mapping from interface methods to concrete methods would take O(ni × nt) time, but we can do better. By sorting the two method tables and walking them simultaneously, we can build the mapping in O(ni + nt) time instead.
在我们的简单示例中,Stringer
的方法表有一个方法,而 Binary
的表有两个方法。一般来说,接口类型可能有 ni 个方法,具体类型可能有 nt 个方法。从接口方法到具体方法的映射显然需要 O(ni × nt)时间复杂度,但我们可以做得更好。通过对这两个方法表进行排序并同时遍历它们,我们可以在 O(ni + nt)时间复杂度内建立映射。
Memory Optimizations (内存优化)
The space used by the implementation described above can be optimized in two complementary ways.
First, if the interface type involved is empty—it has no methods—then the itable serves no purpose except to hold the pointer to the original type. In this case, the itable can be dropped and the value can point at the type directly:
Whether an interface type has methods is a static property—either the type in the source code says
interface{}
or it saysinterace{ methods... }
—so the compiler knows which representation is in use at each point in the program.
上述实现所使用的空间可以通过两种互补的方式进行优化。
首先,如果所涉及的接口类型是空的(interface{})ーー没有任何方法ーー那么,除了保存指向原始类型的指针之外,itable 没有任何用途。在这种情况下,可以删除 itable,将第一个字可以直接指向类型:
一个接口类型是否具有方法是一个静态属性——要么源代码中的类型表示为interface{}
,要么它表示为 interface{ methods...}
——所以编译器知道程序中每个点使用的是哪种表示。
Second, if the value associated with the interface value can fit in a single machine word, there's no need to introduce the indirection or the heap allocation. If we define
Binary32
to be likeBinary
but implemented as auint32
, it could be stored in an interface value by keeping the actual value in the second word:
其次,如果与接口值关联的值可以放在单个字中,则不需要引入间接或堆分配。如果我们将 Binary32
定义为类似 Binary
但实现为 uint32
,那么它可以通过将第二个字中的实际值直接存储在接口值中:
Whether the actual value is being pointed at or inlined depends on the size of the type. The compiler arranges for the functions listed in the type's method table (which get copied into the itables) to do the right thing with the word that gets passed in. If the receiver type fits in a word, it is used directly; if not, it is dereferenced. The diagrams show this: in the
Binary
version far above, the method in the itable is(*Binary).String
, while in theBinary32
example, the method in the itable isBinary32.String
not(*Binary32).String
.
实际值是存指针还是直接存值取决于要存的类型的大小。编译器安排类型的方法表中列出的函数(这些函数被复制到 itables 中) ,以便对传入的字执行正确的操作。如果待接收的类型正好是一个字长,则直接使用存储; 如果不适合,则存指针。这些图表显示了这一点: 在上面Binary
的版本中,itable 中的方法是(*Binary).String
,而在 Binary32
示例中,itable 中的方法是Binary32.String
而不是(*Binary32).String
。
Of course, empty interfaces holding word-sized (or smaller) values can take advantage of both optimizations:
当然,包含单字大小(或更小)值的空接口可以同时利用这两种优化:
Method Lookup Performance (方法查找性能)
Smalltalk and the many dynamic systems that have followed it perform a method lookup every time a method gets called. For speed, many implementations use a simple one-entry cache at each call site, often in the instruction stream itself. In a multithreaded program, these caches must be managed carefully, since multiple threads could be at the same call site simultaneously. Even once the races have been avoided, the caches would end up being a source of memory contention.
Because Go has the hint of static typing to go along with the dynamic method lookups, it can move the lookups back from the call sites to the point when the value is stored in the interface. For example, consider this code snippet:
1 var any interface{} // initialized elsewhere 2 s := any.(Stringer) // dynamic conversion 3 for i := 0; i < 100; i++ { 4 fmt.Println(s.String()) 5 }
Smalltalk 及其后的许多动态语言在每次调用方法时都执行方法查找。为了提高速度,许多实现在每个调用点使用一个简单的缓存(one-entry caches?),通常在指令流本身中。在多线程程序中,必须小心地管理这些缓存,因为多个线程可以同时在同一个调用点上。即使避免了竞态条件,缓存也将最终成为内存占用的源头。
因为 Go 具有动态方法查找的同时,还具有静态类型提示,所以它可以将查找从调用点移回到值存储在接口中的位置。例如,考虑下面的代码片段:
在 Go 中,上述代码 itable 在第二行动态生成(或者从缓存中查找);第 4 行上执行的s.String()
调用则是几个内存访问和一个间接调用指令。
In contrast, the implementation of this program in a dynamic language like Smalltalk (or JavaScript, or Python, or ...) would do the method lookup at line 4, which in a loop repeats needless work. The cache mentioned earlier makes this less expensive than it might be, but it's still more expensive than a single indirect call instruction.
相比之下,这个程序在类似 Smalltalk (或 JavaScript,或 Python,或...)这样的动态语言中的实现将在第 4 行执行方法查找,这在循环中重复了不必要的工作。前面提到的缓存使得这个代价降低,但是它的开销仍然比单个间接调用指令更加高。
Of course, this being a blog post, I don't have any numbers to back up this discussion, but it certainly seems like the lack of memory contention would be a big win in a heavily parallel program, as is being able to move the method lookup out of tight loops. Also, I'm talking about the general architecture, not the specifics o the implementation: the latter probably has a few constant factor optimizations still available.
当然,这是只一篇博客文章,我没有任何数据来支持这个讨论,但是看起来对于一个高度并行的程序来说,较少的内存占用是一个巨大的胜利,因为它能够将方法查找从紧密的循环中移除。此外,我说的是总体架构,而不是具体的实现: 后者可能还有一些常量因子优化可用。
More Information (补充信息)
The interface runtime support is in
$GOROOT/src/pkg/runtime/iface.c
. There's much more to say about interfaces (we haven't even seen an example of a pointer receiver yet) and the type descriptors (they power reflection in addition to the interface runtime) but those will have to wait for future posts.
接口运行时支持在 $GOROOT/src/pkg/runtime/iface.c[^3]中。关于接口还有很多要说的(我们甚至还没有看到一个指针接收器的例子)和类型描述符(除了接口运行时之外,它们还支持反射) ,但是这些需要等待以后的文章。
Code (代码)
Supporting code (
x.go
):package main import ( "fmt" "strconv" ) type Stringer interface { String() string } type Binary uint64 func (i Binary) String() string { return strconv.Uitob64(i.Get(), 2) } func (i Binary) Get() uint64 { return uint64(i) } func main() { b := Binary(200) s := Stringer(b) fmt.Println(s.String()) }
Selected output of
8g -S x.go
:0045 (x.go:25) LEAL s+-24(SP),BX 0046 (x.go:25) MOVL 4(BX),BP 0047 (x.go:25) MOVL BP,(SP) 0048 (x.go:25) MOVL (BX),BX 0049 (x.go:25) MOVL 20(BX),BX 0050 (x.go:25) CALL ,BX
The
LEAL
loads the address ofs
into the registerBX
. (The notationn(SP)
describes the word in memory atSP+n
.0(SP)
can be shortened to(SP)
.) The next twoMOVL
instructions fetch the value from the second word in the interface and store it as the first function call argument,0(SP)
. The final twoMOVL
instructions fetch the itable and then the function pointer from the itable, in preparation for calling that function.
LEAL
指令将s
的地址加载到BX
寄存器中。(操作数 n(SP)
描述了内存中的字在SP+n
地址上。0(SP)
可以缩写为(SP)
。)接下来的两个MOVL
指令从接口中的第二个字获取值,并将其存储为第一个函数调用参数,0(SP)
。最后两条 MOVL
指令获取 itable 及 itable 上的函数指针,为调用这个函数做准备。
[^1]: 译注:这里的 gc 是 Go Compiler 的缩写,其中 6g、8g 和 5g 是 gc 工具链的一部分,参考(https://github.com/golang/go/tree/go1.4.3/src/cmd/6g),注意 go1.5 以后 gc 工具链改变,已经从由 C 的实现变成 Go 的实现,因此 go1.5 以后并不能找到这些工具。
[^2]: 译注: 字指字长,这里假设为 32 位计算机,所以字长为 32 位。[^3]: 译注:当前版本的 go1.18 在 $GOROOT/src/pkg/runtime/iface.go 中。
评论