写点什么

Go Data Structures: Interfaces [中译]

作者:hyx
  • 2022 年 3 月 23 日
  • 本文字数:11061 字

    阅读完需:约 36 分钟

Go Data Structures: Interfaces [中译]

原文:Go Data Structures: Interfaces

作者: Russ Cox

译:willerhehehe@gmail.com


以下为正文部分:

Go's interfaces—static, checked at compile time, dynamic when asked for—are, for me, the most exciting part of Go from a language design point of view. If I could export one feature of Go into other languages, it would be interfaces.


Go 的interface(静态,编译时检查,使用时动态)是 Go 从语言设计上最令我激动的部分。如果我能选一个 Go 语言的特性导出到其他语言,那我一定会选择interface


This post is my take on the implementation of interface values in the “gc” compilers: 6g, 8g, and 5g. Over at Airs, Ian Lance Taylor has written two posts about the implementation of interface values in gccgo. The implementations are more alike than different: the biggest difference is that this post has pictures.


本文是我对 gc 编译器 6g、8g 和 5g[^1]中接口值的实现的看法: 。在 Airs 上 Ian Lance Taylor 已经写了两篇关于 gccgo 中如何实现 interface values 的文章。本文和这两篇文章没什么太大的不同,最大的区别是本文有图片。


Before looking at the implementation, let's get a sense of what it must support.


在看具体实现之前,让我们先一起看看它支持的特性。

Usage (使用方法)

Go's interfaces let you use duck typing like you would in a purely dynamic language like Python but still have the compiler catch obvious mistakes like passing an int where an object with a Read method was expected, or like calling the Read method with the wrong number of arguments. To use interfaces, first define the interface type (say, ReadCloser):

type ReadCloser interface {
 Read(b []byte) (n int, err os.Error)
 Close()
}


Go 的interface能让你像在 Python 这样的纯动态语言中一样使用鸭子类型,但同时又可以通过编译器捕获类似于给一个对象的Read方法传了一个int类型,或者在对用Read方法时传入了错误的个数的参数这样明显的错误。为了使用interface,首先需要定义一个interface类型(命名为 ReadCloser):


type ReadCloser interface {    Read(b []byte) (n int, err os.Error)    Close()}
复制代码


and then define your new function as taking a ReadCloser. For example, this function calls Read repeatedly to get all the data that was requested and then calls Close:

func ReadAndClose(r ReadCloser, buf []byte) (n int, err os.Error) {
 for len(buf) > 0 && err == nil {
     var nr int
     nr, err = r.Read(buf)
     n += nr
     buf = buf[nr:]
 }
 r.Close()
 return
}


然后定义一个新的函数,接收ReadCloser参数。例如,这个函数重复调用Read来获取所有请求的数据,最后调用Close:


func ReadAndClose(r ReadCloser, buf []byte) (n int, err os.Error) {    for len(buf) > 0 && err == nil {        var nr int        nr, err = r.Read(buf)        n += nr        buf = buf[nr:]    }    r.Close()    return}
复制代码


The code that calls ReadAndClose can pass a value of any type as long as it has Read and Close methods with the right signatures. And, unlike in languages like Python, if you pass a value with the wrong type, you get an error at compile time, not run time.


使用代码调用ReadAndClose方法时可以传递任何类型的参数,只要该参数类型有ReadClose方法且有正确的函数签名。同时,不像 Python 之类的语言,如果你传的值类型是错误的,你会在编译时就得到报错,而不是运行时才报错。


Interfaces aren't restricted to static checking, though. You can check dynamically whether a particular interface value has an additional method. For example:

type Stringer interface {
 String() string
}

func ToString(any interface{}) string {
 if v, ok := any.(Stringer); ok {
     return v.String()
 }
 switch v := any.(type) {
 case int:
     return strconv.Itoa(v)
 case float:
     return strconv.Ftoa(v, 'g', -1)
 }
 return "???"
}

The value any has static type interface{}, meaning no guarantee of any methods at all: it could contain any type. The “comma ok” assignment inside the if statement asks whether it is possible to convert any to an interface value of type Stringer, which has the method String. If so, the body of that statement calls the method to obtain a string to return. Otherwise, the switch picks off a few basic types before giving up. This is basically a stripped down version of what the fmt package does. (The if could be replaced by adding case Stringer: at the top of the switch, but I used a separate statement to draw attention to the check.)


但是,interfaces并不局限于静态检查。你可以动态检查一个特定的interface值是否有额外的方法。例如:


type Stringer interface {    String() string}
func ToString(any interface{}) string { if v, ok := any.(Stringer); ok { return v.String() } switch v := any.(type) { case int: return strconv.Itoa(v) case float: return strconv.Ftoa(v, 'g', -1) } return "???"}
复制代码


参数any的静态类型是interface{},这表示该类型根本没有任何必须具备的方法: 即它可以是任何类型。在上述代码片段中: if表达式内逗号右侧的ok表示是否能将any转换具有String方法类型为Stringerinterface值。如果可以,该表达式的 body 部分执行String方法得到并return一个 string类型的值;如果不行,在放弃判断前,switch表达式会选择几个基本类型做判断。这相当于一个fmt package的精简版。(上述代码里的if可以改成在switch顶部增加一个case Stringer,这里分开写的目的是为了引起注意。)


As a simple example, let's consider a 64-bit integer type with a String method that prints the value in binary and a trivial Get method:

type Binary uint64

func (i Binary) String() string {
 return strconv.Uitob64(i.Get(), 2)
}

func (i Binary) Get() uint64 {
 return uint64(i)
}

A value of type Binary can be passed to ToString, which will format it using the String method, even though the program never says that Binary intends to implement Stringer. There's no need: the runtime can see that Binary has a String method, so it implements Stringer, even if the author of Binary has never heard of Stringer.

These examples show that even though all the implicit conversions are checked at compile time, explicit interface-to-interface conversions can inquire about method sets at run time. “Effective Go” has more details about and examples of how interface values can be used.


作为一个例子,让我们考虑一个具有String方法来二进制打印其值,及一个Get方法的 64 位整型:


type Binary uint64
func (i Binary) String() string { return strconv.Uitob64(i.Get(), 2)}
func (i Binary) Get() uint64 { return uint64(i)}
复制代码


一个Binary类型的值可以被传入ToString方法(这个方法将会调用入参的String方法),即使程序从来没有显示表达过Binary实现了Stringer。这种显示声明没有必要:因为 runtime 可以看到Binary有一个String方法,所以Binary实现了Stringer接口,哪怕Binary类的作者从来没听过Stringer这个接口。

Interface Values (接口值)

Languages with methods typically fall into one of two camps: prepare tables for all the method calls statically (as in C++ and Java), or do a method lookup at each call (as in Smalltalk and its many imitators, JavaScript and Python included) and add fancy caching to make that call efficient. Go sits halfway between the two: it has method tables but computes them at run time. I don't know whether Go is the first language to use this technique, but it's certainly not a common one. (I'd be interested to hear about earlier examples; leave a comment below.)


具有方法的语言通常分为两个阵营:为所有方法调用准备静态表(如:C++和 Java)、或者在每个调用中执行方法查找(比如 Smalltalk 和它的许多模仿者,包括 JavaScript 和 Python) ,并添加花哨的缓存来提高调用的效率。


Go 处于两者之间:它有方法表(method tables),但在运行时计算它们。我不知道 Go 是不是第一个使用这种技术的语言,但 Go 显然不是一种常见的语言。(我很有兴趣听到更早的例子;请在下面留言。)


As a warmup, a value of type Binary is just a 64-bit integer made up of two 32-bit words (like in the last post, we'll assume a 32-bit machine; this time memory grows down instead of to the right):

Interface values are represented as a two-word pair giving a pointer to information about the type stored in the interface and a pointer to the associated data. Assigning b to an interface value of type Stringer sets both words of the interface value.

(The pointers contained in the interface value are gray to emphasize that they are implicit, not directly exposed to Go programs.)


作为热身,一个Binary类型是由两个 32 位字[^2]组成的 64bit 整型(正如上一篇博客,,我们假设使用 32 位机器;这次内存不是向右增长,而是向下):



接口值被表示为一个双字,其中一个指针指向接口中存的类型,另一个指针指向关联的数据。将b分配给类型为Stringer的接口值会设置接口值的双字值。



(接口值内灰色的指针主要是为了强调这里的隐性关系,这些指针并没有被直接暴露在 Go 程序中。)


The first word in the interface value points at what I call an interface table or itable (pronounced i-table; in the runtime sources, the C implementation name is Itab). The itable begins with some metadata about the types involved and then becomes a list of function pointers. Note that the itable corresponds to the interface type, not the dynamic type. In terms of our example, the itable for Stringer holding type Binary lists the methods used to satisfy Stringer, which is just String: Binary's other methods (Get) make no appearance in the itable.


接口值内的第一个字,指向一个interface table或称为itable(发音为 i-table; 在运行时源码内,C 的实现叫做 Itab)。itable内的开始部分存储这一些类型相关的元数据(metadata),剩下的部分存储着一组函数指针。注意 itable 相当于接口类型,而不是动态类型。根据我们的示例,Stringer保存Binary类型的 itable 列出了用于满足Stringer接口的方法,这里的方法只有String: Binary的其他方法(如Get)并没有出现在 itable 中。


The second word in the interface value points at the actual data, in this case a copy of b. The assignment var s Stringer = b makes a copy of b rather than point at b for the same reason that var c uint64 = b makes a copy: if b later changes, s and c are supposed to have the original value, not the new one. Values stored in interfaces might be arbitrarily large, but only one word is dedicated to holding the value in the interface structure, so the assignment allocates a chunk of memory on the heap and records the pointer in the one-word slot. (There's an obvious optimization when the value does fit in the slot; we'll get to that later.)


接口值中的第二个字指向实际数据,在本例中是 b 的副本。赋值语句var s Stringer = b 会复制 b 而不是指向 b,就像 var c uint64 = b 复制 b 一样: 如果 b 后来改变了,sc 应该有原来的值,而不是新的值。存储在接口中的值可能是任意大的,但是只有一个字长专门用于保存接口结构中的值,因此赋值分配了堆上的一块内存,并将指针记录在一个字槽中。(当值大小正好适合字槽大小时,会有一个明显的优化; 我们稍后将讨论这个问题。)


To check whether an interface value holds a particular type, as in the type switch above, the Go compiler generates code equivalent to the C expression s.tab->type to obtain the type pointer and check it against the desired type. If the types match, the value can be copied by by dereferencing s.data.


为了检查接口值是否包含特定类型,如上面的 type switch,Go 编译器生成与 c 表达式 s.tab->type等效的代码,以获取类型指针并检查它与所需的类型。如果类型匹配,可以通过解引用 s.data 来复制该值。


To call s.String(), the Go compiler generates code that does the equivalent of the C expression s.tab->fun[0](s.data): it calls the appropriate function pointer from the itable, passing the interface value's data word as the function's first (in this example, only) argument. You can see this code if you run 8g -S x.go (details at the bottom of this post). Note that the function in the itable is being passed the 32-bit pointer from the second word of the interface value, not the 64-bit value it points at. In general, the interface call site doesn't know the meaning of this word nor how much data it points at. Instead, the interface code arranges that the function pointers in the itable expect the 32-bit representation stored in the interface values. Thus the function pointer in this example is (*Binary).String not Binary.String.

The example we're considering is an interface with just one method. An interface with more methods would have more entries in the fun list at the bottom of the itable.


为了调用 s.String() ,Go 编译器生成与 c 表达式 s.tab->fun[0](s.data)等效的代码: 它从 itable 中调用适当的函数指针,将接口值的数据字作为函数的第一个参数传递(仅在这个例子中)。如果您运行8g -S x.go,就可以看到这段代码(详见本文底部)。注意,该函数是从接口值的第二个字传递 32 位指针,而不是它指向的 64 位值。一般来说,接口调用点不知道这个字的意思,也不知道它指向多少数据。相反,接口代码排列了 itable 中的函数指针,除了存储在接口值中的 32 位表示。因此,这个例子中的函数指针是(*Binary).String而不是Binary.String


我们正在考虑的例子是一个只有一个方法的接口。一个包含更多方法的接口会在 fun 列表的底部有更多的条目。

Computing the Itable (动态生成 Itable)

Now we know what the itables look like, but where do they come from? Go's dynamic type conversions mean that it isn't reasonable for the compiler or linker to precompute all possible itables: there are too many (interface type, concrete type) pairs, and most won't be needed. Instead, the compiler generates a type description structure for each concrete type like Binary or int or func(map[int]string). Among other metadata, the type description structure contains a list of the methods implemented by that type. Similarly, the compiler generates a (different) type description structure for each interface type like Stringer; it too contains a method list. The interface runtime computes the itable by looking for each method listed in the interface type's method table in the concrete type's method table. The runtime caches the itable after generating it, so that this correspondence need only be computed once.


现在我们知道了 itables 的样子,但是它们是从哪里来的呢?Go 的动态类型转换意味着编译器或链接器预先计算所有可能的可能对象是不合理的: 有太多组(接口类型、具体类型),而且大多数不需要。相反,编译器为每个具体类型生成类型描述结构,如Binaryintfunc (map[int]string)。在其他元数据中,类型描述结构包含由该类型实现的方法的列表。类似地,编译器为每个接口类型(如 Stringer)生成(不同的)类型描述结构; 它也包含一个方法列表。接口通过运行时查找具体类型方法表中,与接口类型方法表中对应的方法来动态生成 itable。运行时在生成 itable 后缓存它,因此只需计算一次这种对应关系。


In our simple example, the method table for Stringer has one method, while the table for Binary has two methods. In general there might be ni methods for the interface type and nt methods for the concrete type. The obvious search to find the mapping from interface methods to concrete methods would take O(ni × nt) time, but we can do better. By sorting the two method tables and walking them simultaneously, we can build the mapping in O(ni + nt) time instead.


在我们的简单示例中,Stringer的方法表有一个方法,而 Binary 的表有两个方法。一般来说,接口类型可能有 ni 个方法,具体类型可能有 nt 个方法。从接口方法到具体方法的映射显然需要 O(ni × nt)时间复杂度,但我们可以做得更好。通过对这两个方法表进行排序并同时遍历它们,我们可以在 O(ni + nt)时间复杂度内建立映射。

Memory Optimizations (内存优化)

The space used by the implementation described above can be optimized in two complementary ways.

First, if the interface type involved is empty—it has no methods—then the itable serves no purpose except to hold the pointer to the original type. In this case, the itable can be dropped and the value can point at the type directly:

Whether an interface type has methods is a static property—either the type in the source code says interface{} or it says interace{ methods... }—so the compiler knows which representation is in use at each point in the program.


上述实现所使用的空间可以通过两种互补的方式进行优化。


首先,如果所涉及的接口类型是空的(interface{})ーー没有任何方法ーー那么,除了保存指向原始类型的指针之外,itable 没有任何用途。在这种情况下,可以删除 itable,将第一个字可以直接指向类型:



一个接口类型是否具有方法是一个静态属性——要么源代码中的类型表示为interface{},要么它表示为 interface{ methods...}——所以编译器知道程序中每个点使用的是哪种表示。


Second, if the value associated with the interface value can fit in a single machine word, there's no need to introduce the indirection or the heap allocation. If we define Binary32 to be like Binary but implemented as a uint32, it could be stored in an interface value by keeping the actual value in the second word:


其次,如果与接口值关联的值可以放在单个字中,则不需要引入间接或堆分配。如果我们将 Binary32定义为类似 Binary 但实现为 uint32,那么它可以通过将第二个字中的实际值直接存储在接口值中:



Whether the actual value is being pointed at or inlined depends on the size of the type. The compiler arranges for the functions listed in the type's method table (which get copied into the itables) to do the right thing with the word that gets passed in. If the receiver type fits in a word, it is used directly; if not, it is dereferenced. The diagrams show this: in the Binary version far above, the method in the itable is (*Binary).String, while in the Binary32 example, the method in the itable is Binary32.String not (*Binary32).String.


实际值是存指针还是直接存值取决于要存的类型的大小。编译器安排类型的方法表中列出的函数(这些函数被复制到 itables 中) ,以便对传入的字执行正确的操作。如果待接收的类型正好是一个字长,则直接使用存储; 如果不适合,则存指针。这些图表显示了这一点: 在上面Binary的版本中,itable 中的方法是(*Binary).String,而在 Binary32示例中,itable 中的方法是Binary32.String 而不是(*Binary32).String


Of course, empty interfaces holding word-sized (or smaller) values can take advantage of both optimizations:


当然,包含单字大小(或更小)值的空接口可以同时利用这两种优化:


Method Lookup Performance (方法查找性能)

Smalltalk and the many dynamic systems that have followed it perform a method lookup every time a method gets called. For speed, many implementations use a simple one-entry cache at each call site, often in the instruction stream itself. In a multithreaded program, these caches must be managed carefully, since multiple threads could be at the same call site simultaneously. Even once the races have been avoided, the caches would end up being a source of memory contention.

Because Go has the hint of static typing to go along with the dynamic method lookups, it can move the lookups back from the call sites to the point when the value is stored in the interface. For example, consider this code snippet:

1   var any interface{}  // initialized elsewhere
2   s := any.(Stringer)  // dynamic conversion
3   for i := 0; i < 100; i++ {
4       fmt.Println(s.String())
5   }


Smalltalk 及其后的许多动态语言在每次调用方法时都执行方法查找。为了提高速度,许多实现在每个调用点使用一个简单的缓存(one-entry caches?),通常在指令流本身中。在多线程程序中,必须小心地管理这些缓存,因为多个线程可以同时在同一个调用点上。即使避免了竞态条件,缓存也将最终成为内存占用的源头。


因为 Go 具有动态方法查找的同时,还具有静态类型提示,所以它可以将查找从调用点移回到值存储在接口中的位置。例如,考虑下面的代码片段:


1   var any interface{}  // initialized elsewhere2   s := any.(Stringer)  // dynamic conversion3   for i := 0; i < 100; i++ {4       fmt.Println(s.String())5   }
复制代码


在 Go 中,上述代码 itable 在第二行动态生成(或者从缓存中查找);第 4 行上执行的s.String()调用则是几个内存访问和一个间接调用指令。


In contrast, the implementation of this program in a dynamic language like Smalltalk (or JavaScript, or Python, or ...) would do the method lookup at line 4, which in a loop repeats needless work. The cache mentioned earlier makes this less expensive than it might be, but it's still more expensive than a single indirect call instruction.


相比之下,这个程序在类似 Smalltalk (或 JavaScript,或 Python,或...)这样的动态语言中的实现将在第 4 行执行方法查找,这在循环中重复了不必要的工作。前面提到的缓存使得这个代价降低,但是它的开销仍然比单个间接调用指令更加高。


Of course, this being a blog post, I don't have any numbers to back up this discussion, but it certainly seems like the lack of memory contention would be a big win in a heavily parallel program, as is being able to move the method lookup out of tight loops. Also, I'm talking about the general architecture, not the specifics o the implementation: the latter probably has a few constant factor optimizations still available.


当然,这是只一篇博客文章,我没有任何数据来支持这个讨论,但是看起来对于一个高度并行的程序来说,较少的内存占用是一个巨大的胜利,因为它能够将方法查找从紧密的循环中移除。此外,我说的是总体架构,而不是具体的实现: 后者可能还有一些常量因子优化可用。

More Information (补充信息)

The interface runtime support is in $GOROOT/src/pkg/runtime/iface.c. There's much more to say about interfaces (we haven't even seen an example of a pointer receiver yet) and the type descriptors (they power reflection in addition to the interface runtime) but those will have to wait for future posts.


接口运行时支持在 $GOROOT/src/pkg/runtime/iface.c[^3]中。关于接口还有很多要说的(我们甚至还没有看到一个指针接收器的例子)和类型描述符(除了接口运行时之外,它们还支持反射) ,但是这些需要等待以后的文章。

Code (代码)

Supporting code (x.go):

package main

import (
"fmt"
"strconv"
)

type Stringer interface {
String() string
}

type Binary uint64

func (i Binary) String() string {
return strconv.Uitob64(i.Get(), 2)
}

func (i Binary) Get() uint64 {
return uint64(i)
}

func main() {
b := Binary(200)
s := Stringer(b)
fmt.Println(s.String())
}

Selected output of 8g -S x.go:

0045 (x.go:25) LEAL    s+-24(SP),BX
0046 (x.go:25) MOVL    4(BX),BP
0047 (x.go:25) MOVL    BP,(SP)
0048 (x.go:25) MOVL    (BX),BX
0049 (x.go:25) MOVL    20(BX),BX
0050 (x.go:25) CALL    ,BX

The LEAL loads the address of s into the register BX. (The notation n(SP) describes the word in memory at SP+n. 0(SP) can be shortened to (SP).) The next two MOVL instructions fetch the value from the second word in the interface and store it as the first function call argument, 0(SP). The final two MOVL instructions fetch the itable and then the function pointer from the itable, in preparation for calling that function.


LEAL指令将s的地址加载到BX寄存器中。(操作数 n(SP)描述了内存中的字在SP+n地址上。0(SP)可以缩写为(SP)。)接下来的两个MOVL 指令从接口中的第二个字获取值,并将其存储为第一个函数调用参数,0(SP)。最后两条 MOVL 指令获取 itable 及 itable 上的函数指针,为调用这个函数做准备。




[^1]: 译注:这里的 gc 是 Go Compiler 的缩写,其中 6g、8g 和 5g 是 gc 工具链的一部分,参考(https://github.com/golang/go/tree/go1.4.3/src/cmd/6g),注意 go1.5 以后 gc 工具链改变,已经从由 C 的实现变成 Go 的实现,因此 go1.5 以后并不能找到这些工具。


[^2]: 译注: 字指字长,这里假设为 32 位计算机,所以字长为 32 位。[^3]: 译注:当前版本的 go1.18 在 $GOROOT/src/pkg/runtime/iface.go 中。

用户头像

hyx

关注

还未添加个人签名 2020.09.30 加入

还未添加个人简介

评论

发布
暂无评论
Go Data Structures: Interfaces [中译]_源码_hyx_InfoQ写作平台