A Deep Dive into Strings in Go

May 1, 2022

The Nature of Strings

Strings play an important role in programming languages. The data structure behind a string generally takes one of two forms:

One has its length fixed at compile time and cannot be modified.
The other has a dynamic length and can be modified.

For example, Go strings, like Python strings, cannot be modified; they can only be read.

In Python, trying to change a character of a string produces the following result:

>>> hi = "Hello"
>>> hi
'Hello'
>>> hi[0] = 'h'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment


The same holds in Go:

package main

import "fmt"

func main() {
	var hello = "Hello"
	hello[1] = 'h'
	fmt.Println(hello)
}

// # command-line-arguments
// string_in_go/main.go:8:11: cannot assign to hello[1] (strings are immutable)
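
Because a string cannot be changed in place, the usual workaround is to copy its bytes into a mutable slice, edit the copy, and build a new string. The following is a minimal sketch of that idea; the helper name replaceByte is hypothetical and used only for illustration.

```go
package main

import "fmt"

// replaceByte is a hypothetical helper: it returns a copy of s with the
// byte at index i replaced by b. The original string is left untouched.
func replaceByte(s string, i int, b byte) string {
	buf := []byte(s) // copies the string's bytes into a mutable slice
	buf[i] = b       // editing the copy is allowed
	return string(buf) // copies again into a new, immutable string
}

func main() {
	hello := "Hello"
	lower := replaceByte(hello, 0, 'h')
	fmt.Println(hello, lower) // Hello hello
}
```

Note that both conversions copy the data, so this pattern is best kept away from hot loops.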


There are two ways to mark where a string ends:

Implicitly, as in C, where the character "\0" acts as the terminator.
Explicitly, as in Go, where the length is stored alongside the data.

A Go string is represented by the following structure:

type StringHeader struct {
	Data uintptr // Data points to the underlying byte array
	Len  int     // Len is the length of the string in bytes
}
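
One consequence of storing the length explicitly is that a NUL byte is ordinary data in a Go string rather than a terminator. The sketch below illustrates this and also checks that a string value occupies two machine words (pointer plus length). The two-word size is an assumption about the standard gc toolchain, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// lenWithNUL returns the length of a string that has a NUL byte in the
// middle. With an explicit length, the NUL does not end the string.
func lenWithNUL() int {
	s := "a\x00b"
	return len(s)
}

// headerWords reports how many machine words a string value occupies.
// Assumption: the standard gc toolchain, where a string is pointer + length.
func headerWords() uintptr {
	var s string
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

func main() {
	fmt.Println(lenWithNUL())  // 3
	fmt.Println(headerWords()) // 2 with the gc toolchain
}
```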


A string is essentially an array of characters, and each character is stored as one or more integers (bytes). For example, printing the bytes of Hello gives:

package main

import "fmt"

func main() {
	var hello = "Hello"
	for i := 0; i < len(hello); i++ {
		fmt.Printf("%x ", hello[i])
	}
}

// Output: 48 65 6c 6c 6f
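
Every character in "Hello" fits in a single byte, so the "one or more" case does not show up there. As a small sketch, the string below contains é (U+00E9), which UTF-8 encodes as two bytes; the helper byteHex is hypothetical and exists only to format the output.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteHex is a hypothetical helper that formats each byte of s as two
// hex digits separated by spaces.
func byteHex(s string) string {
	out := ""
	for i := 0; i < len(s); i++ {
		if i > 0 {
			out += " "
		}
		out += fmt.Sprintf("%02x", s[i])
	}
	return out
}

func main() {
	s := "h\u00e9llo" // "héllo"; é takes two bytes in UTF-8

	fmt.Println(len(s))                    // 6 (bytes)
	fmt.Println(utf8.RuneCountInString(s)) // 5 (characters)
	fmt.Println(byteHex(s))                // 68 c3 a9 6c 6c 6f

	// Ranging over a string decodes runes; i is the byte offset of each rune.
	for i, r := range s {
		fmt.Printf("%d:%c ", i, r) // 0:h 1:é 3:l 4:l 5:o
	}
	fmt.Println()
}
```

Indexing with s[i] therefore yields bytes, while a range loop yields decoded characters.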


How Strings Work Under the Hood

String literals are marked by special delimiters, and there are two ways to declare them:

var s1 string = `hello world`
var s2 string = "hello world"
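
For plain text such as hello world the two forms produce the same value. They differ once escape sequences are involved: a backquoted (raw) literal keeps the backslash as-is, while a double-quoted literal interprets it. A short sketch, with the hypothetical helper literalLengths:

```go
package main

import "fmt"

// literalLengths is a hypothetical helper that returns the byte lengths of
// the same source text written as a raw and as an interpreted literal.
func literalLengths() (raw, interpreted int) {
	raw = len(`a\nb`)         // 4 bytes: 'a', '\\', 'n', 'b'
	interpreted = len("a\nb") // 3 bytes: 'a', newline, 'b'
	return raw, interpreted
}

func main() {
	var s1 string = `hello world`
	var s2 string = "hello world"
	fmt.Println(s1 == s2) // true

	fmt.Println(literalLengths()) // 4 3
}
```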


During lexical analysis, a string constant is eventually tagged as a token of type StringLit and passed on to the next stage of compilation.

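The scanner, driven by the recursive-descent parser, reads the source one UTF-8 character at a time; a backquote or a double quote marks the start of a string. The scanning logic lives in syntax/scanner.go: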

func (s *scanner) stdString() {
	ok := true
	s.nextch()

	for {
		if s.ch == '"' {
			s.nextch()
			break
		}
		if s.ch == '\\' {
			s.nextch()
			if !s.escape('"') {
				ok = false
			}
			continue
		}
		if s.ch == '\n' {
			s.errorf("newline in string")
			ok = false
			break
		}
		if s.ch < 0 {
			s.errorAtf(0, "string not terminated")
			ok = false
			break
		}
		s.nextch()
	}

	s.setLit(StringLit, ok)
}

func (s *scanner) rawString() {
	ok := true
	s.nextch()

	for {
		if s.ch == '`' {
			s.nextch()
			break
		}
		if s.ch < 0 {
			s.errorAtf(0, "string not terminated")
			ok = false
			break
		}
		s.nextch()
	}
	// We leave CRs in the string since they are part of the
	// literal (even though they are not part of the literal
	// value).

	s.setLit(StringLit, ok)
}
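
The compiler's scanner is an internal package and cannot be imported from ordinary programs. To observe similar behavior from user code, one option is the standard library's go/scanner package. It is a separate implementation from the compiler's syntax package, and its token kind is called token.STRING rather than StringLit, but it also reports both literal forms as a single string token kind. The sketch below is only an illustration under that substitution; the helper stringLiterals is hypothetical.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// stringLiterals is a hypothetical helper that returns the source text of
// every string-literal token found in src, using go/scanner (not the
// compiler's internal scanner).
func stringLiterals(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // nil error handler, default mode

	var lits []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.STRING { // raw and interpreted literals alike
			lits = append(lits, lit)
		}
	}
	return lits
}

func main() {
	src := "var s1 = `hello world`\nvar s2 = \"hello world\"\n"
	for _, lit := range stringLiterals(src) {
		fmt.Println(lit) // the literal text, delimiters included
	}
}
```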