写点什么

openGauss 内核:SQL 解析过程分析

  • 2022 年 6 月 28 日
  • 本文字数:4112 字

    阅读完需:约 13 分钟

openGauss内核:SQL解析过程分析

本文分享自华为云社区《 openGauss内核分析(三):SQL解析》,作者:Gauss 松鼠会。


在传统数据库中 SQL 引擎一般指对用户输入的 SQL 语句进行解析、优化的软件模块。


SQL 的解析过程主要分为:


• 词法分析:将用户输入的 SQL 语句拆解成单词(Token)序列,并识别出关键字、标识、常量等。

• 语法分析:分析器对词法分析器解析出来的单词(Token)序列在语法上是否满足 SQL 语法规则。

• 语义分析:语义分析是 SQL 解析过程的一个逻辑阶段,主要任务是在语法正确的基础上进行上下文有关性质的审查,在 SQL 解析过程中该阶段完成表名、操作符、类型等元素的合法性判断,同时检测语义上的二义性。


openGauss 在 pg_parse_query 中调用 raw_parser 函数对用户输入的 SQL 命令进行词法分析和语法分析,生成语法树添加到链表 parsetree_list 中。完成语法分析后,对于 parsetree_list 中的每一颗语法树 parsetree,会调用 parse_**yze 函数进行语义分析,根据 SQL 命令的不同,执行对应的入口函数,最终生成查询树。


词法分析


openGauss 使用 flex 工具进行词法分析。flex 工具通过对已经定义好的词法文件进行编译,生成词法分析的代码。词法文件是 scan.l,它根据 SQL 语言标准对 SQL 语言中的关键字、标识符、操作符、常量、终结符进行了定义和识别。在 kwlist.h 中定义了大量的关键字,按照字母的顺序排列,方便在查找关键字时通过二分法进行查找。在 scan.l 中处理"标识符"时,会到关键字列表中进行匹配,如果一个标识符匹配到关键字,则认为是关键字,否则才是标识符,即关键字优先.以“select a, b from item”为例说明词法分析结果。


语法分析


openGauss 中定义了 bison 工具能够识别的语法文件 gram.y,根据 SQL 语言的不同定义了一系列表达 Statement 的结构体(这些结构体通常以 Stmt 作为命名后缀),用来保存语法分析结果。以 SELECT 查询为例,它对应的 Statement 结构体如下。


typedef struct SelectStmt{	NodeTag		type;	List	   *distinctClause; /* NULL, list of DISTINCT ON exprs, or								 * lcons(NIL,NIL) for all (SELECT DISTINCT) */	IntoClause *intoClause;		/* target for SELECT INTO */	List	   *targetList;		/* the target list (of ResTarget) */	List	   *fromClause;		/* the FROM clause */	Node	   *whereClause;	/* WHERE qualification */	List	   *groupClause;	/* GROUP BY clauses */	Node	   *havingClause;	/* HAVING conditional-expression */	List	   *windowClause;	/* WINDOW window_name AS (...), ... */	WithClause *withClause;		/* WITH clause */	List	   *valuesLists;	/* untransformed list of expression lists */	List	   *sortClause;		/* sort clause (a list of SortBy's) */	Node	   *limitOffset;	/* # of result tuples to skip */	Node	   *limitCount;		/* # of result tuples to return */    ……} SelectStmt;
复制代码


这个结构体可以看作一个多叉树,每个叶子节点都表达了 SELECT 查询语句中的一个语法结构,对应到 gram.y 中,它会有一个 SelectStmt。代码如下:



从 simple_select 语法分析结构可以看出,一条简单的查询语句由以下子句组成:去除行重复的 distinctClause、目标属性 targetList、SELECT INTO 子句 intoClause、FROM 子句 fromClause、WHERE 子句 whereClause、GROUP BY BY 子句 groupClause、HAVING 子句 havingClause、窗口子句窗口 Clause 和 plan_hint 子句。在成功匹配 simple_select 语法结构后,将会创建一个 Statement 结构体,将各个子句进行相应的赋值。对 simple_select 而言,目标属性、FROM 子句、WHERE 子句是最重要的组成部分。SelectStmt 与其他结构体的关系如下:



下面以“select a, b from item”为例说明简单 select 语句的解析过程,函数 exec_simple_query 调用 pg_parse_query 执行解析,解析树中只有一个元素。



(gdb) p *parsetree_list$47 = {type = T_List, length = 1, head = 0x7f5ff986c8f0, tail = 0x7f5ff986c8f0}
复制代码


List 中的节点类型为 T_SelectStmt。


(gdb) p *(Node *)(parsetree_list->head.data->ptr_value)$45 = {type = T_SelectStmt}
复制代码


查看 SelectStmt 结构体,targetList 和 fromClause 非空。


(gdb) set $stmt = (SelectStmt *)(parsetree_list->head.data->ptr_value)(gdb) p *$stmt$50 = {type = T_SelectStmt, distinctClause = 0x0, intoClause = 0x0, targetList = 0x7f5ffa43d588, fromClause = 0x7f5ff986c888, startWithClause = 0x0, whereClause = 0x0, groupClause = 0x0,  havingClause = 0x0, windowClause = 0x0, withClause = 0x0, valuesLists = 0x0, sortClause = 0x0, limitOffset = 0x0, limitCount = 0x0, lockingClause = 0x0, hintState = 0x0, op = SETOP_NONE, all = false,  larg = 0x0, rarg = 0x0, hasPlus = false}
复制代码


查看 SelectStmt 的 targetlist,有两个 ResTarget。


(gdb) p *($stmt->targetList)$55 = {type = T_List, length = 2, head = 0x7f5ffa43d540, tail = 0x7f5ffa43d800}(gdb) p *(Node *)($stmt->targetList->head.data->ptr_value)$57 = {type = T_ResTarget}
(gdb) set $restarget1=(ResTarget *)($stmt->targetList->head.data->ptr_value)(gdb) p *$restarget1$60 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d378, location = 7}(gdb) p *$restarget1->val$63 = {type = T_ColumnRef}(gdb) p *(ColumnRef *)$restarget1->val$64 = {type = T_ColumnRef, fields = 0x7f5ffa43d470, prior = false, indnum = 0, location = 7}(gdb) p *((ColumnRef *)$restarget1->val)->fields$66 = {type = T_List, length = 1, head = 0x7f5ffa43d428, tail = 0x7f5ffa43d428}(gdb) p *(Node *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value$67 = {type = T_String}(gdb) p *(Value *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value$77 = {type = T_String, val = {ival = 140050197369648, str = 0x7f5ffa43d330 "a"}}(gdb) set $restarget2=(ResTarget *)($stmt->targetList->tail.data->ptr_value)(gdb) p *$restarget2$89 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d638, location = 10}(gdb) p *$restarget2->val$90 = {type = T_ColumnRef}(gdb) p *(ColumnRef *)$restarget2->val$91 = {type = T_ColumnRef, fields = 0x7f5ffa43d730, prior = false, indnum = 0, location = 10}(gdb) p *((ColumnRef *)$restarget2->val)->fields$92 = {type = T_List, length = 1, head = 0x7f5ffa43d6e8, tail = 0x7f5ffa43d6e8}(gdb) p *(Node *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value$93 = {type = T_String}(gdb) p *(Value *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value$94 = {type = T_String, val = {ival = 140050197370352, str = 0x7f5ffa43d5f0 "b"}}
复制代码


查看 SelectStmt 的 fromClause,有一个 RangeVar。


(gdb) p *$stmt->fromClause$102 = {type = T_List, length = 1, head = 0x7f5ffa43dfe0, tail = 0x7f5ffa43dfe0}(gdb) set $fromclause=(RangeVar*)($stmt->fromClause->head.data->ptr_value)(gdb) p *$fromclause$103 = {type = T_RangeVar, catalogname = 0x0, schemaname = 0x0, relname = 0x7f5ffa43d848 "item", partitionname = 0x0, subpartitionname = 0x0, inhOpt = INH_DEFAULT, relpersistence = 112 'p', alias = 0x0,  location = 17, ispartition = false, issubpartition = false, partitionKeyValuesList = 0x0, isbucket = false, buckets = 0x0, length = 0, foreignOid = 0, withVerExpr = false}
复制代码


综合以上分析可以得到语法树结构。


语义分析


在完成词法分析和语法分析后,parse_Ana lyze 函数会根据语法树的类型,调用 transformSelectStmt 将 parseTree 改写为查询树。



(gdb) p *result$3 = {type = T_Query, commandType = CMD_SELECT, querySource = QSRC_ORIGINAL, queryId = 0, canSetTag = false, utilityStmt = 0x0, resultRelation = 0, hasAggs = false, hasWindowFuncs = false,  hasSubLinks = false, hasDistinctOn = false, hasRecursive = false, hasModifyingCTE = false, hasForUpdate = false, hasRowSecurity = false, hasSynonyms = false, cteList = 0x0, rtable = 0x7f5ff5eb8c88,  jointree = 0x7f5ff5eb9310, targetList = 0x7f5ff5eb9110,…}
(gdb) p *result->targetList$13 = {type = T_List, length = 2, head = 0x7f5ff5eb90c8, tail = 0x7f5ff5eb92c8}
(gdb) p *(Node *)(result->targetList->head.data->ptr_value)$8 = {type = T_TargetEntry}(gdb) p *(TargetEntry*)(result->targetList->head.data->ptr_value)$9 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff636ff48, resno = 1, resname = 0x7f5ff5caf330 "a", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 1, resjunk = false}(gdb) p *(TargetEntry*)(result->targetList->tail.data->ptr_value)$10 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff5eb9178, resno = 2, resname = 0x7f5ff5caf5f0 "b", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 2, resjunk = false}(gdb)(gdb) p *result->rtable$14 = {type = T_List, length = 1, head = 0x7f5ff5eb8c40, tail = 0x7f5ff5eb8c40}(gdb) p *(Node *)(result->rtable->head.data->ptr_value)$15 = {type = T_RangeTblEntry}(gdb) p *(RangeTblEntry*)(result->rtable->head.data->ptr_value)$16 = {type = T_RangeTblEntry, rtekind = RTE_RELATION, relname = 0x7f5ff636efb0 "item", partAttrNum = 0x0, relid = 24576, partitionOid = 0, isContainPartition = false, subpartitionOid = 0……}
复制代码


得到的查询树结构如下:



完成词法、语法和语义分析后,SQL 解析过程完成,SQL 引擎开始执行查询优化,在下一期中再具体分析。


点击关注,第一时间了解华为云新鲜技术~

发布于: 刚刚阅读数: 5
用户头像

提供全面深入的云计算技术干货 2020.07.14 加入

华为云开发者社区,提供全面深入的云计算前景分析、丰富的技术干货、程序样例,分享华为云前沿资讯动态,方便开发者快速成长与发展,欢迎提问、互动,多方位了解云计算! 传送门:https://bbs.huaweicloud.com/

评论

发布
暂无评论
openGauss内核:SQL解析过程分析_数据库_华为云开发者联盟_InfoQ写作社区