OpenLookeng 连接器 -Clickhouse connector 性能测试报告
测试背景
OpenLooKeng 是一款开源的高性能数据虚拟化引擎,提供统一 SQL 接口,可以实现对多个数据库的跨源异构和跨域跨 DC 查询。
ClickHouse 是 Yandex 开源的一个用于实时数据分析的基于列存储的数据库,其工作速度比传统方法快 100-1000 倍,性能超过了目前市场上的列式存储数据库 4。
ClickHouse connector 是为 OpenLookeng 开发的用于访问 Clickhouse 数据源的连接器,截止测试时最新版本更新至 2020-12-24 日的b2162c5
,已支持对常见数据类型和函数的映射,详细支持列表可见https://gitee.com/heatao/hetu-core/blob/48650aa794c90a871df15661bf77f25225fd09c5/hetu-docs/zh/connector/clickhouse.md
测试目的
使用 SSB 测试基准提供的数据和 SQL 查询语句,对 OpenLookeng 的 ClickHouse connector 的性能进行评估。
测试环境
系统信息
硬件信息
部署情况
数据集
规模
按照 Clickhouse 官方文档中提到的 Star Schema 测试集生成数据[1]。
通过 ssb-dbgen 工具设置参数10
和1000
分别生成了两种规模的测试集,其规模如下,分别称为ssb-10
和ssb-1000
。
ssb-10
的数据占用空间为 5.18G,包含 59986052 条数据。
ssb-1000
占用的空间为 395.76G,数据量为 4398761522。
测试语句
SSB 测试语句总共包含 13 条测试语句,如下
Q1.1: SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;
Q1.2: SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;
Q1.3: SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;
Q2.1: SELECT sum(LO_REVENUE) toYear(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BY year P_BRAND ORDER BY year P_BRAND;
Q2.2: SELECT sum(LO_REVENUE) toYear(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BY year P_BRAND ORDER BY year P_BRAND;
Q2.3: SELECT sum(LO_REVENUE) toYear(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BY year P_BRAND ORDER BY year P_BRAND;
Q3.1: SELECT C_NATION S_NATION toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997 GROUP BY C_NATION S_NATION year ORDER BY year ASC revenue DESC;
Q3.2: SELECT C_CITY S_CITY toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997 GROUP BY C_CITY S_CITY year ORDER BY year ASC revenue DESC;
Q3.3: SELECT C_CITY S_CITY toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997 GROUP BY C_CITY S_CITY year ORDER BY year ASC revenue DESC;
Q3.4: SELECT C_CITY S_CITY toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712 GROUP BY C_CITY S_CITY year ORDER BY year ASC revenue DESC;
Q4.1: SELECT toYear(LO_ORDERDATE) AS year C_NATION sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year C_NATION ORDER BY year ASC C_NATION ASC;
Q4.2: SELECT toYear(LO_ORDERDATE) AS year S_NATION P_CATEGORY sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year S_NATION P_CATEGORY ORDER BY year ASC S_NATION ASC P_CATEGORY ASC;
Q4.3: SELECT toYear(LO_ORDERDATE) AS year S_CITY P_BRAND sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14' GROUP BY year S_CITY P_BRAND ORDER BY year ASC S_CITY ASC P_BRAND ASC;
根据 OpenLookeng 的语法,在功能不变的情况下对 Clickhouse 语法的 sql 语句进行一定的修改,主要为一下几点:
将
toYear()
函数统一修改为year()
将
toISOWeek()
函数修改为week()
将
toYYYYMM(A)
替换为year(A)*100+month(A)
将语法为
select expr(A) as B group by B
等修改为select expr(A) as B group by expr(A)
,即去掉对 AS 的使用
修改后的语法为
Q1.1 : SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE year(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;
Q1.2 : SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE year(LO_ORDERDATE)*100+month(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;
Q1.3 : SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE week(LO_ORDERDATE) = 6 AND year(LO_ORDERDATE) = 1994 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;
Q2.1 : SELECT sum(LO_REVENUE) year(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BY year(LO_ORDERDATE) P_BRAND ORDER BY year(LO_ORDERDATE) P_BRAND;
Q2.2 : SELECT sum(LO_REVENUE) year(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BY year(LO_ORDERDATE) P_BRAND ORDER BY year(LO_ORDERDATE) P_BRAND;
Q2.3 : SELECT sum(LO_REVENUE) year(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BY year(LO_ORDERDATE) P_BRAND ORDER BY year(LO_ORDERDATE) P_BRAND;
Q3.1 : SELECT C_NATION S_NATION year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year(LO_ORDERDATE) >= 1992 AND year(LO_ORDERDATE) <= 1997 GROUP BY C_NATION S_NATION year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;
Q3.2 : SELECT C_CITY S_CITY year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year(LO_ORDERDATE) >= 1992 AND year(LO_ORDERDATE) <= 1997 GROUP BY C_CITY S_CITY year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;
Q3.3 : SELECT C_CITY S_CITY year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year(LO_ORDERDATE) >= 1992 AND year(LO_ORDERDATE) <= 1997 GROUP BY C_CITY S_CITY year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;
Q3.4 : SELECT C_CITY S_CITY year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year(LO_ORDERDATE)*100+month(LO_ORDERDATE) = 199712 GROUP BY C_CITY S_CITY year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;
Q4.1 : SELECT year(LO_ORDERDATE) AS year C_NATION sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year(LO_ORDERDATE) C_NATION ORDER BY year(LO_ORDERDATE) ASC C_NATION ASC;
Q4.2 : SELECT year(LO_ORDERDATE) AS year S_NATION P_CATEGORY sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year(LO_ORDERDATE) = 1997 OR year(LO_ORDERDATE) = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year(LO_ORDERDATE) S_NATION P_CATEGORY ORDER BY year(LO_ORDERDATE) ASC S_NATION ASC P_CATEGORY ASC;
Q4.3 : SELECT year(LO_ORDERDATE) AS year S_CITY P_BRAND sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND (year(LO_ORDERDATE) = 1997 OR year(LO_ORDERDATE) = 1998) AND P_CATEGORY = 'MFGR#14' GROUP BY year(LO_ORDERDATE) S_CITY P_BRAND ORDER BY year(LO_ORDERDATE) ASC S_CITY ASC P_BRAND ASC;
测试方法
将ssb-10
数据到 192.168.40.152 节点上,将ssb-1000
数据导入到 192.168.40.223 节点上
使用 python 脚本分别在 Clickhouse 和 OpenLookeng 上执行脚本,每条 SQL 语句执行间隔为 10 秒,多次执行取平均值。
其中 Clickhouse 的驱动使用的是开源库 clickhouse-driver[2],OpenLookeng 的驱动来源于对 web 端 API 的调用,代码已发布于 github[3]
测试结果
ssb-10
在 OpenLookeng 上的执行结果
在 Clickhouse 上的执行结果
ssb-1000
在 OpenLookeng 上的执行结果
后续测试可能因为缓存原因,导致执行速度大幅度加快,部分查询时间短于 Clickhouse
在 Clickhouse 上的执行结果
结果分析
ssb-10
对 sql 语句的平均时间进行对比,如下表,单位为秒
ssb-1000
对 sql 语句的平均时间进行对比,如下表
对可能使用缓存的测试结果的平均时间进行对比,如下表
结论后续
上述 SQL 查询结果显示通过 ClickHouse connector 来查询数据的方式相比直接使用 CH 的驱动仍有性能损耗,我们会和社区进一步交流,对原因进行分析并进行后续改进。
说明:OpenLooKeng 开源不久,仍然持续发展中,该测试目的是针对新开发的 ck connector 的测试,后续会进行 olk 内核和 connector 的深度联调,我们会持续关注 OLK 社区的发展,对新版本的 OLK 作出更全面的分析。
[1] https://github.com/vadimtk/ssb-dbgen
[2] https://github.com/mymarilyn/clickhouse-driver
[3] https://github.com/jiangph1001/OpenLookeng-driver
[4] https://clickhouse.tech/benchmark/dbms/
版权声明: 本文为 InfoQ 作者【Galaxy数据平台】的原创文章。
原文链接:【http://xie.infoq.cn/article/25b2a716e993c6c9106c12ba0】。文章转载请联系作者。
评论 (5 条评论)