写点什么

OpenLookeng 连接器 -Clickhouse connector 性能测试报告

发布于: 2021 年 01 月 10 日

测试背景


OpenLooKeng 是一款开源的高性能数据虚拟化引擎,提供统一 SQL 接口,可以实现对多个数据库的跨源异构和跨域跨 DC 查询。


ClickHouse 是 Yandex 开源的一个用于实时数据分析的基于列存储的数据库,其工作速度比传统方法快 100-1000 倍,性能超过了目前市场上的列式存储数据库 4。


ClickHouse connector 是为 OpenLookeng 开发的用于访问 Clickhouse 数据源的连接器,截止测试时最新版本更新至 2020-12-24 日的b2162c5,已支持对常见数据类型和函数的映射,详细支持列表可见https://gitee.com/heatao/hetu-core/blob/48650aa794c90a871df15661bf77f25225fd09c5/hetu-docs/zh/connector/clickhouse.md


测试目的


使用 SSB 测试基准提供的数据和 SQL 查询语句,对 OpenLookeng 的 ClickHouse connector 的性能进行评估。


测试环境


系统信息

硬件信息


部署情况


数据集

规模


按照 Clickhouse 官方文档中提到的 Star Schema 测试集生成数据[1]。

通过 ssb-dbgen 工具设置参数101000分别生成了两种规模的测试集,其规模如下,分别称为ssb-10ssb-1000



ssb-10的数据占用空间为 5.18G,包含 59986052 条数据。



ssb-1000占用的空间为 395.76G,数据量为 4398761522。


测试语句


SSB 测试语句总共包含 13 条测试语句,如下


Q1.1: SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;

Q1.2: SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;

Q1.3: SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;

Q2.1: SELECT sum(LO_REVENUE) toYear(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BY year P_BRAND ORDER BY year P_BRAND;

Q2.2: SELECT sum(LO_REVENUE) toYear(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BY year P_BRAND ORDER BY year P_BRAND;

Q2.3: SELECT sum(LO_REVENUE) toYear(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BY year P_BRAND ORDER BY year P_BRAND;

Q3.1: SELECT C_NATION S_NATION toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997 GROUP BY C_NATION S_NATION year ORDER BY year ASC revenue DESC;

Q3.2: SELECT C_CITY S_CITY toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997 GROUP BY C_CITY S_CITY year ORDER BY year ASC revenue DESC;

Q3.3: SELECT C_CITY S_CITY toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997 GROUP BY C_CITY S_CITY year ORDER BY year ASC revenue DESC;

Q3.4: SELECT C_CITY S_CITY toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712 GROUP BY C_CITY S_CITY year ORDER BY year ASC revenue DESC;

Q4.1: SELECT toYear(LO_ORDERDATE) AS year C_NATION sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year C_NATION ORDER BY year ASC C_NATION ASC;

Q4.2: SELECT toYear(LO_ORDERDATE) AS year S_NATION P_CATEGORY sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year S_NATION P_CATEGORY ORDER BY year ASC S_NATION ASC P_CATEGORY ASC;

Q4.3: SELECT toYear(LO_ORDERDATE) AS year S_CITY P_BRAND sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14' GROUP BY year S_CITY P_BRAND ORDER BY year ASC S_CITY ASC P_BRAND ASC;


根据 OpenLookeng 的语法,在功能不变的情况下对 Clickhouse 语法的 sql 语句进行一定的修改,主要为一下几点:

  • toYear()函数统一修改为year()

  • toISOWeek()函数修改为week()

  • toYYYYMM(A)替换为year(A)*100+month(A)

  • 将语法为select expr(A) as B group by B 等修改为select expr(A) as B group by expr(A),即去掉对 AS 的使用


修改后的语法为


Q1.1 : SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE year(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;

Q1.2 : SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE year(LO_ORDERDATE)*100+month(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;

Q1.3 : SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE week(LO_ORDERDATE) = 6 AND year(LO_ORDERDATE) = 1994 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;

Q2.1 : SELECT sum(LO_REVENUE) year(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BY year(LO_ORDERDATE) P_BRAND ORDER BY year(LO_ORDERDATE) P_BRAND;

Q2.2 : SELECT sum(LO_REVENUE) year(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BY year(LO_ORDERDATE) P_BRAND ORDER BY year(LO_ORDERDATE) P_BRAND;

Q2.3 : SELECT sum(LO_REVENUE) year(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BY year(LO_ORDERDATE) P_BRAND ORDER BY year(LO_ORDERDATE) P_BRAND;

Q3.1 : SELECT C_NATION S_NATION year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year(LO_ORDERDATE) >= 1992 AND year(LO_ORDERDATE) <= 1997 GROUP BY C_NATION S_NATION year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;

Q3.2 : SELECT C_CITY S_CITY year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year(LO_ORDERDATE) >= 1992 AND year(LO_ORDERDATE) <= 1997 GROUP BY C_CITY S_CITY year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;

Q3.3 : SELECT C_CITY S_CITY year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year(LO_ORDERDATE) >= 1992 AND year(LO_ORDERDATE) <= 1997 GROUP BY C_CITY S_CITY year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;

Q3.4 : SELECT C_CITY S_CITY year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year(LO_ORDERDATE)*100+month(LO_ORDERDATE) = 199712 GROUP BY C_CITY S_CITY year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;

Q4.1 : SELECT year(LO_ORDERDATE) AS year C_NATION sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year(LO_ORDERDATE) C_NATION ORDER BY year(LO_ORDERDATE) ASC C_NATION ASC;

Q4.2 : SELECT year(LO_ORDERDATE) AS year S_NATION P_CATEGORY sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year(LO_ORDERDATE) = 1997 OR year(LO_ORDERDATE) = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year(LO_ORDERDATE) S_NATION P_CATEGORY ORDER BY year(LO_ORDERDATE) ASC S_NATION ASC P_CATEGORY ASC;

Q4.3 : SELECT year(LO_ORDERDATE) AS year S_CITY P_BRAND sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND (year(LO_ORDERDATE) = 1997 OR year(LO_ORDERDATE) = 1998) AND P_CATEGORY = 'MFGR#14' GROUP BY year(LO_ORDERDATE) S_CITY P_BRAND ORDER BY year(LO_ORDERDATE) ASC S_CITY ASC P_BRAND ASC;

测试方法


ssb-10数据到 192.168.40.152 节点上,将ssb-1000数据导入到 192.168.40.223 节点上

使用 python 脚本分别在 Clickhouse 和 OpenLookeng 上执行脚本,每条 SQL 语句执行间隔为 10 秒,多次执行取平均值。

其中 Clickhouse 的驱动使用的是开源库 clickhouse-driver[2],OpenLookeng 的驱动来源于对 web 端 API 的调用,代码已发布于 github[3]

测试结果

ssb-10


在 OpenLookeng 上的执行结果


在 Clickhouse 上的执行结果

ssb-1000


在 OpenLookeng 上的执行结果



后续测试可能因为缓存原因,导致执行速度大幅度加快,部分查询时间短于 Clickhouse


在 Clickhouse 上的执行结果



结果分析

ssb-10

对 sql 语句的平均时间进行对比,如下表,单位为秒




ssb-1000


对 sql 语句的平均时间进行对比,如下表


对可能使用缓存的测试结果的平均时间进行对比,如下表



结论后续


上述 SQL 查询结果显示通过 ClickHouse connector 来查询数据的方式相比直接使用 CH 的驱动仍有性能损耗,我们会和社区进一步交流,对原因进行分析并进行后续改进。


说明:OpenLooKeng 开源不久,仍然持续发展中,该测试目的是针对新开发的 ck connector 的测试,后续会进行 olk 内核和 connector 的深度联调,我们会持续关注 OLK 社区的发展,对新版本的 OLK 作出更全面的分析。


[1]  https://github.com/vadimtk/ssb-dbgen

[2]  https://github.com/mymarilyn/clickhouse-driver

[3]  https://github.com/jiangph1001/OpenLookeng-driver

[4]  https://clickhouse.tech/benchmark/dbms/


发布于: 2021 年 01 月 10 日阅读数: 154
用户头像

还未添加个人签名 2018.04.24 加入

Galaxy 数据平台开发团队

评论 (5 条评论)

发布
用户头像
你好,ClickHouse connector查询数据性能有新的发展吗?我在openlookeng1.0.1版本集成了clickhouse,查询性能和你的测试报告一样,也是比较差的,对此,你对查询性能提升方面,有没有什么见解?
2021 年 03 月 11 日 10:15
回复
配置metadata缓存和连接池可能会有所提升,但olk的ch插件的查询性能问题暂时还没有好的解决方案,可以期待一下olk的新版本是否会带来提升
2021 年 03 月 11 日 11:41
回复
metadata缓存和连接池是openlookeng的配置吗?这个应该如何配置?
2021 年 03 月 11 日 17:31
回复
用户头像
请问openlookeng集成clinkhouse的源码可以开源吗?我想学习学习,非常感谢
2021 年 02 月 07 日 16:51
回复
文章中已经给出了gitee仓库的链接
2021 年 02 月 08 日 02:42
回复
没有更多了
OpenLookeng连接器-Clickhouse connector性能测试报告