Clickhouse 性能测试

用户头像
久吾尔岂
关注
发布于: 2020 年 05 月 06 日

Clickhouse 性能测试



ClickHouse简介



ClickHouse是战斗民族Yandex公司出品的OLAP开源数据库,简称CH,也有人简称CK,是目前市面上最快的



OLAP数据库。性能远超Vertica、Sybase IQ等。ClickHouse可能更适合流式或批次入库的时序数据。



CH具有以下几个特点:



  1. 列式存储,因此数据压缩比高。

  2. 向量计算,且支持多核CPU并行计算,并且执行每个SQL时都力求榨干CPU性能。

  3. 基于Shared nothing架构,支持分布式方案。

  4. 支持主从复制架构。

  5. 兼容大部分SQL语法,其语法和MySQL尤其相近。

  6. 数据实时更新。

  7. 不支持事务,不适合高频更新数据。

  8. 建议多用宽表,但不建议总是查询整数据行中的所有列。



简言之,如果你有以下业务场景,可以考虑用CH:



  1. 海量数据,但又不希望单节点的存储空间消耗太高。

  2. 宽表,为了业务方便,可能会把很多相关数据列都整合到一个表里。

  3. 基于SQL的查询方式,提高程序的适用性和可移植性。



性能测试



选用了CH官方提供的一个测试方案:SSBM (Star Schema Benchmark)



测试机配置



# 查看操作系统内核信息
[root@sdw1 third]# uname -a
Linux sdw1 3.10.0-957.el7.x8664 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x8664 x8664 x8664 GNU/Linux
# 查看CPU型号
[root@sdw1 third]# cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
20 Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
# 查看内存总量
[root@sdw1 third]# grep MemTotal /proc/meminfo
MemTotal: 32346828 kB



SSB模型介绍



SSB(Star Schema Benchmark)是麻省州立大学波士顿校区的研究人员定义的基于现实商业应用的数据模型,业界公认用来模拟决策支持类应用,比较公正和中立。



学术界和工业界普遍采用它来评价决策支持技术方面应用的性能。



全方位评测系统的整体商业计算综合能力,对厂商的要求更高。



在银行信贷分析和信用卡分析、电信运营分析、税收分析、烟草行业决策分析中都有广泛的应用。



SSB基准测试包括:



  • 1个事实表:lineorder

  • 4个维度表:customer,part,dwdate,supplier



13条标准SQL查询测试语句:统计查询、多表关联、sum、复杂条件、group by、order by等组合方式



![image-20200427174606665](/Users/7eng/Library/Application Support/typora-user-images/image-20200427174606665.png)



生成测试数据



# 下载SSBM工具
[root@sdw1 third]# git clone https://github.com/vadimtk/ssb-dbgen.git
[root@sdw1 third]# cd ssb-dbgen
[root@sdw1 dbgen]# make
# 生成测试数据,机器性能和磁盘有限,所以指定 -s 100
[root@sdw1 dbgen]# ./dbgen -s 100 -T c
[root@sdw1 dbgen]# ./dbgen -s 100 -T p
[root@sdw1 dbgen]# ./dbgen -s 100 -T s
[root@sdw1 dbgen]# ./dbgen -s 100 -T l
# 查看文件
[root@sdw1 dbgen]# ll .tbl
-rw-r--r-- 1 root root 289529327 4月 26 17:21 customer.tbl
-rw-r--r-- 1 root root 63289191180 4月 26 17:38 lineorder.tbl
-rw-r--r-- 1 root root 121042413 4月 26 17:21 part.tbl
-rw-r--r-- 1 root root 17062852 4月 26 17:21 supplier.tbl
[root@sdw1 dbgen]#
# 查看记录数
[root@sdw1 dbgen]# wc -l .tbl
3000000 customer.tbl
600037902 lineorder.tbl
1400000 part.tbl
200000 supplier.tbl



创建表



根据CH官网提供的建表DDL直接创建即可,参考这里:



[Star Schema Benchmark]:https://clickhouse.tech/docs/en/gettingstarted/exampledatasets/star_schema/



建表语句:



CREATE TABLE customer
(
CCUSTKEY UInt32,
CNAME String,
CADDRESS String,
CCITY LowCardinality(String),
CNATION LowCardinality(String),
CREGION LowCardinality(String),
CPHONE String,
CMKTSEGMENT LowCardinality(String)
)
ENGINE = MergeTree ORDER BY (CCUSTKEY);
CREATE TABLE lineorder
(
LOORDERKEY UInt32,
LOLINENUMBER UInt8,
LOCUSTKEY UInt32,
LOPARTKEY UInt32,
LOSUPPKEY UInt32,
LOORDERDATE Date,
LOORDERPRIORITY LowCardinality(String),
LOSHIPPRIORITY UInt8,
LOQUANTITY UInt8,
LOEXTENDEDPRICE UInt32,
LOORDTOTALPRICE UInt32,
LODISCOUNT UInt8,
LOREVENUE UInt32,
LOSUPPLYCOST UInt32,
LOTAX UInt8,
LOCOMMITDATE Date,
LOSHIPMODE LowCardinality(String)
)
ENGINE = MergeTree PARTITION BY toYear(LOORDERDATE) ORDER BY (LOORDERDATE, LOORDERKEY);
CREATE TABLE part
(
PPARTKEY UInt32,
PNAME String,
PMFGR LowCardinality(String),
PCATEGORY LowCardinality(String),
PBRAND LowCardinality(String),
PCOLOR LowCardinality(String),
PTYPE LowCardinality(String),
PSIZE UInt8,
PCONTAINER LowCardinality(String)
)
ENGINE = MergeTree ORDER BY PPARTKEY;
CREATE TABLE supplier
(
SSUPPKEY UInt32,
SNAME String,
SADDRESS String,
SCITY LowCardinality(String),
SNATION LowCardinality(String),
SREGION LowCardinality(String),
SPHONE String
)
ENGINE = MergeTree ORDER BY S_SUPPKEY;



导入数据



[root@sdw1 dbgen]# clickhouse-client --formatcsvdelimiter="|" --query "INSERT INTO customer FORMAT CSV" < customer.tbl
[root@sdw1 dbgen]# clickhouse-client --formatcsvdelimiter="|" --query "INSERT INTO part FORMAT CSV" < part.tbl
[root@sdw1 dbgen]# clickhouse-client --formatcsvdelimiter="|" --query "INSERT INTO supplier FORMAT CSV" < supplier.tbl
[root@sdw1 dbgen]# clickhouse-client --formatcsvdelimiter="|" --query "INSERT INTO lineorder FORMAT CSV" < lineorder.tbl
[root@sdw1 dbgen]# clickhouse-client --formatcsvdelimiter="|" --query "INSERT INTO lineorder FORMAT CSV" < lineorder.tbl



生成大宽表



SET maxmemoryusage = 20000000000;
CREATE TABLE lineorderflat
ENGINE = MergeTree
PARTITION BY toYear(LOORDERDATE)
ORDER BY (LOORDERDATE, LOORDERKEY) AS
SELECT
l.LOORDERKEY AS LOORDERKEY,
l.LOLINENUMBER AS LOLINENUMBER,
l.LOCUSTKEY AS LOCUSTKEY,
l.LOPARTKEY AS LOPARTKEY,
l.LOSUPPKEY AS LOSUPPKEY,
l.LOORDERDATE AS LOORDERDATE,
l.LOORDERPRIORITY AS LOORDERPRIORITY,
l.LOSHIPPRIORITY AS LOSHIPPRIORITY,
l.LOQUANTITY AS LOQUANTITY,
l.LOEXTENDEDPRICE AS LOEXTENDEDPRICE,
l.LOORDTOTALPRICE AS LOORDTOTALPRICE,
l.LODISCOUNT AS LODISCOUNT,
l.LOREVENUE AS LOREVENUE,
l.LOSUPPLYCOST AS LOSUPPLYCOST,
l.LOTAX AS LOTAX,
l.LOCOMMITDATE AS LOCOMMITDATE,
l.LOSHIPMODE AS LOSHIPMODE,
c.CNAME AS CNAME,
c.CADDRESS AS CADDRESS,
c.CCITY AS CCITY,
c.CNATION AS CNATION,
c.CREGION AS CREGION,
c.CPHONE AS CPHONE,
c.CMKTSEGMENT AS CMKTSEGMENT,
s.SNAME AS SNAME,
s.SADDRESS AS SADDRESS,
s.SCITY AS SCITY,
s.SNATION AS SNATION,
s.SREGION AS SREGION,
s.SPHONE AS SPHONE,
p.PNAME AS PNAME,
p.PMFGR AS PMFGR,
p.PCATEGORY AS PCATEGORY,
p.PBRAND AS PBRAND,
p.PCOLOR AS PCOLOR,
p.PTYPE AS PTYPE,
p.PSIZE AS PSIZE,
p.PCONTAINER AS PCONTAINER
FROM lineorder AS l
INNER JOIN customer AS c ON c.CCUSTKEY = l.LOCUSTKEY
INNER JOIN supplier AS s ON s.SSUPPKEY = l.LOSUPPKEY
INNER JOIN part AS p ON p.PPARTKEY = l.LOPARTKEY
Ok.
0 rows in set. Elapsed: 2049.310 sec. Processed 604.64 million rows, 26.12 GB (295.04 thousand rows/s., 12.75 MB/s.)
sdw1 :)



导入后结果



导完后表空间大小的数据



select table as "表名",
sum(rows) as "总行数",
formatReadableSize(sum(datauncompressedbytes)) as "原始大小",
formatReadableSize(sum(datacompressedbytes)) as "压缩大小",
round(sum(datacompressedbytes) / sum(datauncompressedbytes) * 100, 2) "压缩率"
from system.parts
group by table;



原文件以及表空间及压缩后大小

| 表 | 总行数 | tbl文件大小 | 原始大小 | 压缩大小 | 压缩率(%) | 最终压缩率 |

| :------------- | :-------- | ----------- | :------- | :------- | :-------- | ---------- |

| supplier | 200000 | 17 M | 11.06 M | 7.52 M | 68.02 | 44.24 |

| part | 1400000 | 116 M | 34.29 M | 24.08 M | 70.22 | 20.76 |

| customer | 3000000 | 277 M | 168.74 M | 114.63 M | 67.94 | 41.38 |

| lineorder | 600037902 | 59 G | 24.03 G | 16.67 G | 69.36 | 28.25 |

| lineorder_flat | 600037902 | | 97.03 G | 53.14 G | 54.77 | |



只看最大的lineorder表,对tbl文件的压缩比可以达到4:1,如果是相对常规的OLTP数据库,其压缩比显然还要更高。



查询测试



###### 查询结果比对



| SQL | 耗时(秒) | 扫描行数(10万) | 返回行数 |

| :--: | :------: | :------------: | :------: |

| Q1.1 | 4.170 | 91.01 | 1 |

| Q1.2 | 0.551 | 7.75 | 1 |

| Q1.3 | 0.143 | 1.80 | 1 |

| Q2.1 | 28.421 | 600.04 | 280 |

| Q2.2 | 1.294 | 600.04 | 56 |

| Q2.3 | 1.118 | 600.04 | 7 |

| Q3.1 | 9.061 | 546.67 | 150 |

| Q3.2 | 7.388 | 546.67 | 600 |

| Q3.3 | 2.092 | 546.67 | 24 |

| Q3.4 | 0.064 | 7.75 | 4 |

| Q4.1 | 14.875 | 600.04 | 35 |

| Q4.2 | 0.873 | 144.42 | 100 |

| Q4.3 | 0.838 | 144.42 | 800 |



每次扫描这么多数据量,但这些统计分析为主的SQL查询耗时却并不大,足见CH的计算性能了。



###### Q1.1:



SELECT sum(LOEXTENDEDPRICE * LODISCOUNT) AS revenue
FROM lineorderflat
WHERE (toYear(LOORDERDATE) = 1993) AND ((LODISCOUNT >= 1) AND (LODISCOUNT <= 3)) AND (LO_QUANTITY < 25)
┌────────revenue─┐
44652567249651
└────────────────┘
1 rows in set. Elapsed: 4.170 sec. Processed 91.01 million rows, 728.06 MB (21.82 million rows/s., 174.59 MB/s.)
sdw1 :)



###### Q1.2:



SELECT sum(LOEXTENDEDPRICE * LODISCOUNT) AS revenue
FROM lineorderflat
WHERE (toYYYYMM(LOORDERDATE) = 199401) AND ((LODISCOUNT >= 4) AND (LODISCOUNT <= 6)) AND ((LOQUANTITY >= 26) AND (LOQUANTITY <= 35))
┌───────revenue─┐
9624332170119
└───────────────┘
1 rows in set. Elapsed: 0.551 sec. Processed 7.75 million rows, 61.96 MB (14.06 million rows/s., 112.49 MB/s.)
sdw1 :)



###### Q1.3:



SELECT sum(LOEXTENDEDPRICE * LODISCOUNT) AS revenue
FROM lineorderflat
WHERE (toISOWeek(LOORDERDATE) = 6) AND (toYear(LOORDERDATE) = 1994) AND ((LODISCOUNT >= 5) AND (LODISCOUNT <= 7)) AND ((LOQUANTITY >= 26) AND (LO_QUANTITY <= 35))
┌───────revenue─┐
2611093671163
└───────────────┘
1 rows in set. Elapsed: 0.143 sec. Processed 1.80 million rows, 14.42 MB (12.58 million rows/s., 100.67 MB/s.)
sdw1 :)



###### Q2.1:



SELECT
sum(LOREVENUE),
toYear(LOORDERDATE) AS year,
PBRAND
FROM lineorderflat
WHERE (PCATEGORY = 'MFGR#12') AND (SREGION = 'AMERICA')
GROUP BY
year,
PBRAND
ORDER BY
year ASC,
PBRAND ASC
┌─sum(LOREVENUE)─┬─year─┬─PBRAND───┐
644200056181992 │ MFGR#121 │
633893460961992 │ MFGR#1210 │
684166056371992 │ MFGR#1211 │
641357232641992 │ MFGR#1212 │
663313700551992 │ MFGR#1213 │
688458755471992 │ MFGR#1214 │
669044653161992 │ MFGR#1215 │
703198700471992 │ MFGR#1216 │
682675275901992 │ MFGR#1217 │
669263672801992 │ MFGR#1218 │
396798929151998 │ MFGR#128 │
│ ******** │ │ ***** │
353005130831998 │ MFGR#129 │
└─────────────────┴──────┴───────────┘
280 rows in set. Elapsed: 28.421 sec. Processed 600.04 million rows, 6.17 GB (21.11 million rows/s., 217.04 MB/s.)
sdw1 :)



###### Q2.2:



SELECT
sum(LOREVENUE),
toYear(LOORDERDATE) AS year,
PBRAND
FROM lineorderflat
WHERE (PBRAND >= 'MFGR#2221') AND (PBRAND <= 'MFGR#2228') AND (SREGION = 'ASIA')
GROUP BY
year,
PBRAND
ORDER BY
year ASC,
PBRAND ASC
┌─sum(LOREVENUE)─┬─year─┬─P_BRAND───┐
664503494381992 │ MFGR#2221 │
654232643121992 │ MFGR#2222 │
669367726871992 │ MFGR#2223 │
640471919341992 │ MFGR#2224 │
657445591381992 │ MFGR#2225 │
669930456681992 │ MFGR#2226 │
383284238981998 │ MFGR#2225 │
387050332721998 │ MFGR#2226 │
│ ******** │ │ ***** │
406542018401998 │ MFGR#2228 │
└─────────────────┴──────┴───────────┘
56 rows in set. Elapsed: 1.294 sec. Processed 600.04 million rows, 5.56 GB (463.67 million rows/s., 4.30 GB/s.)
sdw1 :)



###### Q2.3:



SELECT
sum(LOREVENUE),
toYear(LOORDERDATE) AS year,
PBRAND
FROM lineorderflat
WHERE (PBRAND = 'MFGR#2239') AND (SREGION = 'EUROPE')
GROUP BY
year,
PBRAND
ORDER BY
year ASC,
PBRAND ASC
┌─sum(LOREVENUE)─┬─year─┬─PBRAND───┐
657515897231992 │ MFGR#2239 │
645328448011993 │ MFGR#2239 │
647225990021994 │ MFGR#2239 │
656164326831995 │ MFGR#2239 │
648028846861996 │ MFGR#2239 │
644855411651997 │ MFGR#2239 │
372765363611998 │ MFGR#2239 │
└─────────────────┴──────┴───────────┘
7 rows in set. Elapsed: 1.118 sec. Processed 600.04 million rows, 5.56 GB (536.63 million rows/s., 4.98 GB/s.)
sdw1 :)



###### Q3.1:



SELECT
CNATION,
SNATION,
toYear(LOORDERDATE) AS year,
sum(LOREVENUE) AS revenue
FROM lineorderflat
WHERE (CREGION = 'ASIA') AND (SREGION = 'ASIA') AND (year >= 1992) AND (year <= 1997)
GROUP BY
CNATION,
SNATION,
year
ORDER BY
year ASC,
revenue DESC
┌─CNATION──┬─S_NATION──┬─year─┬──────revenue─┐
│ INDIA │ INDIA │ 1992537778456208
│ INDONESIA │ INDIA │ 1992536684093041
│ VIETNAM │ INDIA │ 1992536483529614
│ INDIA │ JAPAN │ 1992535663357352
│ INDONESIA │ JAPAN │ 1992535044240518
│ INDIA │ INDONESIA │ 1992534847378548
│ VIETNAM │ CHINA │ 1997529209006663
│ ****** │ *** │ │ ********* │
│ JAPAN │ VIETNAM │ 1997525495763677
└───────────┴───────────┴──────┴──────────────┘
150 rows in set. Elapsed: 9.061 sec. Processed 546.67 million rows, 5.48 GB (60.33 million rows/s., 604.27 MB/s.)
sdw1 :)



###### Q3.2:



SELECT
CCITY,
SCITY,
toYear(LOORDERDATE) AS year,
sum(LOREVENUE) AS revenue
FROM lineorderflat
WHERE (CNATION = 'UNITED STATES') AND (SNATION = 'UNITED STATES') AND (year >= 1992) AND (year <= 1997)
GROUP BY
CCITY,
SCITY,
year
ORDER BY
year ASC,
revenue DESC
┌─CCITY─────┬─S_CITY─────┬─year─┬────revenue─┐
│ UNITED ST6 │ UNITED ST6 │ 19925694246807
│ UNITED ST0 │ UNITED ST0 │ 19925676049026
│ UNITED ST1 │ UNITED ST1 │ 19925652630617
│ UNITED ST8 │ UNITED ST6 │ 19925649039075
│ UNITED ST4 │ UNITED ST1 │ 19925618014301
│ UNITED ST4 │ UNITED ST4 │ 19925601463754
│ UNITED ST6 │ UNITED ST0 │ 19925589878255
│ UNITED ST9 │ UNITED ST6 │ 19974877577655
│ UNITED ST3 │ UNITED ST2 │ 19974866105481
│ ******* │ │ │ ******* │
│ UNITED ST9 │ UNITED ST5 │ 19974769919410
└────────────┴────────────┴──────┴────────────┘
600 rows in set. Elapsed: 7.388 sec. Processed 546.67 million rows, 5.55 GB (74.00 million rows/s., 751.69 MB/s.)
sdw1 :)



###### Q3.3:



SELECT
CCITY,
SCITY,
toYear(LOORDERDATE) AS year,
sum(LOREVENUE) AS revenue
FROM lineorderflat
WHERE ((CCITY = 'UNITED KI1') OR (CCITY = 'UNITED KI5')) AND ((SCITY = 'UNITED KI1') OR (SCITY = 'UNITED KI5')) AND (year >= 1992) AND (year <= 1997)
GROUP BY
CCITY,
SCITY,
year
ORDER BY
year ASC,
revenue DESC
┌─CCITY─────┬─S_CITY─────┬─year─┬────revenue─┐
│ UNITED KI1 │ UNITED KI1 │ 19925776096629
│ UNITED KI5 │ UNITED KI1 │ 19925555883901
│ UNITED KI5 │ UNITED KI5 │ 19925348705805
│ UNITED KI1 │ UNITED KI5 │ 19925326870427
│ UNITED KI1 │ UNITED KI1 │ 19935892974670
│ UNITED KI1 │ UNITED KI5 │ 19935490859451
│ UNITED KI5 │ UNITED KI1 │ 19935468354303
│ UNITED KI5 │ UNITED KI5 │ 19935089909647
│ UNITED KI5 │ UNITED KI1 │ 19945437315108
│ UNITED KI1 │ UNITED KI1 │ 19945348775917
│ UNITED KI5 │ UNITED KI5 │ 19945310936695
│ UNITED KI1 │ UNITED KI5 │ 19945237461110
│ UNITED KI1 │ UNITED KI1 │ 19955737551920
│ UNITED KI5 │ UNITED KI5 │ 19955657584590
│ UNITED KI5 │ UNITED KI1 │ 19955260093556
│ UNITED KI1 │ UNITED KI5 │ 19955213763257
│ UNITED KI5 │ UNITED KI1 │ 19965522325005
│ UNITED KI1 │ UNITED KI1 │ 19965451244409
│ UNITED KI5 │ UNITED KI5 │ 19965231759057
│ UNITED KI1 │ UNITED KI5 │ 19965203962897
│ UNITED KI1 │ UNITED KI1 │ 19975340760807
│ UNITED KI5 │ UNITED KI1 │ 19975295685214
│ UNITED KI1 │ UNITED KI5 │ 19975188428156
│ UNITED KI5 │ UNITED KI5 │ 19975024634475
└────────────┴────────────┴──────┴────────────┘
24 rows in set. Elapsed: 2.092 sec. Processed 546.67 million rows, 4.45 GB (261.37 million rows/s., 2.13 GB/s.)



###### Q3.4:



SELECT
CCITY,
SCITY,
toYear(LOORDERDATE) AS year,
sum(LOREVENUE) AS revenue
FROM lineorderflat
WHERE ((CCITY = 'UNITED KI1') OR (CCITY = 'UNITED KI5')) AND ((SCITY = 'UNITED KI1') OR (SCITY = 'UNITED KI5')) AND (toYYYYMM(LOORDERDATE) = 199712)
GROUP BY
CCITY,
SCITY,
year
ORDER BY
year ASC,
revenue DESC
┌─CCITY─────┬─SCITY─────┬─year─┬───revenue─┐
│ UNITED KI1 │ UNITED KI1 │ 1997481119563
│ UNITED KI5 │ UNITED KI5 │ 1997386477033
│ UNITED KI5 │ UNITED KI1 │ 1997378048353
│ UNITED KI1 │ UNITED KI5 │ 1997366630529
└────────────┴────────────┴──────┴───────────┘
4 rows in set. Elapsed: 0.064 sec. Processed 7.75 million rows, 63.16 MB (121.26 million rows/s., 988.25 MB/s.)
sdw1 :)



###### Q4.1:



SELECT
toYear(LOORDERDATE) AS year,
CNATION,
sum(LOREVENUE - LOSUPPLYCOST) AS profit
FROM lineorderflat
WHERE (CREGION = 'AMERICA') AND (SREGION = 'AMERICA') AND ((PMFGR = 'MFGR#1') OR (PMFGR = 'MFGR#2'))
GROUP BY
year,
CNATION
ORDER BY
year ASC,
CNATION ASC
┌─year─┬─CNATION──────┬────────profit─┐
1992 │ ARGENTINA │ 1041983042066
1992 │ BRAZIL │ 1031193572794
1992 │ CANADA │ 1032094614252
1992 │ PERU │ 1037331491440
1992 │ UNITED STATES │ 1031593944407
1993 │ ARGENTINA │ 1034515265588
1993 │ BRAZIL │ 1028249774691
1998 │ BRAZIL │ 603999739074
1998 │ CANADA │ 601462066533
│ * │ │ ********* │
1998 │ UNITED STATES │ 605069471323
└──────┴───────────────┴───────────────┘
35 rows in set. Elapsed: 14.875 sec. Processed 600.04 million rows, 8.41 GB (40.34 million rows/s., 565.19 MB/s.)
sdw1 :)



###### Q4.2:



SELECT
toYear(LOORDERDATE) AS year,
SNATION,
PCATEGORY,
sum(LOREVENUE - LOSUPPLYCOST) AS profit
FROM lineorderflat
WHERE (CREGION = 'AMERICA') AND (SREGION = 'AMERICA') AND ((year = 1997) OR (year = 1998)) AND ((PMFGR = 'MFGR#1') OR (PMFGR = 'MFGR#2'))
GROUP BY
year,
SNATION,
PCATEGORY
ORDER BY
year ASC,
SNATION ASC,
PCATEGORY ASC
┌─year─┬─SNATION──────┬─PCATEGORY─┬───────profit─┐
1997 │ ARGENTINA │ MFGR#11 │ 102369950215 │
1997 │ ARGENTINA │ MFGR#12 │ 103052774082 │
1997 │ ARGENTINA │ MFGR#13 │ 103202870567 │
1997 │ ARGENTINA │ MFGR#14 │ 101773547534 │
1997 │ ARGENTINA │ MFGR#15 │ 103728199319 │
1998 │ UNITED STATES │ MFGR#14 │ 60513162474 │
1998 │ UNITED STATES │ MFGR#15 │ 60765251425 │
1998 │ UNITED STATES │ MFGR#21 │ 60285728251 │
1998 │ UNITED STATES │ MFGR#22 │ 59636033602 │
1998 │ UNITED STATES │ MFGR#23 │ 61533867289 │
│ * │ * │ │ **** │
1998 │ UNITED STATES │ MFGR#25 │ 60042710566 │
└──────┴───────────────┴────────────┴──────────────┘
100 rows in set. Elapsed: 0.873 sec. Processed 144.42 million rows, 2.17 GB (165.38 million rows/s., 2.48 GB/s.)
sdw1 :)



###### Q4.3:



SELECT
toYear(LOORDERDATE) AS year,
SCITY,
PBRAND,
sum(LOREVENUE - LOSUPPLYCOST) AS profit
FROM lineorderflat
WHERE (SNATION = 'UNITED STATES') AND ((year = 1997) OR (year = 1998)) AND (PCATEGORY = 'MFGR#14')
GROUP BY
year,
SCITY,
PBRAND
ORDER BY
year ASC,
SCITY ASC,
PBRAND ASC
┌─year─┬─SCITY─────┬─PBRAND───┬─────profit─┐
1997 │ UNITED ST0 │ MFGR#141 │ 1402715668 │
1997 │ UNITED ST0 │ MFGR#1410 │ 1309800423 │
1997 │ UNITED ST0 │ MFGR#1411 │ 1252501939 │
1997 │ UNITED ST0 │ MFGR#1412 │ 1295391924 │
1997 │ UNITED ST0 │ MFGR#1413 │ 1294775484 │
1998 │ UNITED ST9 │ MFGR#1438 │ 671629479 │
1998 │ UNITED ST9 │ MFGR#1439 │ 699887270 │
1998 │ UNITED ST9 │ MFGR#144 │ 626238164 │
1998 │ UNITED ST9 │ MFGR#146 │ 770184207 │
1998 │ UNITED ST9 │ MFGR#147 │ 640908455 │
│ * │ * │ │ **** │
1998 │ UNITED ST9 │ MFGR#149 │ 603099066 │
└──────┴────────────┴───────────┴────────────┘
800 rows in set. Elapsed: 0.838 sec. Processed 144.42 million rows, 2.24 GB (172.35 million rows/s., 2.67 GB/s.)
sdw1 :)



相关优化



  1. 关闭虚拟内存,物理内存和虚拟内存的数据交换,会导致查询变慢。



  1. 为每一个账户添加joinusenulls配置,左表中的一条记录在右表中不存在,右表的相应字段会返回该字段相应数据类型的默认值,而不是标准SQL中的Null值。



  1. JOIN操作时一定要把数据量小的表放在右边,ClickHouse中无论是Left Join 、Right Join还是Inner Join永远都是拿着右表中的每一条记录到左表中查找该记录是否存在,所以右表必须是小表。



  1. 批量写入数据时,必须控制每个批次的数据中涉及到的分区的数量,在写入之前最好对需要导入的数据进行排序。无序的数据或者涉及的分区太多,会导致ClickHouse无法及时对新导入的数据进行合并,从而影响查询性能。



  1. 尽量减少JOIN时的左右表的数据量,必要时可以提前对某张表进行聚合操作,减少数据条数。有些时候,先GROUP BY再JOIN比先JOIN再GROUP BY查询时间更短。



  1. ClickHouse的分布式表性能性价比不如物理表高,建表分区字段值不宜过多,防止数据导入过程磁盘可能会被打满。



  1. CPU一般在50%左右会出现查询波动,达到70%会出现大范围的查询超时,CPU是最关键的指标,要非常关注。



用户头像

久吾尔岂

关注

还未添加个人签名 2017.09.15 加入

还未添加个人简介

评论

发布
暂无评论
Clickhouse 性能测试