Design | Synchronizing Replica Data with ClickHouse Distributed Tables


Published: 9 hours ago

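Author: Wu Fan, member of the QingCloud database team
Mainly responsible for developing and maintaining the MySQL and ClickHouse products; specializes in fault analysis and performance tuning.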

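In a multi-replica distributed ClickHouse cluster, data is usually written and read through a Distributed table. The Distributed table engine stores no data itself. It acts as a transparent proxy over the underlying local tables and automatically handles writing, distributing, querying, and routing data inside the cluster.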

There are two ways to keep replica data in sync through a Distributed table:

Distributed + MergeTree
Distributed + ReplicatedMergeTree

| Distributed + MergeTree

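With this approach, internal_replication must be set to false. When data is written to the Distributed table, the Distributed table writes it to every replica in the cluster, so the node hosting the Distributed table is responsible for writing to all shards and all replicas.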

1. Cluster configuration

<logical_consistency_cluster>
    <shard>
        <internal_replication>false</internal_replication>
        <replica>
            <host>shard1-repl1</host>
            <port>9000</port>
        </replica>
        <replica>
            <host>shard1-repl2</host>
            <port>9000</port>
        </replica>
    </shard>
</logical_consistency_cluster>


2. Writing data

-- Local table that stores the data on every replica
CREATE TABLE test.t_local ON CLUSTER logical_consistency_cluster
(
    EventDate DateTime,
    CounterID UInt32,
    UserID UInt32
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate);

-- Distributed table over test.t_local, sharded by CounterID
CREATE TABLE test.t_logical_Distributed ON CLUSTER logical_consistency_cluster
(
    EventDate DateTime,
    CounterID UInt32,
    UserID UInt32
)
ENGINE = Distributed(logical_consistency_cluster, test, t_local, CounterID);

INSERT INTO test.t_logical_Distributed VALUES
    ('2019-01-16 00:00:00', 1, 1),
    ('2019-02-10 00:00:00', 2, 2),
    ('2019-03-10 00:00:00', 3, 3);
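The last argument of the Distributed engine above, CounterID, is the sharding key. As a rough sketch of how a row is routed: the sharding value is divided by the total shard weight, and the row goes to the shard whose half-open interval of remainders contains the result. The function name pick_shard and the two-shard weights [1, 2] are illustrative assumptions; the cluster in this article has a single shard, so every row lands on it.

```python
from bisect import bisect_right
from itertools import accumulate


def pick_shard(sharding_value, weights):
    """Return the 0-based index of the shard that receives a row.

    weights[i] is the weight of shard i. Shard i owns the remainders in
    the half-open interval [sum(weights[:i]), sum(weights[:i + 1])).
    """
    remainder = sharding_value % sum(weights)
    upper_bounds = list(accumulate(weights))  # e.g. [1, 3] for weights [1, 2]
    return bisect_right(upper_bounds, remainder)


# The article's cluster: one shard, so CounterID 1, 2 and 3 all map to it.
single = [pick_shard(counter_id, [1]) for counter_id in (1, 2, 3)]

# Hypothetical two-shard cluster with weights 1 and 2 (not from the article).
weighted = [pick_shard(counter_id, [1, 2]) for counter_id in (3, 4, 5)]

print(single, weighted)
```

With a single shard the sharding key has no visible effect, which is why all three rows show up on both replicas of shard 1 in the query step.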


3. Querying data

# shard1-repl1
SELECT * FROM test.t_local
Query id: bd031554-b1e0-4fda-9ff8-1145ffae5b02
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-03-10 00:00:00 │         3 │      3 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-02-10 00:00:00 │         2 │      2 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-01-16 00:00:00 │         1 │      1 │
└─────────────────────┴───────────┴────────┘
3 rows in set. Elapsed: 0.004 sec.
------------------------------------------
# shard1-repl2
SELECT * FROM test.t_local
Query id: 636f7580-02e0-4279-bc9b-1f153c0473dc
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-01-16 00:00:00 │         1 │      1 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-03-10 00:00:00 │         3 │      3 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-02-10 00:00:00 │         2 │      2 │
└─────────────────────┴───────────┴────────┘
3 rows in set. Elapsed: 0.005 sec.


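The write test shows that both replicas hold the same data.

Even though the local table does not use the ReplicatedMergeTree engine, the data is still replicated. However, each replica is written independently by the Distributed table, so the files on disk are not guaranteed to be identical across replicas. This can be thought of as logical consistency.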
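To make the idea of logical consistency concrete, here is a toy comparison of two replicas that contain the same rows but group them into parts differently, for example because each replica schedules its own merges. The rows, the part layouts, and the function names are hypothetical and only model the idea; they are not taken from the test above.

```python
from collections import Counter


def all_rows(parts):
    """Flatten a replica's parts into a multiset of rows."""
    return Counter(row for part in parts for row in part)


def logically_consistent(replica_a, replica_b):
    """Same rows overall, regardless of how they are grouped into parts."""
    return all_rows(replica_a) == all_rows(replica_b)


def physically_consistent(replica_a, replica_b):
    """Same parts holding the same rows in the same order."""
    return replica_a == replica_b


# Hypothetical rows (EventDate, CounterID, UserID) within a single month.
r1 = ("2019-01-16 00:00:00", 1, 1)
r2 = ("2019-01-17 00:00:00", 2, 2)
r3 = ("2019-01-18 00:00:00", 3, 3)

replica_a = [[r1, r2], [r3]]  # first two rows already merged into one part
replica_b = [[r1], [r2, r3]]  # a different merge happened on this replica

print(logically_consistent(replica_a, replica_b))
print(physically_consistent(replica_a, replica_b))
```

A query returns the same result from either replica, even though the two layouts differ.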

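The Distributed table has to handle writes to both shards and replicas, so this single write point can easily become a performance bottleneck. That leads to the second approach.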

| Distributed + ReplicatedMergeTree

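With this approach, internal_replication must be set to true. When data is written to the Distributed table, the Distributed table picks one suitable replica in each shard and writes the data to it.

Copying the data to the other replicas in the shard is handled by ReplicatedMergeTree itself and is no longer the Distributed table's job.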
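The difference between the two settings comes down to which hosts the Distributed table itself sends each inserted block to. The sketch below models that choice. The function write_targets and the is_healthy callback are hypothetical, and picking the first healthy replica is a simplification: the real choice depends on the server's load-balancing settings and error counters.

```python
def write_targets(shards, internal_replication, is_healthy=lambda host: True):
    """Return the hosts the Distributed table sends an inserted block to.

    shards is a list of shards, each a list of replica host names.
    """
    targets = []
    for replicas in shards:
        if internal_replication:
            # One replica per shard; ReplicatedMergeTree copies to the rest.
            chosen = next((h for h in replicas if is_healthy(h)), None)
            if chosen is None:
                raise RuntimeError(f"no healthy replica in shard {replicas}")
            targets.append(chosen)
        else:
            # The Distributed table writes to every replica itself.
            targets.extend(replicas)
    return targets


# The one-shard, two-replica cluster used in this article.
cluster = [["shard1-repl1", "shard1-repl2"]]

fan_out = write_targets(cluster, internal_replication=False)
single = write_targets(cluster, internal_replication=True)
failover = write_targets(
    cluster,
    internal_replication=True,
    is_healthy=lambda host: host != "shard1-repl1",  # pretend repl1 is down
)
print(fan_out, single, failover)
```

With internal_replication set to true, the number of writes issued by the Distributed table grows with the number of shards rather than with the total number of replicas.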

1. Configuration file

<physical_consistency_cluster>
    <shard>
        <internal_replication>true</internal_replication>
        <replica>
            <host>shard1-repl1</host>
            <port>9000</port>
        </replica>
        <replica>
            <host>shard1-repl2</host>
            <port>9000</port>
        </replica>
    </shard>
</physical_consistency_cluster>


2. Writing data

-- Replicated local table; {namespace} and {replica} are server macros
CREATE TABLE test.t_local ON CLUSTER physical_consistency_cluster
(
    EventDate DateTime,
    CounterID UInt32,
    UserID UInt32
)
ENGINE = ReplicatedMergeTree('{namespace}/test/t_local', '{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID);

-- Distributed table over test.t_local, sharded by CounterID
CREATE TABLE test.t_physical_Distributed ON CLUSTER physical_consistency_cluster
(
    EventDate DateTime,
    CounterID UInt32,
    UserID UInt32
)
ENGINE = Distributed(physical_consistency_cluster, test, t_local, CounterID);

INSERT INTO test.t_physical_Distributed VALUES
    ('2019-01-16 00:00:00', 1, 1),
    ('2019-02-10 00:00:00', 2, 2),
    ('2019-03-10 00:00:00', 3, 3);


3. Querying data

# shard1-repl1
SELECT * FROM test.t_local
Query id: d2bafd2d-d0a8-41b4-8d79-ece37e8159e5
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-03-10 00:00:00 │         3 │      3 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-02-10 00:00:00 │         2 │      2 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-01-16 00:00:00 │         1 │      1 │
└─────────────────────┴───────────┴────────┘
3 rows in set. Elapsed: 0.004 sec.
------------------------------------------
# shard1-repl2
SELECT * FROM test.t_local
Query id: b5f0dc80-f73f-427e-b04e-e5b787876462
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-01-16 00:00:00 │         1 │      1 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-03-10 00:00:00 │         3 │      3 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-02-10 00:00:00 │         2 │      2 │
└─────────────────────┴───────────┴────────┘
3 rows in set. Elapsed: 0.005 sec.


ReplicatedMergeTree relies on ZooKeeper's event-watch mechanism to coordinate the replicas. The core flows that require coordination between replicas are INSERT, MERGE, MUTATION, and ALTER.
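As a rough mental model of the INSERT flow, the replica that receives a write records an entry in a shared log, and the other replicas pick up that entry and fetch the finished part from the source replica. The toy classes below sketch this under heavy simplification. SharedLog stands in for the log kept in ZooKeeper, sync is called by hand instead of being triggered by a watch, and the class names, method names, and part name are illustrative assumptions rather than ClickHouse internals.

```python
class SharedLog:
    """Stand-in for the replication log kept in ZooKeeper (toy model)."""

    def __init__(self):
        self.entries = []  # list of (operation, part name, source replica)

    def append(self, operation, part, source):
        self.entries.append((operation, part, source))


class Replica:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.parts = {}   # part name -> rows
        self.applied = 0  # index of the next log entry to process

    def insert(self, part, rows):
        """Write a part locally, then announce it in the shared log."""
        self.parts[part] = list(rows)
        self.log.append("GET_PART", part, self.name)

    def sync(self, peers):
        """Process new log entries, fetching missing parts from their source."""
        for operation, part, source in self.log.entries[self.applied:]:
            if operation == "GET_PART" and part not in self.parts:
                # The part is copied as a whole, so both replicas end up
                # with the same part, not just the same rows.
                self.parts[part] = list(peers[source].parts[part])
        self.applied = len(self.log.entries)


log = SharedLog()
repl1 = Replica("shard1-repl1", log)
repl2 = Replica("shard1-repl2", log)
peers = {r.name: r for r in (repl1, repl2)}

repl1.insert("201901_0_0_0", [("2019-01-16 00:00:00", 1, 1)])
before_sync = dict(repl2.parts)
repl2.sync(peers)
print(before_sync, repl2.parts)
```

Because the second replica copies the part rather than re-running the insert, the model ends with identical parts on both replicas.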

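The write test shows that the replicas are again consistent. Here the replicas synchronize metadata through ZooKeeper, which keeps the on-disk storage format identical across replicas. This can be thought of as physical consistency.

ReplicatedMergeTree is also the most commonly used approach in distributed clusters. However, synchronization depends on ZooKeeper, and in workloads with frequent DDL, ZooKeeper often becomes the performance bottleneck and can even make the service unavailable.

We therefore need to take load off ZooKeeper. Using the first approach together with round-robin load balancing reduces the write pressure on any single node.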
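A minimal sketch of the round-robin idea, assuming the Distributed table exists on every node (as it does when created ON CLUSTER) and that the client or a load balancer rotates the entry node for each batch. The batch labels and the helper name assign_round_robin are hypothetical.

```python
from itertools import cycle


def assign_round_robin(batches, hosts):
    """Pair each batch with the entry node that will receive its INSERT."""
    picker = cycle(hosts)  # endlessly repeats hosts in order
    return [(batch, next(picker)) for batch in batches]


hosts = ["shard1-repl1", "shard1-repl2"]
batches = ["batch-1", "batch-2", "batch-3", "batch-4"]

assignment = assign_round_robin(batches, hosts)
for batch, host in assignment:
    print(f"{batch} -> INSERT via the Distributed table on {host}")
```

Each node still forwards its batches to every replica, but the forwarding work is spread across the nodes instead of being concentrated on one.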

Summary

internal_replication = false

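Distributed + MergeTree gives a logically consistent distributed setup.

Replicas hold the same data as long as every write succeeds, but the storage format is not guaranteed to match. Synchronization does not depend on ZooKeeper. Because the Distributed table does not check consistency between replicas, they can drift apart over time, and the single write point is under heavy load.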

internal_replication = true

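Distributed + ReplicatedMergeTree gives a physically consistent distributed setup.

Data content is identical and the storage format is identical. Synchronization depends on ZooKeeper, which can become the system bottleneck.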

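About RadonDB

The RadonDB open source community is a database open source community focused on cloud-native and containerized deployments. It gives database enthusiasts a platform for sharing knowledge about mainstream open source databases (MySQL, PostgreSQL, Redis, MongoDB, ClickHouse, and others), and provides enterprise-grade RadonDB open source products and services.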

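RadonDB open source database products are currently used by more than a thousand enterprise and community users, including China Everbright Bank, SPD Silicon Valley Bank, Hami Bank, Taikang Insurance, Taiping Insurance, AXA Insurance, Sunshine Insurance, Aeon Life, Anji Logistics, Anchang Logistics, Blue Moon, 天财商龙, 罗克佳华, 升哲科技, 无锡汇跑体育, Beijing Telecom, Jiangsu Communications Holding, Sichuan Airlines, Kunming Airlines, and 国控生物.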

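RadonDB can be delivered on cloud platforms and on Kubernetes container platforms. It offers database solutions for a wide range of scenarios together with professional cluster management and automated operations. Key features include high-availability primary/replica failover, strong data consistency, read/write splitting, one-click installation and deployment, multi-dimensional metric monitoring and alerting, elastic scale-out and scale-in, horizontal scaling, automatic backup and recovery, intra-city multi-active deployment, and remote disaster recovery. With RadonDB, enterprise and community users can focus on business logic without worrying about selecting, managing, and operating a highly available cluster, which helps them develop and innovate much more efficiently.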

GitHub: https://github.com/radondb

WeChat group: search for and add the group assistant's WeChat ID, radondb


Original link: http://xie.infoq.cn/article/a891246f97e7c2f2efc53455c. Please contact the author before republishing this article.
