Architect Training Camp: Week 6 Summary
Distributed DBMS
Distributed databases
One master, multiple slaves (master/slave replication)
Distributes the load across servers
Master/master replication
Both masters cannot be written to concurrently
Data sharding
Hard-coded routing in application code
Mapping table kept in external storage
Middleware
Challenges
Requires extra routing code
Cross-shard SQL joins are not available
Cross-shard transactions are not available
Requires more servers
Middleware
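The first two sharding approaches above (routing in code and a mapping table) can be sketched in a few lines. This is a minimal illustration only: `ShardRouter`, the shard names, and the in-memory `mapping` dict (standing in for a mapping table in external storage) are all assumptions made for the example.

```python
import zlib


class ShardRouter:
    """Routes a key to a shard: mapping table first, hash-based routing as fallback."""

    def __init__(self, shards, mapping=None):
        self.shards = list(shards)          # hypothetical shard names
        self.mapping = dict(mapping or {})  # stand-in for an external mapping table

    def shard_for(self, key):
        # An explicit mapping entry overrides the hash rule.
        if key in self.mapping:
            return self.mapping[key]
        # crc32 is stable across processes, unlike Python's built-in hash() for str.
        index = zlib.crc32(str(key).encode("utf-8")) % len(self.shards)
        return self.shards[index]


router = ShardRouter(["db0", "db1", "db2"], mapping={"vip-user": "db2"})
print(router.shard_for("vip-user"))
print(router.shard_for("user-42"))
```

Modulo hashing like this is simple, but changing the number of shards remaps most keys, which is one reason the extra code and migration effort listed under Challenges appear.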

Cluster scaling

Deployment
One service, one database

Master / slave

Two services, two databases

Complex deployment

CAP theorem
Consistency
Availability
Partition tolerance
Data inconsistency
Eventual consistency
Write conflicts under eventual consistency
Resolved by timestamp: the latest write overwrites the others
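Timestamp-based resolution is often called "last write wins". A minimal sketch, assuming each write carries a comparable timestamp (the `Version` type and its fields are hypothetical names for this example):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # assumed write time, e.g. epoch seconds


def last_write_wins(a, b):
    """Return the version with the later timestamp; ties keep the first argument."""
    return a if a.timestamp >= b.timestamp else b


older = Version("alice@old.example", 100.0)
newer = Version("alice@new.example", 105.0)
print(last_write_wins(older, newer).value)  # alice@new.example
```

The approach depends on clocks being reasonably synchronized across servers; with skewed clocks an earlier write can silently overwrite a later one.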

Resolved on the client side

Voting (Cassandra)

Cassandra voting structure
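The voting idea can be illustrated with a simple majority vote over the values returned by replicas. This is a simplified sketch of the concept, not Cassandra's actual read path; `majority_read` is a hypothetical helper.

```python
from collections import Counter


def majority_read(replica_values):
    """Return the value held by a strict majority of replicas, or None if there is none."""
    if not replica_values:
        return None
    value, votes = Counter(replica_values).most_common(1)[0]
    return value if votes * 2 > len(replica_values) else None


print(majority_read(["v2", "v2", "v1"]))  # v2
print(majority_read(["v1", "v2", "v3"]))  # None
```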

ACID
Atomicity
Isolation
Durability
Consistency
BASE
Basically Available
Soft state
Allows latency while replicas synchronize
Eventually consistent
ZooKeeper
Split-brain
Different servers receive conflicting commands, leaving the cluster and its data in a chaotic state
Paxos - distributed consensus algorithm



Cluster management and Failover

Search Engine

Crawler system

Robots exclusion protocol
robots.txt
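A crawler should check robots.txt before fetching a page. A minimal sketch using Python's standard `urllib.robotparser`; the rules, the `example-crawler` user agent, and the URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example rules as they might appear in a site's robots.txt (illustrative only).
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse in-memory lines instead of fetching over the network

allowed = parser.can_fetch("example-crawler", "https://example.com/index.html")
blocked = parser.can_fetch("example-crawler", "https://example.com/private/data.html")
print(allowed, blocked)  # True False
```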
Inverted index
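An inverted index maps each term to the set of documents that contain it, so a query becomes a set lookup instead of a scan over every document. A minimal sketch, assuming whitespace tokenization and AND semantics for multi-term queries (both are simplifications chosen for this example):

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "distributed database sharding",
    2: "distributed search engine",
    3: "search engine crawler",
}
index = build_inverted_index(docs)
print(search(index, "search engine"))  # {2, 3}
print(search(index, "distributed"))    # {1, 2}
```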

Lucene structure

Lucene inverted index

Lucene
When the data set is large, rebuilding the whole index takes a long time, so Lucene introduces the concept of a "segment"
The index is split into segments, and every segment is an independent index
Segments need to be merged regularly
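The segment idea can be sketched as several small, independent inverted indexes that are searched together and periodically merged. This illustrates the concept only and does not use Lucene's API; all function names are hypothetical.

```python
def make_segment(docs):
    """Build one independent segment: term -> set of document ids."""
    segment = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            segment.setdefault(term, set()).add(doc_id)
    return segment


def search_segments(segments, term):
    """Query every segment and union the hits."""
    hits = set()
    for segment in segments:
        hits |= segment.get(term.lower(), set())
    return hits


def merge_segments(segments):
    """Combine several segments into one larger segment."""
    merged = {}
    for segment in segments:
        for term, doc_ids in segment.items():
            merged.setdefault(term, set()).update(doc_ids)
    return merged


# New documents go into a new segment; existing segments are left untouched.
seg_a = make_segment({1: "lucene index", 2: "search engine"})
seg_b = make_segment({3: "lucene segment"})
print(search_segments([seg_a, seg_b], "lucene"))  # {1, 3}

merged = merge_segments([seg_a, seg_b])
print(search_segments([merged], "lucene"))        # {1, 3}
```

Adding a segment avoids rebuilding the full index, but each query has to visit every segment, which is why regular merging keeps searches fast.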
ElasticSearch

How to dispatch

Sample assistant robot
https://github.com/zhihuili/robo


Doris case study
Doris Architecture

Doris storage

Data partition

Access model
Write to two replicas to guarantee availability (2 writes, 1 read: 2W, 1R)
A partitioning algorithm locates the nodes
Data recovery and data sync
Redo log
Update log
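The 2W, 1R model can be sketched as a client that writes each key to two replicas and reads from whichever one is available. This is a toy in-memory illustration under that assumption, not Doris code; `Node` and `TwoWriteOneRead` are hypothetical names, and log-based recovery is left out.

```python
class Node:
    """Toy storage node with an availability flag."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class TwoWriteOneRead:
    """Write to both replicas (2W); read from the first available one (1R)."""

    def __init__(self, primary, backup):
        self.replicas = [primary, backup]

    def put(self, key, value):
        written = 0
        for node in self.replicas:
            if node.up:
                node.data[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no replica available for write")
        return written  # number of replicas that accepted the write

    def get(self, key):
        for node in self.replicas:
            if node.up and key in node.data:
                return node.data[key]
        raise KeyError(key)


a, b = Node("a"), Node("b")
store = TwoWriteOneRead(a, b)
store.put("user:1", "alice")

a.up = False                # simulate a failed node
print(store.get("user:1"))  # alice, served by the surviving replica
```

A node that was down during a write misses it, which is the gap that the recovery and sync logs listed above are there to close.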

Cluster health check

Failover


Scaling and data migration


Logical storage structure

Doris consistent hash
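Consistent hashing keeps data migration small when nodes join or leave. Below is a generic sketch of a hash ring with virtual nodes; it is not taken from Doris, and the class name, the `vnodes` parameter, and the use of MD5 are assumptions made for this example.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring with virtual nodes; a key belongs to the next point clockwise."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted hash points on the ring
        self._owner = {}   # hash point -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        if not self._points:
            raise ValueError("ring is empty")
        # Wrap around to the first point when the key hashes past the last one.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = ConsistentHashRing(["node1", "node2", "node3"])
keys = [f"key{i}" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}

ring.remove_node("node3")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

Only keys that lived on the removed node change owner; keys on the other nodes stay where they were.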