Architect Training Camp Week 6 Summary


Published: November 1, 2020

Distributed DBMS
Distributed databases
* 1 master, multiple slaves
    * Distributes the load across servers
* Master/master replication
    * The two masters cannot accept writes concurrently

Data sharding
* By code
* Mapping table in external storage
* Middleware

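A minimal sketch of the first two sharding approaches, assuming four hypothetical shards (`db0`..`db3`), a plain modulo rule for "by code", and an in-memory dict standing in for the mapping table in external storage. None of these names come from the course material.

```python
# Hypothetical shard names; a real system would hold connection pools here.
SHARDS = ["db0", "db1", "db2", "db3"]

# Stand-in for a mapping table kept in external storage: user_id -> shard name.
MAPPING_TABLE = {}


def shard_by_code(user_id: int) -> str:
    """Sharding by code: the application computes the shard from the key."""
    return SHARDS[user_id % len(SHARDS)]


def shard_by_table(user_id: int) -> str:
    """Sharding by mapping table: look the key up, falling back to the code rule."""
    return MAPPING_TABLE.get(user_id, shard_by_code(user_id))


if __name__ == "__main__":
    MAPPING_TABLE[42] = "db3"      # pin one hot user to a chosen shard
    print(shard_by_code(42))       # 42 % 4 == 2 -> db2
    print(shard_by_table(42))      # the mapping table overrides -> db3
```

The modulo rule shows why "extra code" is listed as a cost: every data access path has to route through it, and changing `len(SHARDS)` remaps most existing keys.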
Challenges of sharding
* Requires extra code
* Cannot use SQL joins across shards
* Cannot use transactions across shards
* Requires more servers

Middleware

Cluster scaling

Deployment
* 1 service, 1 database
* Master/slave
* 2 services, 2 databases
* Complex

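CAP theorem
* Consistency
* Availability
* Partition tolerance

A distributed system cannot guarantee all three at once; when a network partition occurs, it has to choose between consistency and availability.
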
Data inconsistency
Eventual consistency
Resolving write conflicts under eventual consistency
* By timestamp: the latest write overwrites the others
* By the client
* By voting (Cassandra)

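The timestamp rule above is often called last-write-wins. A minimal sketch, assuming each replica stores its value with the writer's wall-clock timestamp and a node ID used only as a tie-breaker (the class and function names are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedValue:
    value: str
    timestamp: float  # write time as reported by the writing node's clock
    node_id: str      # deterministic tie-breaker for equal timestamps


def resolve_last_write_wins(a: VersionedValue, b: VersionedValue) -> VersionedValue:
    """Keep the newer write; the older one is simply overwritten."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


if __name__ == "__main__":
    left = VersionedValue("blue", 1700000000.5, "node-a")
    right = VersionedValue("green", 1700000001.0, "node-b")
    print(resolve_last_write_wins(left, right).value)  # green
```

Timestamp-based overwrite is simple, but it silently drops the losing write and relies on reasonably synchronized clocks, which is one reason client-side resolution and voting are listed as alternatives.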
Cassandra voting structure

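Cassandra-style voting is usually described with replica counts: with N replicas, a write waits for W acknowledgements and a read consults R replicas. If R + W > N, every read set overlaps the set that acknowledged the latest write, and the reply with the highest version wins the vote. The toy store below illustrates that overlap with in-memory replicas and hypothetical names; it is not Cassandra's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    up: bool = True
    store: dict = field(default_factory=dict)  # key -> (version, value)


class QuorumStore:
    """Toy N/W/R replicated store: writes need W acks, reads vote among R replies."""

    def __init__(self, n: int = 3, w: int = 2, r: int = 2):
        if r + w <= n:
            raise ValueError("need R + W > N so read and write quorums overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r

    def write(self, key, value, version: int) -> bool:
        acks = 0
        for rep in self.replicas:
            if rep.up:  # unreachable replicas miss the write
                rep.store[key] = (version, value)
                acks += 1
        return acks >= self.w  # success only with a write quorum

    def read(self, key):
        replies = [rep.store.get(key, (0, None)) for rep in self.replicas if rep.up]
        if len(replies) < self.r:
            raise RuntimeError("not enough live replicas for a read quorum")
        # Vote: among the first R replies, the highest version wins.
        return max(replies[: self.r], key=lambda reply: reply[0])[1]


if __name__ == "__main__":
    s = QuorumStore(n=3, w=2, r=2)
    s.write("k", "v1", version=1)
    s.replicas[0].up = False   # replica 0 misses the next write
    s.write("k", "v2", version=2)
    s.replicas[0].up = True    # replica 0 is back but stale
    s.replicas[2].up = False   # the read can only reach replicas 0 and 1
    print(s.read("k"))         # v2: replica 1 overlaps the write quorum
```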
ACID
* Atomicity
* Consistency
* Isolation
* Durability

BASE
* Basically Available
* Soft state: intermediate states are allowed, e.g. replica synchronization may lag
* Eventually consistent

ZooKeeper
Split-brain
* Different servers receive conflicting commands, which throws the cluster and its data into chaos

Paxos - distributed consensus algorithm

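Consensus protocols avoid split-brain by requiring a strict majority: any two majorities of the same cluster share at least one member, so two partitions cannot both decide. The sketch below runs a single-decree Paxos round in one process (no network, no message loss, no retries), with hypothetical class and function names. It only illustrates the prepare/promise and accept phases, and how a later proposer must adopt a value that has already been accepted.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Acceptor:
    promised: int = -1      # highest proposal number this acceptor has promised
    accepted_n: int = -1    # number of the proposal it last accepted (-1: none)
    accepted_v: Any = None  # value of that proposal

    def prepare(self, n: int):
        """Phase 1b: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, -1, None

    def accept(self, n: int, value: Any) -> bool:
        """Phase 2b: accept unless a higher-numbered prepare was already promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, value
            return True
        return False


def propose(acceptors: list, n: int, value: Any) -> Optional[Any]:
    """Run one proposer round with number n; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    # Phase 1a: send prepare(n) to every acceptor and keep the promises.
    promises = [(acc_n, acc_v) for ok, acc_n, acc_v in (a.prepare(n) for a in acceptors) if ok]
    if len(promises) < majority:
        return None
    # If some acceptor already accepted a value, propose that value instead.
    highest_n, highest_v = max(promises, key=lambda p: p[0])
    if highest_n >= 0:
        value = highest_v
    # Phase 2a: the value is chosen once a majority accepts (n, value).
    accepted = sum(a.accept(n, value) for a in acceptors)
    return value if accepted >= majority else None


if __name__ == "__main__":
    cluster = [Acceptor() for _ in range(3)]
    print(propose(cluster, 1, "leader=A"))  # leader=A
    print(propose(cluster, 2, "leader=B"))  # leader=A (already chosen)
    print(propose(cluster, 1, "leader=C"))  # None (stale proposal number)
```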
Cluster management and Failover
Search Engine
Crawler system
Robots exclusion protocol
robots.txt
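Python's standard library includes a robots.txt parser, `urllib.robotparser.RobotFileParser`, which a crawler can consult before fetching a URL. The robots.txt content, the agent names `MyCrawler` and `BadBot`, and `example.com` below are made up for illustration; a live crawler would call `set_url()` and `read()` to download the real file instead of `parse()`.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("MyCrawler", "https://example.com/index.html"),      # allowed
        ("MyCrawler", "https://example.com/private/a.html"),  # disallowed for every agent
        ("BadBot", "https://example.com/index.html"),         # BadBot is banned entirely
    ]:
        print(agent, url, rp.can_fetch(agent, url))
```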
Inverted index

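An inverted index maps each term to the list of documents containing it (its posting list), so a query looks up terms instead of scanning every document. A minimal in-memory sketch, assuming a naive lowercase alphanumeric tokenizer and made-up sample documents:

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Naive tokenizer: lowercase alphanumeric runs (real engines use analyzers)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict) -> dict:
    """Map each term to the sorted list of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all_terms(index: dict, query: str) -> list:
    """AND query: intersect the posting lists of all query terms."""
    postings = [set(index.get(term, [])) for term in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []


if __name__ == "__main__":
    docs = {
        1: "Lucene builds an inverted index",
        2: "Elasticsearch is built on Lucene",
        3: "A crawler fetches pages for the index",
    }
    index = build_inverted_index(docs)
    print(index["lucene"])                          # [1, 2]
    print(search_all_terms(index, "Lucene index"))  # [1]
```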
Lucene structure

Lucene inverted index

Lucene
* With large data sets, rebuilding the whole index takes a long time, so Lucene introduces "segments"
* The index is split into segments; each segment is independent
* Segments must be merged periodically
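The segment idea can be sketched as follows: each batch of new documents becomes a small immutable segment, a query visits every segment, and once too many segments accumulate their posting lists are merged into one. The names `Segment`, `SegmentedIndex`, and `merge_threshold` are hypothetical; Lucene's real merge policies, deletions, and on-disk format are far more involved.

```python
from collections import defaultdict


class Segment:
    """An immutable mini inverted index (term -> doc IDs)."""

    def __init__(self, postings):
        self.postings = {term: frozenset(ids) for term, ids in postings.items()}

    @classmethod
    def build(cls, docs):
        postings = defaultdict(set)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                postings[term].add(doc_id)
        return cls(postings)


class SegmentedIndex:
    def __init__(self, merge_threshold=3):
        self.segments = []
        self.merge_threshold = merge_threshold

    def add_batch(self, docs):
        # New documents get a fresh segment; existing segments are never rebuilt.
        self.segments.append(Segment.build(docs))
        if len(self.segments) >= self.merge_threshold:
            self.merge()

    def merge(self):
        # Merge by unioning posting lists instead of re-reading the documents.
        merged = defaultdict(set)
        for seg in self.segments:
            for term, ids in seg.postings.items():
                merged[term] |= ids
        self.segments = [Segment(merged)]

    def search(self, term):
        # A query has to consult every segment and combine the hits.
        hits = set()
        for seg in self.segments:
            hits |= seg.postings.get(term.lower(), frozenset())
        return sorted(hits)
```

The trade-off matches the notes: adding documents is cheap because only a new segment is written, while searches slow down as segments pile up, which is what periodic merging fixes.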
Elasticsearch

How to dispatch

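Elasticsearch documents its default routing rule as `shard_num = hash(_routing) % num_primary_shards`, where `_routing` defaults to the document ID (newer versions refine this with a routing factor), and it uses a Murmur3 hash internally. The sketch below substitutes `zlib.crc32` for that hash and uses hypothetical in-memory shards holding a precomputed relevance score per document. It shows two kinds of dispatch: a write or get-by-ID goes to exactly one shard, while a search is scattered to all shards and the per-shard top hits are gathered and merged.

```python
import heapq
import zlib


def route_to_shard(routing: str, num_primary_shards: int) -> int:
    """hash(_routing) % num_primary_shards, with crc32 standing in for Murmur3."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


class MiniCluster:
    def __init__(self, num_primary_shards: int = 3):
        # Each shard is a dict: doc_id -> precomputed relevance score (hypothetical).
        self.shards = [{} for _ in range(num_primary_shards)]

    def _shard(self, doc_id: str) -> dict:
        return self.shards[route_to_shard(doc_id, len(self.shards))]

    def index(self, doc_id: str, score: float) -> None:
        self._shard(doc_id)[doc_id] = score  # a write touches one shard

    def get(self, doc_id: str):
        return self._shard(doc_id).get(doc_id)  # so does a get by ID

    def search_top_k(self, k: int):
        # Scatter: each shard returns its local top k. Gather: merge into a global top k.
        local = [heapq.nlargest(k, shard.items(), key=lambda kv: kv[1]) for shard in self.shards]
        return heapq.nlargest(k, (hit for hits in local for hit in hits), key=lambda kv: kv[1])
```

Because the target shard depends on the primary shard count, that count cannot simply change later without moving documents; Elasticsearch fixes it when the index is created.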
Assistant robot sample: https://github.com/zhihuili/robo

Doris case study
Product Goals
* Functional requirements
    * Data structure
        * KV engine
        * Logic storage structure - Namespace
    * Data access
        * KV API
        * KV Client, abstract API, dispatch framework
        * High-performance communication
* Non-functional requirements
    * Massive data
        * Transparent cluster management, storage replacement
    * Scalability
        * Linear, smooth expansion
        * Partitioning, better routing algorithm
    * Availability
        * Automatic fault tolerance and failover
        * Transparent cluster management, config management
    * Performance
        * High concurrency, low latency
    * Feature extensibility
        * Easy to add new features
    * Low maintenance cost
        * Easy to manage
        * Easy to monitor
* Eventual consistency
* Key tech points
    * failover
    * Scalable and data migration
    * Logical storage structure
        * Namespaces to separate data for different businesses
Doris Architecture
Doris storage
Data partition
Access structure
* Write two copies to ensure availability (2W, 1R: write to two nodes, read from one)
* A partition algorithm locates the target nodes
Data recovery and data sync
* Redo log
* Update log
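A rough sketch of the 2W/1R idea, assuming each key maps to the same slot in each of two node groups through a hypothetical `zlib.crc32`-based partition function, and using a per-node in-memory list as a stand-in for the redo/update logs mentioned above. Writes go to both replicas, a read needs only one, and writes missed by a temporarily failed node are replayed when it recovers. This is not Doris's actual code.

```python
import zlib


class Node:
    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data = {}


class TwoWriteOneRead:
    """2W1R: every key is written to one node in each of two groups; one read suffices."""

    def __init__(self, group_a, group_b):
        if len(group_a) != len(group_b):
            raise ValueError("both groups need the same number of nodes")
        self.groups = (group_a, group_b)
        self.redo_log = {}  # node name -> writes it missed while down (stand-in log)

    def nodes_for(self, key: str):
        # Hypothetical partition function: the same slot in each group.
        slot = zlib.crc32(key.encode("utf-8")) % len(self.groups[0])
        return [group[slot] for group in self.groups]

    def put(self, key: str, value) -> None:
        nodes = self.nodes_for(key)
        if not any(node.up for node in nodes):
            raise RuntimeError("both replicas are down; write rejected")
        for node in nodes:
            if node.up:
                node.data[key] = value
            else:
                # Temporary failure: log the write so it can be replayed later.
                self.redo_log.setdefault(node.name, []).append((key, value))

    def get(self, key: str):
        for node in self.nodes_for(key):
            if node.up and key in node.data:
                return node.data[key]
        return None

    def recover(self, node: Node) -> None:
        """Replay the writes a node missed, in order, then bring it back online."""
        for key, value in self.redo_log.pop(node.name, []):
            node.data[key] = value
        node.up = True


if __name__ == "__main__":
    store = TwoWriteOneRead([Node("a0"), Node("a1")], [Node("b0"), Node("b1")])
    store.put("user:1", "alice")
    first, second = store.nodes_for("user:1")
    first.up = False                  # temporary failure of one replica
    store.put("user:1", "alice-v2")   # still succeeds on the other replica
    print(store.get("user:1"))        # alice-v2
    store.recover(first)              # replays the missed write
    print(first.data["user:1"])       # alice-v2
```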

Cluster health check
Failover

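A minimal heartbeat-based health check with failover, assuming nodes report heartbeats to a checker, a node counts as failed after `timeout_s` seconds of silence, and each failed node is swapped for a standby. The class, function, and parameter names are hypothetical; the injectable `clock` exists only so the example can be exercised without sleeping.

```python
import time


class HealthChecker:
    """Marks a node as failed if no heartbeat arrived within timeout_s seconds."""

    def __init__(self, timeout_s: float = 3.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock   # injectable so tests need not sleep
        self.last_seen = {}  # node -> time of its last heartbeat

    def heartbeat(self, node: str) -> None:
        self.last_seen[node] = self.clock()

    def failed_nodes(self) -> list:
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)


def failover(active: list, standbys: list, failed) -> list:
    """Return a new active list with each failed node replaced by the next standby."""
    spares = list(standbys)
    return [spares.pop(0) if node in failed and spares else node for node in active]


if __name__ == "__main__":
    fake_now = [0.0]
    checker = HealthChecker(timeout_s=3.0, clock=lambda: fake_now[0])
    checker.heartbeat("node-1")
    checker.heartbeat("node-2")
    fake_now[0] = 2.0
    checker.heartbeat("node-1")  # node-2 stays silent
    fake_now[0] = 4.0
    failed = checker.failed_nodes()
    print(failed)                                                  # ['node-2']
    print(failover(["node-1", "node-2"], ["standby-1"], failed))   # ['node-1', 'standby-1']
```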
Scaling and data migration

Logical storage structure

Doris consistent hashing: https://github.com/itisaid/Doris/tree/master/common/doris.common/doris.algorithm/src/main/java/com/alibaba/doris/algorithm/vpm

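The sketch below is not a port of the linked Doris code. It is the generic consistent-hashing technique with virtual nodes (MD5 as the ring hash and 100 virtual nodes per physical node are arbitrary choices), included to show why this family of algorithms suits scaling and data migration: adding a node only moves keys onto the new node, and removing it restores the previous placement.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Consistent hashing with virtual nodes on a 64-bit ring."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._keys = []   # sorted ring positions of all virtual nodes
        self._owner = {}  # ring position -> physical node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(text: str) -> int:
        return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#vnode{i}")
            self._owner[pos] = node
            bisect.insort(self._keys, pos)

    def remove_node(self, node: str) -> None:
        self._keys = [pos for pos in self._keys if self._owner[pos] != node]
        self._owner = {pos: n for pos, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        if not self._keys:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's position, wrapping around.
        i = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._owner[self._keys[i]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["node-1", "node-2", "node-3"])
    keys = [f"user:{i}" for i in range(10_000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-4")
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    print(f"{len(moved) / len(keys):.1%} of keys moved, all to node-4")
```

With plain `hash(key) % N`, going from 3 to 4 nodes would remap most keys; here, on average, only about a quarter of the keys move, which keeps data migration proportional to the capacity added.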