The Availability and Performance Analysis of Sina Weibo Comments
Background
Sina Weibo is a Chinese microblogging website, often described as the Chinese version of Twitter. According to the Weibo annual report, Weibo has about 225 million average DAUs (Daily Active Users).
Requirement
Users should be able to comment on any Weibo post or comment.
Capacity Estimation and Constraints
Post Weibo
Let's assume each user makes, on average, 1 text-only post per day. The total number of posts is then around 250 million per day (rounding up from 225 million DAUs for easier math).
Most people use Weibo during 8:00~9:00 in the morning, 12:00~13:00 at noon, and 20:00~22:00 in the evening. Assuming these 4 peak hours account for 60% of all posts, the average TPS of posting during these hours is:
250 million * 60% / (4 * 3600) ≈ 10 K/s
Read Weibo
Let's assume each post gets, on average, 100 views. The total number of views is then 25 billion per day. The QPS of reading Weibo is:
25 billion * 60% / (4 * 3600) ≈ 1000 K/s
Comment Weibo
Assume each post gets 5 comments on average. The total number of comments is then 250 million * 5 = 1.25 billion per day. Using the same peak-hour assumption as posting Weibo, the TPS is:
1.25 billion * 60% / (4 * 3600) ≈ 50 K/s
Read Comment
Assuming comments receive the same total number of views as posts, we have 25 billion comment reads per day. The QPS of reading comments is:
25 billion * 60% / (4 * 3600) ≈ 1000 K/s
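As a sanity check, here is a minimal Python sketch of the back-of-envelope math above, using only the figures already stated (250 million posts/day, 100 views and 5 comments per post, 60% of daily traffic in 4 peak hours):

```python
# Back-of-envelope traffic estimates from the assumptions stated in the text.
PEAK_SECONDS = 4 * 3600   # 4 peak hours
PEAK_SHARE = 0.60         # 60% of daily traffic falls in the peak

def peak_rate(daily_total: float) -> float:
    """Average requests per second during the peak hours."""
    return daily_total * PEAK_SHARE / PEAK_SECONDS

posts_per_day = 250e6
post_views_per_day = posts_per_day * 100    # 25 billion
comments_per_day = posts_per_day * 5        # 1.25 billion
comment_views_per_day = 25e9                # same as post views, per the assumption above

print(f"Post Weibo TPS:   {peak_rate(posts_per_day):,.0f}/s")         # ~10 K/s
print(f"Read Weibo QPS:   {peak_rate(post_views_per_day):,.0f}/s")    # ~1000 K/s
print(f"Comment TPS:      {peak_rate(comments_per_day):,.0f}/s")      # ~50 K/s
print(f"Read comment QPS: {peak_rate(comment_views_per_day):,.0f}/s") # ~1000 K/s
```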
Post Weibo
Analysis
Posting Weibo is a write operation, so we rely on load balancing to spread the traffic.
With such a volume, we will use a multi-level load-balancing architecture covering DNS -> F5 -> Nginx -> Gateway.
Design
1. The load-balancing algorithm
Only logged-in users can make a post, and the login status is generally stored in a distributed cache, so a posting request can be sent to any server. Here we can use a round-robin or random algorithm, as sketched below.
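A minimal sketch of these two stateless strategies, assuming an illustrative list of server names (nothing here is Weibo's real topology):

```python
import itertools
import random

# Because login state lives in a distributed cache rather than on any single
# server, any server can handle any request, so stateless picking is safe.
servers = ["post-svc-01", "post-svc-02", "post-svc-03"]  # placeholder names

round_robin = itertools.cycle(servers)

def pick_round_robin() -> str:
    """Rotate through the servers in order."""
    return next(round_robin)

def pick_random() -> str:
    """Pick any server uniformly at random."""
    return random.choice(servers)

for _ in range(4):
    print(pick_round_robin(), pick_random())
```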
2. Number of servers
Posting Weibo involves several key steps: content auditing, writing the data to storage (handled by the storage system), and writing the data to the cache (handled by the cache system). We therefore estimate that a single business server can process about 500 requests per second; handling 10 K/s TPS requires 20 servers, and with some buffer, 25 servers should be enough.
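A small sizing sketch using the numbers above; the 25% headroom factor is an assumption chosen to reproduce the "20 servers plus some buffer ≈ 25" estimate:

```python
import math

# Sizing the posting service: 10 K/s peak TPS, ~500 requests/s per server.
peak_tps = 10_000
per_server_capacity = 500
headroom = 0.25  # assumed buffer factor

base = math.ceil(peak_tps / per_server_capacity)   # 20 servers
with_buffer = math.ceil(base * (1 + headroom))     # 25 servers
print(base, with_buffer)
```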
Read Weibo
Analysis
Reading Weibo is a read-heavy operation, and since a post cannot be edited after it is published, it is a typical use case for caching. With such a volume of requests (25 billion per day), we also need the multi-level load-balancing architecture.
Design
1. The load-balancing algorithm
Anyone can view Weibo, even without logging in, so the request can be sent to any server. We can use a round-robin or random algorithm.
2. Number of servers
Assuming the CDN handles 90% of user traffic, the remaining 10% of read requests hit the system directly, so the request QPS is 1000 K/s * 10% = 100 K/s. Since the logic of reading Weibo is relatively simple (it mainly reads from the cache system), we assume a single business server can handle 1000 requests per second, which gives 100 servers. With a 20% reserve, the final number is 120 servers.
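The same sizing math for the read path, with the CDN offload folded in (all numbers are the assumptions stated above):

```python
import math

# Sizing the read service: 1000 K/s total QPS, 90% absorbed by the CDN,
# ~1000 requests/s per server, and a 20% reserve.
total_qps = 1_000_000
cdn_offload = 0.90
per_server_capacity = 1_000
reserve = 0.20

origin_qps = total_qps * (1 - cdn_offload)          # 100 K/s reaches our servers
base = math.ceil(origin_qps / per_server_capacity)  # 100 servers
final = math.ceil(base * (1 + reserve))             # 120 servers
print(origin_qps, base, final)
```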
Multi-level load-balancing architecture
The caching architecture
Post Comment
Similar to posting Weibo, this is a write-heavy operation, so we will use the same multi-level load-balancing architecture.
At 50 K/s TPS and 500 requests per second per server, we will need 100 servers + a 20% buffer (20 servers) = 120 servers.
Design
Since the comment feature does not require a very strict latency SLA, we can process comments asynchronously to make the system more scalable and efficient.
We push all comment events into a message queue, and workers consume those events (jobs) and update the cache and database asynchronously.
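A minimal in-process sketch of this pipeline, using Python's standard queue and a background thread as stand-ins for a real message broker and worker fleet; the handler name and event fields are illustrative:

```python
import queue
import threading

# In production this queue would be a real message broker (e.g. Kafka) and the
# worker would update the cache and database; both are simplified here.
comment_queue: "queue.Queue[dict]" = queue.Queue()

def post_comment(user_id: int, post_id: int, text: str) -> None:
    """API handler: enqueue the comment event and return immediately."""
    comment_queue.put({"user_id": user_id, "post_id": post_id, "text": text})

def comment_worker() -> None:
    """Background worker: consume events and persist them asynchronously."""
    while True:
        event = comment_queue.get()
        # write_to_database(event) and update_cache(event) would go here
        print("persisted comment", event)
        comment_queue.task_done()

threading.Thread(target=comment_worker, daemon=True).start()

post_comment(42, 1001, "Nice post!")
comment_queue.join()  # wait until the worker has drained the queue
```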
Read Comment
Similar to reading Weibo, this is a read-heavy operation, so we will use the same architecture (caching + multi-level load balancing).
With the same QPS, we need the same number of servers: 120.
The Weibo Architecture
Hot Events
When a hot event (a trending incident) is going on, there are several actions we can take to protect our servers.
Service downgrade
What is service downgrade?
When server pressure increases sharply, we strategically stop processing some services and pages, or process them in a simplified way, based on the actual business usage and traffic. This frees up server resources to keep the core business running normally and efficiently.
In our case, the core business is posting Weibo and reading Weibo.
The non-core business is posting comments and viewing comments.
So when a downgrade is needed, we can downgrade posting comments (temporarily stop accepting new comments, or queue them on the client and retry later) and viewing comments (temporarily not showing any comments), as sketched below.
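A minimal sketch of this downgrade logic using in-memory feature flags; the flag names, responses, and trigger are illustrative assumptions, not a real Weibo mechanism:

```python
# When the system is under heavy pressure, non-core features (posting and
# viewing comments) are switched off while core features stay available.
FLAGS = {"post_comment": True, "read_comment": True}

def enter_downgrade_mode() -> None:
    """Called when server pressure crosses an operational threshold."""
    FLAGS["post_comment"] = False
    FLAGS["read_comment"] = False

def handle_post_comment(request: dict) -> dict:
    if not FLAGS["post_comment"]:
        # Ask the client to retry later (or queue the comment client-side).
        return {"status": 503, "message": "Comments are temporarily disabled"}
    return {"status": 200}

def handle_read_comment(post_id: int) -> dict:
    if not FLAGS["read_comment"]:
        # Degrade gracefully: show the post without its comments.
        return {"status": 200, "comments": []}
    return {"status": 200, "comments": ["..."]}
```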
Service Fuse (Circuit Breaking)
A service fuse works like the fuse in our home: when a downstream service is unavailable or its responses time out, calls to that service are temporarily stopped to prevent the entire system from avalanching.
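A minimal circuit-breaker sketch of this idea: after a number of consecutive failures, calls are rejected immediately for a cool-down period instead of piling up. The threshold and cool-down values are illustrative:

```python
import time

class CircuitBreaker:
    """Reject calls to a failing downstream service until it has time to recover."""

    def __init__(self, failure_threshold: int = 5, cooldown_seconds: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.failure_count = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.failure_count >= self.failure_threshold:
            if time.time() - self.opened_at < self.cooldown_seconds:
                # Circuit is open: fail fast without calling downstream.
                raise RuntimeError("circuit open: downstream call skipped")
            self.failure_count = 0  # cool-down over, allow a retry
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.opened_at = time.time()
            raise
        self.failure_count = 0  # success resets the failure counter
        return result
```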
Rate Limit (Flow control)
If our servers still cannot handle the volume after the service downgrade (with only the core business available), we can use rate limiting to protect them by simply capping the request volume, so that we can at least still serve some users.
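A minimal token-bucket rate-limiter sketch; the rate and capacity values are illustrative, not Weibo's real limits:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, then cap throughput at `rate_per_second`."""

    def __init__(self, rate_per_second: float, capacity: float):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on the time elapsed since the last request.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(rate_per_second=1000, capacity=2000)
for _ in range(2500):
    if not limiter.allow():
        print("429 Too Many Requests")  # requests beyond the budget are rejected
        break
```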
Copyright notice: this is an original article by InfoQ author David.
Original link: http://xie.infoq.cn/article/a36b27c155930f10f5098c8fc. Please contact the author before reprinting.