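Comparing the Compute Performance of ARM and X86 Cloud Servers
Background
Xinchuang (the information technology application innovation industry) is growing rapidly in China, and many domestic server and chip vendors have taken the opportunity to launch Chinese-made servers and chips. The major cloud providers have also launched Xinchuang cloud servers. Many people still have questions about how CPU compute performance compares between the ARM and X86 architectures, so in this article we test the CPU performance of ARM and X86 cloud servers from one mainstream cloud provider.
Tool installation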
sysbench
Used to test CPU integer performance.
# Install build dependencies
yum install automake libtool gcc -y
# Download the sysbench source archive
wget https://github.com/akopytov/sysbench/archive/1.0.20.tar.gz -O sysbench-1.0.20.tar.gz
# Extract it
tar -xvf sysbench-1.0.20.tar.gz
# Run autogen.sh
cd sysbench-1.0.20
sh autogen.sh
# Generate the Makefile
./configure --without-mysql
# Build and install
make -j8 && make install
# Verify the installation (print the version)
sysbench --version
Unixbench
Used to test CPU floating-point performance.
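# Download
wget http://soft.vpser.net/test/unixbench/unixbench-5.1.2.tar.gz
# Extract
tar zxvf unixbench-5.1.2.tar.gz
# Install dependencies
yum install -y perl
# Configure: if you do not need the graphics tests, or are not running in a
# graphical environment, comment out "GRAPHICS_TEST = defined" in the Makefile
cd unixbench-5.1.2
# Build
make
# Run the tests
./Run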
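The Makefile edit described in the configure step can also be scripted. The following is a minimal sketch, not part of Unixbench itself: `disable_graphics_test` is a hypothetical helper name, and the pattern deliberately accepts both the spelling used in this article (`GRAPHICS_TEST`) and `GRAPHIC_TESTS`, on the assumption that the variable name may be spelled differently in the Makefile you downloaded. Check your Makefile before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Unixbench): comment out the line that
# enables the graphics tests in the given Makefile.
# Assumption: the variable is spelled GRAPHICS_TEST or GRAPHIC_TESTS.
disable_graphics_test() {
    makefile="$1"
    tmp="$(mktemp)" || return 1
    # Prefix the matching assignment with "# "; lines that are already
    # commented out do not match and are left untouched.
    if sed -E 's/^([[:space:]]*GRAPHICS?_TESTS?[[:space:]]*=)/# \1/' "$makefile" > "$tmp"; then
        # Write back through the original file to keep its permissions.
        cat "$tmp" > "$makefile"
    fi
    rm -f "$tmp"
}
```

It would be called before `make`, for example as `disable_graphics_test unixbench-5.1.2/Makefile`.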
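Running the tests: integer
Instance specifications
Both the X86 and the ARM cloud servers under test have the same specification: 8 cores and 32 GB of memory (8C32G), with a 2 TB cloud disk.
CPU models under test
X86 cloud server CPU: Intel(R) Xeon(R) Silver 4114 CPU @2.20GHz
ARM cloud server CPU: Phytium FT-2000+/64 @2.2GHz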
X86
Test: compute the prime numbers up to 20000 with 8 threads. Score: 2813.42 events per second.
[root@X86-Performance ~]# sysbench cpu --cpu-max-prime=20000 --threads=8 --time=60 run
sysbench 1.0.17 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 8
Initializing random number generator from current time
Prime numbers limit: 20000
Initializing worker threads...
Threads started!
CPU speed:
events per second: 2813.42
General statistics:
total time: 60.0025s
total number of events: 168818
Latency (ms):
min: 2.82
avg: 2.84
max: 17.52
95th percentile: 2.86
sum: 479885.99
Threads fairness:
events (avg/stddev): 21102.2500/13.03
execution time (avg/stddev): 59.9857/0.01
[root@X86-Performance ~]#
ARM
Test: compute the prime numbers up to 20000 with 8 threads. Score: 7077.50 events per second.
[root@performance-arm ~]# sysbench cpu --cpu-max-prime=20000 --threads=8 --time=60 run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 8
Initializing random number generator from current time
Prime numbers limit: 20000
Initializing worker threads...
Threads started!
CPU speed:
events per second: 7077.50
General statistics:
total time: 60.0024s
total number of events: 424684
Latency (ms):
min: 1.12
avg: 1.13
max: 18.34
95th percentile: 1.14
sum: 479797.10
Threads fairness:
events (avg/stddev): 53085.5000/32.63
execution time (avg/stddev): 59.9746/0.00
[root@performance-arm ~]#
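Result analysis
In this test the ARM cloud server delivers about 2.5 times the integer throughput of the X86 server (7077.50 vs. 2813.42 events per second). Note that the two hosts ran different sysbench versions (1.0.17 on X86, 1.0.20 on ARM).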
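The comparison can be reproduced from the two sysbench outputs with a few lines of shell. This is a minimal sketch: `events_per_second` and `ratio` are illustrative helper names, not sysbench features, and the scores are copied from the runs above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of sysbench) for comparing two runs.

# Print the "events per second" value from sysbench output read on stdin.
events_per_second() {
    awk -F: '/events per second/ { gsub(/[[:space:]]/, "", $2); print $2; exit }'
}

# Print a / b rounded to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Scores reported in the runs above.
x86_score=2813.42
arm_score=7077.50
echo "ARM / X86 = $(ratio "$arm_score" "$x86_score")"   # prints: ARM / X86 = 2.52
```

If each run's output was saved to a file, `events_per_second < x86.log` would extract the score from it (`x86.log` is a placeholder filename).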
Running the tests: floating point
X86
Use Unixbench to measure the Double-Precision Whetstone score with a single copy (1 thread) and with 8 parallel copies (8 threads).
1 thread: 3946.1 MWIPS
8 threads: 31546.4 MWIPS
------------------------------------------------------------------------
Benchmark Run: Wed May 19 2021 19:24:55 - 19:53:02
8 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables 33293015.3 lps (10.0 s, 7 samples)
Double-Precision Whetstone 3946.1 MWIPS (9.8 s, 7 samples)
Execl Throughput 984.1 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 466370.4 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 119865.0 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 1466024.1 KBps (30.0 s, 2 samples)
Pipe Throughput 583004.5 lps (10.0 s, 7 samples)
Pipe-based Context Switching 129953.0 lps (10.0 s, 7 samples)
Process Creation 3494.1 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 2352.7 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 2701.8 lpm (60.0 s, 2 samples)
System Call Overhead 495048.1 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 33293015.3 2852.9
Double-Precision Whetstone 55.0 3946.1 717.5
Execl Throughput 43.0 984.1 228.9
File Copy 1024 bufsize 2000 maxblocks 3960.0 466370.4 1177.7
File Copy 256 bufsize 500 maxblocks 1655.0 119865.0 724.3
File Copy 4096 bufsize 8000 maxblocks 5800.0 1466024.1 2527.6
Pipe Throughput 12440.0 583004.5 468.7
Pipe-based Context Switching 4000.0 129953.0 324.9
Process Creation 126.0 3494.1 277.3
Shell Scripts (1 concurrent) 42.4 2352.7 554.9
Shell Scripts (8 concurrent) 6.0 2701.8 4502.9
System Call Overhead 15000.0 495048.1 330.0
========
System Benchmarks Index Score 756.6
------------------------------------------------------------------------
Benchmark Run: Wed May 19 2021 19:53:02 - 20:21:10
8 CPUs in system; running 8 parallel copies of tests
Dhrystone 2 using register variables 265277164.6 lps (10.0 s, 7 samples)
Double-Precision Whetstone 31546.4 MWIPS (9.8 s, 7 samples)
Execl Throughput 20901.0 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 871968.8 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 234891.6 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 2799968.7 KBps (30.0 s, 2 samples)
Pipe Throughput 4642141.4 lps (10.0 s, 7 samples)
Pipe-based Context Switching 1059963.5 lps (10.0 s, 7 samples)
Process Creation 55490.3 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 33809.9 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 4641.0 lpm (60.1 s, 2 samples)
System Call Overhead 3522148.0 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 265277164.6 22731.5
Double-Precision Whetstone 55.0 31546.4 5735.7
Execl Throughput 43.0 20901.0 4860.7
File Copy 1024 bufsize 2000 maxblocks 3960.0 871968.8 2201.9
File Copy 256 bufsize 500 maxblocks 1655.0 234891.6 1419.3
File Copy 4096 bufsize 8000 maxblocks 5800.0 2799968.7 4827.5
Pipe Throughput 12440.0 4642141.4 3731.6
Pipe-based Context Switching 4000.0 1059963.5 2649.9
Process Creation 126.0 55490.3 4404.0
Shell Scripts (1 concurrent) 42.4 33809.9 7974.0
Shell Scripts (8 concurrent) 6.0 4641.0 7735.0
System Call Overhead 15000.0 3522148.0 2348.1
========
System Benchmarks Index Score 4450.0
[root@X86-Performance unixbench-5.1.2]#
ARM
Use Unixbench to measure the Double-Precision Whetstone score with a single copy (1 thread) and with 8 parallel copies (8 threads).
1 thread: 3626.3 MWIPS
8 threads: 28926.4 MWIPS
------------------------------------------------------------------------
Benchmark Run: Wed May 19 2021 18:59:02 - 19:27:08
8 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables 22270696.0 lps (10.0 s, 7 samples)
Double-Precision Whetstone 3626.3 MWIPS (9.3 s, 7 samples)
Execl Throughput 2591.5 lps (29.7 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 402971.9 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 121834.3 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 1069823.2 KBps (30.0 s, 2 samples)
Pipe Throughput 730925.1 lps (10.0 s, 7 samples)
Pipe-based Context Switching 101991.7 lps (10.0 s, 7 samples)
Process Creation 5187.1 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 3884.2 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 1588.8 lpm (60.0 s, 2 samples)
System Call Overhead 514939.2 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 22270696.0 1908.4
Double-Precision Whetstone 55.0 3626.3 659.3
Execl Throughput 43.0 2591.5 602.7
File Copy 1024 bufsize 2000 maxblocks 3960.0 402971.9 1017.6
File Copy 256 bufsize 500 maxblocks 1655.0 121834.3 736.2
File Copy 4096 bufsize 8000 maxblocks 5800.0 1069823.2 1844.5
Pipe Throughput 12440.0 730925.1 587.6
Pipe-based Context Switching 4000.0 101991.7 255.0
Process Creation 126.0 5187.1 411.7
Shell Scripts (1 concurrent) 42.4 3884.2 916.1
Shell Scripts (8 concurrent) 6.0 1588.8 2648.0
System Call Overhead 15000.0 514939.2 343.3
========
System Benchmarks Index Score 783.9
------------------------------------------------------------------------
Benchmark Run: Wed May 19 2021 19:27:08 - 19:55:15
8 CPUs in system; running 8 parallel copies of tests
Dhrystone 2 using register variables 177048367.5 lps (10.0 s, 7 samples)
Double-Precision Whetstone 28926.4 MWIPS (9.3 s, 7 samples)
Execl Throughput 15952.7 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 598099.6 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 160373.3 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 1793541.5 KBps (30.0 s, 2 samples)
Pipe Throughput 5840652.5 lps (10.0 s, 7 samples)
Pipe-based Context Switching 904721.9 lps (10.0 s, 7 samples)
Process Creation 16460.6 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 15821.5 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 2313.4 lpm (60.1 s, 2 samples)
System Call Overhead 1259178.2 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 177048367.5 15171.2
Double-Precision Whetstone 55.0 28926.4 5259.3
Execl Throughput 43.0 15952.7 3709.9
File Copy 1024 bufsize 2000 maxblocks 3960.0 598099.6 1510.4
File Copy 256 bufsize 500 maxblocks 1655.0 160373.3 969.0
File Copy 4096 bufsize 8000 maxblocks 5800.0 1793541.5 3092.3
Pipe Throughput 12440.0 5840652.5 4695.1
Pipe-based Context Switching 4000.0 904721.9 2261.8
Process Creation 126.0 16460.6 1306.4
Shell Scripts (1 concurrent) 42.4 15821.5 3731.5
Shell Scripts (8 concurrent) 6.0 2313.4 3855.7
System Call Overhead 15000.0 1259178.2 839.5
========
System Benchmarks Index Score 2792.1
[root@performance-arm UnixBench]#
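Result analysis
In floating-point computation the ARM CPU delivers about 92% of the X86 CPU's performance (3626.3 vs. 3946.1 MWIPS with 1 thread, 28926.4 vs. 31546.4 MWIPS with 8 threads), which is a respectable showing.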
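The 92% figure follows directly from the Whetstone scores. A minimal sketch of the arithmetic, where `percent_of` is an illustrative helper name and the scores are copied from the Unixbench reports above:

```shell
#!/bin/sh
# Hypothetical helper: print a score as a percentage of a reference score,
# rounded to one decimal place.
percent_of() {
    awk -v s="$1" -v ref="$2" 'BEGIN { printf "%.1f\n", 100 * s / ref }'
}

# Double-Precision Whetstone scores (MWIPS) from the runs above.
echo "1 copy:   ARM is $(percent_of 3626.3 3946.1)% of X86"     # 91.9
echo "8 copies: ARM is $(percent_of 28926.4 31546.4)% of X86"   # 91.7
```

Both the single-copy and the 8-copy results land within half a point of each other, so the gap does not appear to change with the number of copies in this test.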
Tips
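Why is ARM's integer performance higher than X86's?
ARM and X86 use different instruction set architectures. ARM is built around simple instructions and handles them efficiently, which is the likely reason for its large lead in this integer test.
What is the difference between the ARM and X86 instruction sets?
Like the author, many readers probably cannot answer this off the top of their heads, but it is well known that Intel uses CISC (complex instruction set computing) while ARM uses RISC (reduced instruction set computing).
Take going to the toilet as an example: CISC and RISC would issue different instructions to a person. CISC issues one complex instruction: "Go to the toilet!" RISC issues a sequence of simple ones: "Stand up, walk to the bathroom, pull down your pants, sit on the toilet, go!"
Is software the same for ARM and X86?
Software builds differ between the arm and x86 architectures. You can download the build for your architecture online or offline, or obtain it from the vendor's support channel.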
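When scripting a download, the right build can be chosen from the machine name the kernel reports. A minimal sketch, assuming the usual `uname -m` values (`x86_64` on the X86 server, `aarch64` on the ARM server); `arch_family` is an illustrative helper name:

```shell
#!/bin/sh
# Hypothetical helper: map a machine name (as printed by `uname -m`) to the
# architecture family discussed in this article. With no argument it
# inspects the current host.
arch_family() {
    case "${1:-$(uname -m)}" in
        x86_64|amd64|i?86)  echo "x86" ;;
        aarch64|arm64|arm*) echo "arm" ;;
        *)                  echo "unknown" ;;
    esac
}

echo "This host is: $(arch_family)"
```

A download script could branch on the result to pick the matching package; the exact package names depend on the vendor.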
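That's all for today. Thank you for reading, and see you next time.
Copyright notice: This article is an original work by InfoQ author 【Python测试开发】.
Original link: 【http://xie.infoq.cn/article/6099d13cc975697a920300e89】. Please contact the author before reposting.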