SecretFlow Vertical Federated SecureBoost Benchmark White Paper

2023-08-25
Zhejiang

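SecretFlow (隐语) is an open-source trusted privacy-preserving computing framework. It ships with virtual devices for several kinds of encrypted computation, including MPC, TEE, and homomorphic encryption, which can be chosen flexibly, and it provides a rich set of federated learning algorithms and differential privacy mechanisms.

Open-source code: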

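Introduction:

XGB, a classic algorithm in data science competitions, attracts a lot of attention. Still, some readers worry: is XGB efficient enough in vertical federated learning? Can security and efficiency coexist? Does privacy-preserving computation take so long that model iteration slows down? With the efficient implementation of the federated algorithm SecureBoost in SecretFlow, model-tuning efficiency can easily jump by 10x!

SecretFlow recently open-sourced a high-performance implementation of the vertical federated algorithm SecureBoost. Compared with SS-XGB, which is based on secret sharing, SecureBoost performs better; however, because it is not an MPC algorithm, its security is weaker than that of SS-XGB.

SecretFlow SecureBoost (hereafter "SecretFlow SGB") builds on the security foundation and on a distributed architecture for multi-party joint computation, which greatly improves the efficiency and flexibility of encrypted computation. With a simple configuration change, SecretFlow SGB can switch between homomorphic encryption protocols such as Paillier and OU to meet the security and performance needs of different scenarios.

This article describes the test environment, steps, and data for SecretFlow SGB, so that you can see how the protocol is used and how it performs, get to know SecretFlow SGB better, and judge how it fits your business needs. Let's take a look at what SecretFlow SGB can do!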

Test method and steps:

1. Test machine

●Python: 3.8

●pip: >= 19.3

●OS: CentOS 7

●CPU/Memory: the recommended minimum configuration is 8C16G (8 cores, 16 GB of memory)

●Disk: 500G
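As a quick way to compare a host against the list above, the following is a minimal preflight sketch. The function names `check_env` and `total_memory_gib` are hypothetical helpers written for this article, not part of SecretFlow, and the 15 GiB memory threshold is an assumption that leaves some slack because a nominal 16G machine usually reports slightly less.

```python
import os
import sys

# Thresholds taken from the list above; MIN_MEM_GIB is deliberately a bit
# below 16 because a nominal 16G host usually reports slightly less (assumption).
REQUIRED_PYTHON = (3, 8)
MIN_CPUS = 8
MIN_MEM_GIB = 15


def check_env(py_version, cpu_count, mem_gib):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(py_version[:2]) != REQUIRED_PYTHON:
        problems.append(f"Python {py_version[0]}.{py_version[1]} found, 3.8 expected")
    if cpu_count is None or cpu_count < MIN_CPUS:
        problems.append(f"{cpu_count} CPU cores found, at least {MIN_CPUS} recommended")
    if mem_gib < MIN_MEM_GIB:
        problems.append(f"{mem_gib:.1f} GiB memory found, about 16 GiB recommended")
    return problems


def total_memory_gib():
    """Physical memory in GiB (Linux only, via sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30


if __name__ == "__main__":
    issues = check_env(sys.version_info, os.cpu_count(), total_memory_gib())
    print("environment looks fine" if not issues else "\n".join(issues))
```

Keeping `check_env` free of system calls makes it easy to reuse the same check for values collected from remote machines.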

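2. Install conda

conda is used to manage the Python environment. If the machine does not have conda, install it first.

# If wget is missing, install it first (apt-get as in the original; on CentOS 7 use yum install wget)
# sudo apt-get install wget
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Install
bash Miniconda3-latest-Linux-x86_64.sh
# Keep pressing Enter, then type yes
please answer 'yes' or 'no':
>>> yes
# Choose the install path; a leading dot in the name makes the directory hidden
Miniconda3 will now be installed into this location:
>>> ~/.miniconda3
# Add the configuration to ~/.bashrc
Do you wish the installer to initialize Miniconda3 by running conda init? [yes|no]
[no] >>> yes
# Load the configuration file, or restart the machine
source ~/.bashrc
# Check the installation; a version number means it succeeded
conda --version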


3. Install secretflow

conda create -n sf-benchmark python=3.8
conda activate sf-benchmark
pip install -U secretflow
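To confirm that the package landed in the `sf-benchmark` environment, a small check like the one below can be run inside it. This is only a sketch: `installed_version` is a hypothetical helper, and it relies solely on the standard-library `importlib.metadata` module (available from Python 3.8), not on any SecretFlow API.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("secretflow")
    print("secretflow:", version if version else "not installed")
```

Recording the version printed here alongside the benchmark numbers makes later runs easier to compare.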


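4. Data requirements

Two-party data size:

alice: 1 million rows, 50 dimensions
bob: 1 million rows, 50 dimensions

Three-party data size:

alice: 1 million rows, 34 dimensions
bob: 1 million rows, 33 dimensions
carol: 1 million rows, 33 dimensions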
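The article does not say how the test data was produced. The sketch below shows one possible way to generate a vertically partitioned toy dataset with the same layout. Everything in it is an assumption for illustration: the helper names `split_dims` and `write_parts`, the feature column names `x0, x1, ...`, the random features, and the toy label rule. Only the file naming (`independent_linear.1.csv` and so on), the `id` key column, and the `y` label column held by alice follow the benchmark script in this article.

```python
import csv
import os
import random


def split_dims(total_dims, n_parties):
    """Split feature columns as evenly as possible; earlier parties get the remainder."""
    base, extra = divmod(total_dims, n_parties)
    return [base + (1 if i < extra else 0) for i in range(n_parties)]


def write_parts(out_dir, n_rows, total_dims, n_parties, seed=0):
    """Write one CSV per party; the first party (alice) also holds the label y."""
    rng = random.Random(seed)
    dims = split_dims(total_dims, n_parties)
    os.makedirs(out_dir, exist_ok=True)
    paths = [
        os.path.join(out_dir, f"independent_linear.{i + 1}.csv")
        for i in range(n_parties)
    ]
    files = [open(p, "w", newline="") for p in paths]
    try:
        writers = [csv.writer(f) for f in files]
        start = 0
        for i, d in enumerate(dims):
            header = ["id"] + [f"x{start + j}" for j in range(d)]
            if i == 0:
                header.append("y")
            writers[i].writerow(header)
            start += d
        for row_id in range(n_rows):
            feats = [rng.gauss(0.0, 1.0) for _ in range(total_dims)]
            label = 1 if sum(feats) > 0 else 0  # toy rule, not the real dataset
            start = 0
            for i, d in enumerate(dims):
                row = [row_id] + feats[start:start + d]
                if i == 0:
                    row.append(label)
                writers[i].writerow(row)
                start += d
    finally:
        for f in files:
            f.close()
    return paths


if __name__ == "__main__":
    print(split_dims(100, 2))  # [50, 50]
    print(split_dims(100, 3))  # [34, 33, 33]
```

For a full-size run, `write_parts` would be called with `n_rows=1_000_000` and `total_dims=100`; pure-Python generation at that size is slow, so a vectorized library would be the more practical choice.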

5. Benchmark script

import logging
import socket
import sys
import time

import spu
from sklearn.metrics import mean_squared_error, roc_auc_score

import secretflow as sf
from secretflow.data import FedNdarray, PartitionWay
from secretflow.device.driver import reveal, wait
from secretflow.ml.boost.sgb_v import Sgb
from secretflow.utils.simulation.datasets import create_df
from secretflow.data.vertical import read_csv as v_read_csv

# init log
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.info("test")

_parties = {
    # you may change the addresses
    # replace the IPs of alice, bob and carol with the real ones
    'alice': {'address': '192.168.0.1:23041'},
    'bob': {'address': '192.168.0.2:23042'},
    'carol': {'address': '192.168.0.3:23043'},
}


def setup_sf(party, alice_ip, bob_ip, carol_ip):
    cluster_conf = {
        'parties': _parties,
        'self_party': party,
    }

    # init cluster
    _system_config = {'lineage_pinning_enabled': False}
    sf.init(
        address='local',
        num_cpus=8,
        log_to_driver=True,
        cluster_config=cluster_conf,
        exit_on_failure_cross_silo_sending=True,
        _system_config=_system_config,
        _memory=5 * 1024 * 1024 * 1024,
        cross_silo_messages_max_size_in_bytes=2 * 1024 * 1024 * 1024 - 1,
        object_store_memory=5 * 1024 * 1024 * 1024,
    )

    # SPU settings (only runtime_config['field'] is used below; no SPU is created)
    cluster_def = {
        'nodes': [
            {'party': 'alice', 'id': 'local:0', 'address': alice_ip},
            {'party': 'bob', 'id': 'local:1', 'address': bob_ip},
            # node ids must be unique (the original repeated 'local:1' here)
            {'party': 'carol', 'id': 'local:2', 'address': carol_ip},
        ],
        'runtime_config': {
            # SEMI2K support 2/3 PC, ABY3 only support 3PC, CHEETAH only support 2PC.
            # pls pay attention to size of nodes above. nodes size need match to PC setting.
            'protocol': spu.spu_pb2.ABY3,
            'field': spu.spu_pb2.FM64,
        },
    }

    # HEU settings
    heu_config = {
        'sk_keeper': {'party': 'alice'},
        'evaluators': [{'party': 'bob'}, {'party': 'carol'}],
        'mode': 'PHEU',
        # change the homomorphic encryption settings here
        'he_parameters': {
            'schema': 'paillier',
            'key_pair': {
                'generate': {
                    'bit_size': 2048,
                },
            },
        },
        'encoding': {
            'cleartext_type': 'DT_I32',
            'encoder': "IntegerEncoder",
            'encoder_args': {"scale": 1},
        },
    }
    return cluster_def, heu_config


class SGB_benchmark:
    def __init__(self, cluster_def, heu_config):
        self.alice = sf.PYU('alice')
        self.bob = sf.PYU('bob')
        self.carol = sf.PYU('carol')
        self.heu = sf.HEU(heu_config, cluster_def['runtime_config']['field'])

    def run_sgb(self, test_name, v_data, label_data, y, logistic, subsample, colsample):
        sgb = Sgb(self.heu)
        start = time.time()
        params = {
            'num_boost_round': 5,
            'max_depth': 5,
            'sketch_eps': 0.08,
            'objective': 'logistic' if logistic else 'linear',
            'reg_lambda': 0.3,
            'subsample': subsample,
            'colsample_by_tree': colsample,
        }
        model = sgb.train(params, v_data, label_data)
        # reveal(model.weights[-1])
        print(f"{test_name} train time: {time.time() - start}")

        # predict and reveal the result to measure prediction time
        start = time.time()
        yhat = model.predict(v_data)
        yhat = reveal(yhat)
        print(f"{test_name} predict time: {time.time() - start}")
        if logistic:
            print(f"{test_name} auc: {roc_auc_score(y, yhat)}")
        else:
            print(f"{test_name} mse: {mean_squared_error(y, yhat)}")

        # predict again with the result sent to alice only
        fed_yhat = model.predict(v_data, self.alice)
        assert len(fed_yhat.partitions) == 1 and self.alice in fed_yhat.partitions
        yhat = reveal(fed_yhat.partitions[self.alice])
        assert yhat.shape[0] == y.shape[0], f"{yhat.shape} == {y.shape}"
        if logistic:
            print(f"{test_name} auc: {roc_auc_score(y, yhat)}")
        else:
            print(f"{test_name} mse: {mean_squared_error(y, yhat)}")

    def test_on_linear(self, sample_num, total_num):
        """
        sample_num: int. this number * 10000 = sample number in dataset.
        """
        io_start = time.perf_counter()
        common_path = "/root/sf-benchmark/data/{}w_{}d_3pc/independent_linear.".format(
            sample_num, total_num
        )
        vdf = v_read_csv(
            {
                self.alice: common_path + "1.csv",
                self.bob: common_path + "2.csv",
                self.carol: common_path + "3.csv",
            },
            keys='id',
            drop_keys='id',
        )
        # split y out of dataset,
        # <<< !!! >>> change 'y' if label column name is not y in dataset.
        label_data = vdf["y"]
        # v_data remains all features.
        v_data = vdf.drop(columns="y")
        # <<< !!! >>> change alice if y does not belong to alice.
        y = reveal(label_data.partitions[self.alice].data)
        wait([p.data for p in v_data.partitions.values()])
        io_end = time.perf_counter()
        print("io takes time", io_end - io_start)
        self.run_sgb("independent_linear", v_data, label_data, y, True, 1, 1)


def run_test(party):
    # pass the address strings, not the whole per-party dicts
    cluster_def, heu_config = setup_sf(
        party,
        _parties['alice']['address'],
        _parties['bob']['address'],
        _parties['carol']['address'],
    )
    test_suite = SGB_benchmark(cluster_def, heu_config)
    test_suite.test_on_linear(100, 100)
    sf.shutdown()


if __name__ == '__main__':
    import argparse

    parser = argparse.ArgumentParser(prog='sgb benchmark remote')
    parser.add_argument('party')
    args = parser.parse_args()
    run_test(args.party)


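Download the script to each test machine and name it, for example, sgb_benchmark.py. The three parties alice, bob, and carol all use the same script.

Start the two-party SGB run as follows:

alice: python sgb_benchmark.py alice
bob: python sgb_benchmark.py bob

Note that the script as listed is configured for three parties (carol appears in _parties, in the HEU evaluators, and in the CSV mapping), so a two-party run presumably requires removing the carol entries first.

Start the three-party SGB run as follows:

alice: python sgb_benchmark.py alice
bob: python sgb_benchmark.py bob
carol: python sgb_benchmark.py carol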

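SGB Benchmark report

Interpretation:

The benchmark data has one million rows and one hundred dimensions. We ran the experiments under two sets of network parameters. The schema algorithm parameter also takes two values, 'paillier' and 'ou'. The XGB model trained in these experiments has 5 trees of depth 5 with 13 feature buckets, and it performs a binary classification task. We ran the experiments above in both two-party and three-party settings. With two parties, alice and bob each hold 50 of the dimensions. With three parties, alice, bob, and carol hold 34, 33, and 33 dimensions respectively.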
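The figure of 13 feature buckets is not set directly in the script; it appears to follow from `sketch_eps = 0.08`. The sketch below works through that arithmetic under the assumption that the bucket count is roughly `1 / sketch_eps` rounded up, which is how such a quantile-sketch parameter is commonly read. The helper names are hypothetical, and the exact rounding used inside SecretFlow is not confirmed by this article.

```python
import math


def approx_bucket_count(sketch_eps):
    """Approximate number of feature buckets, assuming ceil(1 / sketch_eps)."""
    if not 0 < sketch_eps <= 1:
        raise ValueError("sketch_eps must be in (0, 1]")
    return math.ceil(1 / sketch_eps)


def max_leaves(max_depth):
    """A binary tree of the given depth has at most 2 ** max_depth leaves."""
    return 2 ** max_depth


if __name__ == "__main__":
    print(approx_bucket_count(0.08))  # 13, matching the report
    print(max_leaves(5))              # 32 leaves per tree at most
```

Under this reading, a smaller `sketch_eps` gives more candidate split points per feature, and with them more encrypted gradient sums to compute.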

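Overall, the three-party computation is more efficient, which reflects the advantage of parallel computation across multiple parties.

The LAN experiments simulate performance in a local area network, and the WAN experiments simulate performance over a low-latency Internet connection. For a homomorphic encryption scheme, computation should be the bottleneck: its running time is far less sensitive to network latency than that of a secret sharing scheme, and the running times in LAN mode and WAN mode do not differ greatly.

When setting the protocol used by HEU, we configured paillier and ou in turn for comparison (the key length defaults to 2048 bits). Paillier and OU are both IND-CPA secure, that is, semantically secure, cryptosystems, but they rest on different hardness assumptions. OU outperforms Paillier in both encryption and ciphertext addition, and its ciphertexts are half the size of Paillier's; for a more detailed introduction to OU, see the link below. Overall, compared with Paillier, OU gives SecretFlow SGB a 3x to 4x computational speedup and halves the memory requirement.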
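As an illustration of the protocol switch and of the ciphertext-size claim, here is a minimal sketch. The dictionary mirrors the `heu_config` from the benchmark script; `heu_config_for` and `ciphertext_bits` are hypothetical helpers written for this article. The size estimate rests on the textbook constructions (a Paillier ciphertext is an element modulo n squared, an OU ciphertext is an element modulo n) and ignores any serialization overhead that HEU may add.

```python
import copy

# Same structure as heu_config in the benchmark script.
BASE_HEU_CONFIG = {
    'sk_keeper': {'party': 'alice'},
    'evaluators': [{'party': 'bob'}, {'party': 'carol'}],
    'mode': 'PHEU',
    'he_parameters': {
        'schema': 'paillier',
        'key_pair': {'generate': {'bit_size': 2048}},
    },
    'encoding': {
        'cleartext_type': 'DT_I32',
        'encoder': "IntegerEncoder",
        'encoder_args': {"scale": 1},
    },
}

SUPPORTED_SCHEMAS = ('paillier', 'ou')  # the two schemas compared in this article


def heu_config_for(schema, bit_size=2048):
    """Return a copy of the base config with the schema and key size replaced."""
    if schema not in SUPPORTED_SCHEMAS:
        raise ValueError(f"unsupported schema: {schema}")
    cfg = copy.deepcopy(BASE_HEU_CONFIG)
    cfg['he_parameters']['schema'] = schema
    cfg['he_parameters']['key_pair']['generate']['bit_size'] = bit_size
    return cfg


def ciphertext_bits(schema, bit_size=2048):
    """Textbook ciphertext size: modulo n^2 for Paillier, modulo n for OU."""
    if schema not in SUPPORTED_SCHEMAS:
        raise ValueError(f"unsupported schema: {schema}")
    return 2 * bit_size if schema == 'paillier' else bit_size


if __name__ == "__main__":
    for schema in SUPPORTED_SCHEMAS:
        cfg = heu_config_for(schema)
        print(cfg['he_parameters']['schema'], ciphertext_bits(schema), "bits")
```

With a 2048-bit key this gives 4096 bits per Paillier ciphertext against 2048 bits for OU, which is consistent with the halved ciphertext size and memory need reported above.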

References:

Introduction to the Okamoto-Uchiyama algorithm

https://www.secretflow.org.cn/docs/heu/zh_CN/getting_started/algo_choice.html#okamoto-uchiyama

SecretFlow community:

https://github.com/secretflow

https://gitee.com/secretflow

SecretFlow website: https://www.secretflow.org.cn

👇Follow us:

WeChat official account: 隐语的小剧场

Bilibili: 隐语 secretflow

Email: secretflow-contact@service.alipay.com
