
[My Story with openGauss] Guide to Upgrading an openGauss 3.1.1 Enterprise Edition Primary/Standby Cluster to 5.0.0

Author: daydayup
  • 2023-08-12
    Beijing

尚雷, openGauss, published 2023-07-29 17:58 from Sichuan


Included in the collection: #The 6th openGauss Technical Article Call, articles that passed initial review (62 articles)


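Preface: A few days ago I test-deployed openGauss 5.0 and wrote the article "Guide to Installing and Deploying openGauss 5.0 Enterprise Edition on CentOS/RHEL 7 with One Primary, Two Standbys, and One Cascade Standby". More recently I tested upgrading openGauss from 3.1.1 to 5.0.0 and ran into a few problems along the way. If you follow this guide and hit problems during your own upgrade, or disagree with anything written here, I would be glad to hear from you.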

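I. Environment Overview

This database environment is an openGauss 3.1.1 Enterprise Edition cluster with one primary and one standby. Before openGauss 3.1.1 was installed, the environment was prepared as the official openGauss documentation requires: dependency packages were installed, the firewall and SELinux were disabled, kernel parameters were tuned, and so on. The environment details are as follows:


If you are unfamiliar with installing and deploying an openGauss 3 Enterprise Edition cluster, see my earlier article, "Guide to Installing and Deploying an openGauss 3.1.0 One-Primary, Two-Standby Cluster on CentOS 7": https://www.modb.pro/db/551221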

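1.1 Host Names

opengauss-db1 (primary), opengauss-db2 (standby)

1.2 Host Addresses

opengauss-db1: 10.110.3.155
opengauss-db2: 10.110.3.156

1.3 Port Information

Database node (dataPortBase): 26000
CM server (cmServerPortBase): 15300

1.4 Users and Groups

User: omm
Group: dbgrp

1.5 Software Directories

Installation package directory: /opt/software/openGauss
Database installation directory: /opt/gaussdb/install/app
Database tool directory: /opt/gaussdb/install/om
CM directory: /opt/gaussdb/install/cm
Data directory: /opt/gaussdb/install/data/dn
Log directory: /var/log/omm
Temporary file directory: /opt/gaussdb/tmp
Core file directory: /opt/gaussdb/corefile

1.6 XML Configuration File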

<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
    <!-- Overall openGauss information -->
    <CLUSTER>
        <!-- Database name -->
        <PARAM name="clusterName" value="openGSDB" />
        <!-- Database node names (hostname) -->
        <PARAM name="nodeNames" value="opengauss-db1,opengauss-db2" />
        <!-- Node IPs, in one-to-one correspondence with nodeNames -->
        <PARAM name="backIp1s" value="10.110.3.155,10.110.3.156"/>
        <!-- Database installation directory -->
        <PARAM name="gaussdbAppPath" value="/opt/gaussdb/install/app" />
        <!-- Log directory -->
        <PARAM name="gaussdbLogPath" value="/var/log/omm" />
        <!-- Temporary file directory -->
        <PARAM name="tmpMppdbPath" value="/opt/gaussdb/tmp"/>
        <!-- Database tool directory -->
        <PARAM name="gaussdbToolPath" value="/opt/gaussdb/install/om" />
        <!-- Database core file directory -->
        <PARAM name="corePath" value="/opt/gaussdb/corefile"/>
        <!-- openGauss type; "single-inst" denotes the single-instance deployment form with one primary and multiple standbys -->
        <PARAM name="clusterType" value="single-inst"/>
    </CLUSTER>
    <!-- Node deployment information on each server -->
    <DEVICELIST>
        <!-- Node deployment information on opengauss-db1 -->
        <DEVICE sn="1000001">
            <!-- hostname of opengauss-db1 -->
            <PARAM name="name" value="opengauss-db1"/>
            <!-- AZ of opengauss-db1 and the AZ priority -->
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <!-- If the server has only one usable NIC, set backIP1 and sshIP1 to the same IP -->
            <PARAM name="backIp1" value="10.110.3.155"/>
            <PARAM name="sshIp1" value="10.110.3.155"/>

            <!--CM-->
            <!-- CM data directory -->
            <PARAM name="cmDir" value="/opt/gaussdb/install/cm" />
            <PARAM name="cmsNum" value="1" />
            <!-- CM listening port -->
            <PARAM name="cmServerPortBase" value="15300" />
            <PARAM name="cmServerlevel" value="1" />
            <!-- Node names and listening IPs of all CM instances -->
            <PARAM name="cmServerListenIp1" value="10.110.3.155,10.110.3.156" />
            <PARAM name="cmServerRelation" value="opengauss-db1,opengauss-db2" />

            <!--dbnode-->
            <PARAM name="dataNum" value="1"/>
            <!-- DBnode port number -->
            <PARAM name="dataPortBase" value="26000"/>
            <!-- Data directory on the DBnode primary node, and the standby's data directory -->
            <PARAM name="dataNode1" value="/opt/gaussdb/install/data/dn,opengauss-db2,/opt/gaussdb/install/data/dn"/>
            <!-- Number of nodes set to synchronous mode on the DBnode -->
            <PARAM name="dataNode1_syncNum" value="0"/>
        </DEVICE>
        <!-- Node deployment information on opengauss-db2; the value of "name" is the host name (hostname) -->
        <DEVICE sn="1000002">
            <PARAM name="name" value="opengauss-db2"/>
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <!-- If the server has only one usable NIC, set backIP1 and sshIP1 to the same IP -->
            <PARAM name="backIp1" value="10.110.3.156"/>
            <PARAM name="sshIp1" value="10.110.3.156"/>
            <PARAM name="cmDir" value="/opt/gaussdb/install/cm" />
        </DEVICE>
    </DEVICELIST>
</ROOT>

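II. Preparation

2.1 Download the 5.0.0 Installation Package

2.1.1 Download the Package

Log in to the download page of the official openGauss website, https://www.opengauss.org/zh/download/, with a registered account and download the openGauss 5.0.0 package that matches your operating system. Choose the openGauss_5.0.0 Enterprise Edition, then upload the downloaded package to the /opt/software/openGauss directory on the server.



Note: if the database server can reach the internet, you can instead right-click the download button, choose "Copy link", and fetch the openGauss 5.0.0 Enterprise Edition package directly on the server with wget.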


# Run as root [primary node]
[root@opengauss-db1 ~]# cd /opt/software/openGauss
[root@opengauss-db1 openGauss]# wget https://opengauss.obs.cn-south-1.myhuaweicloud.com/5.0.0/x86/openGauss-5.0.0-CentOS-64bit-all.tar.gz

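2.1.2 Verify the Package

On the download page, click the icon that copies the package's SHA256 value and paste the copied content into a text file. It reads: aa9fc724c5030f4cc79dad201675183029c8f36a07667028e681169a2f6482f5. Then check the downloaded file with the sha256sum command to make sure the package is intact.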


# Run as root [primary node]
[root@opengauss-db1 openGauss]# sha256sum openGauss-5.0.0-CentOS-64bit-all.tar.gz
aa9fc724c5030f4cc79dad201675183029c8f36a07667028e681169a2f6482f5  openGauss-5.0.0-CentOS-64bit-all.tar.gz
-- If the computed value matches the SHA256 value on the official website, the file is intact
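Comparing two 64-character hashes by eye is error-prone, so the check above can be scripted. The following is a minimal sketch, not part of the original procedure; the function name `verify_sha256` is hypothetical and it assumes GNU `sha256sum` and `awk` are available, as they are on CentOS 7.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_HASH   (hypothetical helper, not an openGauss tool)
# Returns 0 when the SHA256 digest of FILE equals EXPECTED_HASH, 1 otherwise.
verify_sha256() {
    local file=$1 expected=$2 actual
    # sha256sum prints "<hash>  <file>"; keep only the hash field.
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got: $actual)" >&2
        return 1
    fi
}
```

For the package in this article it would be called as `verify_sha256 openGauss-5.0.0-CentOS-64bit-all.tar.gz aa9fc724c5030f4cc79dad201675183029c8f36a07667028e681169a2f6482f5`, and the return code can gate the extraction step.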

2.1.3 Extract the Package

# Run as root [primary node]
[root@opengauss-db1 ~]# cd /opt/software/openGauss
[root@opengauss-db1 openGauss]# tar -zxvf openGauss-5.0.0-CentOS-64bit-all.tar.gz 
[root@opengauss-db1 openGauss]# tar -zxvf openGauss-5.0.0-CentOS-64bit-om.tar.gz
[root@xsky-node1 openGauss]# ll
total 261040
drwxr-xr-x 14 root root       302 Mar 29 03:22 lib
-rw-r--r--  1 root root 133071038 Mar 29 20:11 openGauss-5.0.0-CentOS-64bit-all.tar.gz
-rw-r--r--  1 root root       105 Mar 29 03:23 openGauss-5.0.0-CentOS-64bit-cm.sha256
-rw-r--r--  1 root root  22356000 Mar 29 03:23 openGauss-5.0.0-CentOS-64bit-cm.tar.gz
-rw-r--r--  1 root root        65 Mar 29 03:22 openGauss-5.0.0-CentOS-64bit-om.sha256
-rw-r--r--  1 root root  11963876 Mar 29 03:22 openGauss-5.0.0-CentOS-64bit-om.tar.gz
-rw-r--r--  1 root root        65 Mar 29 03:23 openGauss-5.0.0-CentOS-64bit.sha256
-rw-r--r--  1 root root  99384569 Mar 29 03:23 openGauss-5.0.0-CentOS-64bit.tar.bz2
drwxr-xr-x 10 root root      4096 Mar 29 03:22 script
-rw-------  1 root root        65 Mar 29 03:21 upgrade_sql.sha256
-rw-------  1 root root    493211 Mar 29 03:21 upgrade_sql.tar.gz
-rw-r--r--  1 root root        32 Mar 29 03:22 version.cfg

2.2 Check Health Status

# Run as root [any node]
-- Run the gs_checkos -i A command
[root@opengauss-dbxxx ~]# /opt/software/openGauss/script/gs_checkos -i A --detail
Checking items:
    A1. [ OS version status ]                                   : Normal
        [opengauss-db1]
        centos_7.9.2009_64bit
    A2. [ Kernel version status ]                               : Normal
        The names about all kernel versions are same. The value is "3.10.0-1160.92.1.el7.x86_64".
    A3. [ Unicode status ]                                      : Normal
        The values of all unicode are same. The value is "LANG=en_US.UTF-8".
    A4. [ Time zone status ]                                    : Normal
        The informations about all timezones are same. The value is "+0800".
    A5. [ Swap memory status ]                                  : Normal
        The value about swap memory is correct.
    A6. [ System control parameters status ]                    : Normal
        All values about system control  parameters are correct.
    A7. [ File system configuration status ]                    : Normal
        Both soft nofile and hard nofile are correct.
    A8. [ Disk configuration status ]                           : Normal
        The value about XFS mount parameters is correct.
    A9. [ Pre-read block size status ]                          : Normal
        The value about Logical block size is correct.
    A11.[ Network card configuration status ]                   : Normal
        The configuration about network card is correct.
    A12.[ Time consistency status ]                             : Normal
        The ntpd service is started, local time is "2023-07-21 16:24:44".
    A13.[ Firewall service status ]                             : Normal
        The firewall service is stopped.
    A14.[ THP service status ]                                  : Normal
        The THP service is stopped.

Total numbers:13. Abnormal numbers:0. Warning numbers:0.
-- Any item whose status is not Normal must be corrected

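2.3 Check Disk Space

# Run as root [all nodes]
-- Use df -h and df -i to confirm that enough disk space and inodes are available
-- df -h shows disk space usage
-- df -i shows the number of free inodes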
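The disk check can also be expressed as a pass/fail test rather than a visual inspection. The sketch below is an illustration under assumptions: the helper names are hypothetical, the threshold is whatever you consider safe for your data volume (the original article gives no figure), and it relies on `df -P`, which keeps each filesystem on one line so the columns can be parsed reliably.

```shell
#!/usr/bin/env bash
# free_kb PATH: print the free space, in KiB, of the filesystem holding PATH.
free_kb() {
    # -P forces one line per filesystem, -k reports 1024-byte blocks;
    # column 4 of the second line is "Available".
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# free_inodes PATH: print the number of free inodes on that filesystem.
free_inodes() {
    # With -i, column 4 is "IFree".
    df -Pi "$1" | awk 'NR==2 {print $4}'
}

# has_free_kb PATH MIN_KB: succeed only if at least MIN_KB KiB are free.
# MIN_KB is a caller-chosen threshold, not a value from the openGauss docs.
has_free_kb() {
    local avail
    avail=$(free_kb "$1")
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

For example, `has_free_kb /opt/gaussdb 10485760 || echo "less than 10 GiB free"` would flag a node before the upgrade starts; the 10 GiB figure here is only a placeholder.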

2.4 Check Version Information

-- omm user [any node]
-- Query the version on all nodes
[root@opengauss-dbxxx ~]# su - omm
Last login: Fri Jul 21 16:07:06 CST 2023 on pts/1
[omm@opengauss-dbxxx ~]$ gs_ssh -c "gsql -V"
Successfully execute command on all nodes.

Output:
[SUCCESS] opengauss-db1:
gsql (openGauss 3.1.1 build 70980198) compiled at 2023-01-06 09:34:59 commit 0 last mr
[SUCCESS] opengauss-db2:
gsql (openGauss 3.1.1 build 70980198) compiled at 2023-01-06 09:34:59 commit 0 last mr


2.5 Check Cluster Status

-- omm user [any node]
[omm@opengauss-dbxxx ~]$ gs_om -t status --detail
[  CMServer State   ]

node node_ip instance state
-------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 1 /opt/gaussdb/install/cm/cm_server Primary
2 opengauss-db2 10.110.3.156 2 /opt/gaussdb/install/cm/cm_server Standby

[ Cluster State ]

cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL

[ Datanode State ]

node node_ip instance state
------------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 6001 /opt/gaussdb/install/data/dn P Primary Normal
2 opengauss-db2 10.110.3.156 6002 /opt/gaussdb/install/data/dn S Standby Normal
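Since the upgrade should only start from a healthy cluster, the `cluster_state` line in the output above is the value that matters. Here is a small sketch that extracts it from text on standard input; the function names are hypothetical, and the parsing assumes the `key : value` layout shown in this article's output, which may differ in other versions.

```shell
#!/usr/bin/env bash
# cluster_state: read `gs_om -t status --detail` output on stdin and
# print the value of the cluster_state field (for example "Normal").
cluster_state() {
    awk -F: '/^[[:space:]]*cluster_state/ {
        gsub(/[[:space:]]/, "", $2)   # strip the padding around the value
        print $2
        exit
    }'
}

# is_cluster_normal: succeed only if the status text on stdin reports Normal.
is_cluster_normal() {
    [ "$(cluster_state)" = "Normal" ]
}
```

As the omm user it could be used as `gs_om -t status --detail | is_cluster_normal || echo "cluster is not Normal, do not upgrade"`.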


2.6 Back Up the Database

Take a physical backup of the database.


-- Run as omm [primary node]
[root@opengauss-db1 ~]# su - omm
Last login: Fri Jul 21 16:51:53 CST 2023 on pts/1

-- Create the backup directory
[omm@opengauss-db1 ~]$ BACKUP_DIR=/opt/gaussdb/backup/`date '+%Y%m%d_%H%M%S'`
[omm@opengauss-db1 ~]$ mkdir -p $BACKUP_DIR

-- Run the physical backup
[omm@opengauss-db1 backup]$ gs_basebackup -D $BACKUP_DIR -p 26000 -P -l $BACKUP_DIR
INFO: The starting position of the xlog copy of the full build is: 0/400E8B0. The slot minimum LSN is: 0/400E8B0. The disaster slot minimum LSN is: 0/0. The logical slot minimum LSN is: 0/0.
[2023-07-21 17:11:55]:begin build tablespace list
[2023-07-21 17:11:55]:finish build tablespace list
[2023-07-21 17:11:55]:begin get xlog by xlogstream check identify system successpace
[2023-07-21 17:11:55]:
[2023-07-21 17:11:55]: send START_REPLICATION 0/4000000 success
[2023-07-21 17:11:55]: keepalive message is received
[2023-07-21 17:11:55]: keepalive message is received
97981/97981 kB (100%), 1/1 tablespace
[2023-07-21 17:12:00]:gs_basebackup: base backup successfully

-- Inspect the backup
[omm@opengauss-db1 ~]$ ls -l /opt/gaussdb/backup/20230721_171855
total 5084
-rw------- 1 omm dbgrp 216 Jul 21 17:19 backup_label
-rw------- 1 omm dbgrp 198 Jul 21 17:19 backup_label.old
drwx------ 5 omm dbgrp 4096 Jul 21 17:19 base
-rw------- 1 omm dbgrp 0 Jul 21 17:19 build_completed.done
-rw------- 1 omm dbgrp 4399 Jul 21 17:19 cacert.pem
drwx------ 4 omm dbgrp 4096 Jul 21 17:19 dbe_perf_standby
-rw------- 1 omm dbgrp 56 Jul 21 17:19 full_backup_label
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 global
-rw------- 1 omm dbgrp 4915200 Jul 21 17:19 gswlm_userinfo.cfg
-rw------- 1 omm dbgrp 21016 Jul 21 17:19 mot.conf
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_clog
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_csnlog
-rw------- 1 omm dbgrp 0 Jul 21 17:19 pg_ctl.lock
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_errorinfo
-rw------- 1 omm dbgrp 4676 Jul 21 17:19 pg_hba.conf
-rw------- 1 omm dbgrp 4676 Jul 21 17:19 pg_hba.conf.bak
-rw------- 1 omm dbgrp 1024 Jul 21 17:19 pg_hba.conf.lock
-rw------- 1 omm dbgrp 1636 Jul 21 17:19 pg_ident.conf
drwx------ 4 omm dbgrp 4096 Jul 21 17:19 pg_llog
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_logical
drwx------ 4 omm dbgrp 4096 Jul 21 17:19 pg_multixact
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_notify
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_replslot
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_serial
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_snapshots
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_stat_tmp
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_tblspc
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_twophase
-rw------- 1 omm dbgrp 4 Jul 21 17:19 PG_VERSION
drwx------ 3 omm dbgrp 4096 Jul 21 17:19 pg_xlog
-rw------- 1 omm dbgrp 35919 Jul 21 17:19 postgresql.conf
-rw------- 1 omm dbgrp 35919 Jul 21 17:19 postgresql.conf.guc.bak
-rw------- 1 omm dbgrp 1024 Jul 21 17:19 postgresql.conf.lock
-rw------- 1 omm dbgrp 35919 Jul 21 17:19 postgresql.conf.wal.bak
-rw------- 1 omm dbgrp 0 Jul 21 17:19 postmaster.pid.lock
-rw------- 1 omm dbgrp 10 Jul 21 17:19 rewind_lable
-rw------- 1 omm dbgrp 4402 Jul 21 17:19 server.crt
-rw------- 1 omm dbgrp 1766 Jul 21 17:19 server.key
-rw------- 1 omm dbgrp 56 Jul 21 17:19 server.key.cipher
-rw------- 1 omm dbgrp 24 Jul 21 17:19 server.key.rand
-rw------- 1 omm dbgrp 4 Jul 21 17:19 term_file
drwx------ 5 omm dbgrp 4096 Jul 21 17:19 undo



2.7 Stop the Cluster

For a gray upgrade this step is optional. The cluster is stopped here only to make it easier to roll back if the upgrade fails.


-- Stop the cluster, run as omm [primary node]
[omm@opengauss-db1 ~]$ gs_om -t stop
Stopping cluster.
=========================================
Successfully stopped cluster.
=========================================
End stop cluster.
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[  CMServer State   ]

node node_ip instance state
-------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 1 /opt/gaussdb/install/cm/cm_server Down
2 opengauss-db2 10.110.3.156 2 /opt/gaussdb/install/cm/cm_server Down

cm_ctl: can't connect to cm_server.
Maybe cm_server is not running, or timeout expired. Please try again.


2.8 Back Up Directories and Files

-- Run as root [all nodes]
-- Before upgrading, back up the directories and files listed in cluster_config.xml in case the upgrade fails
-- The database directories of this test environment are listed below; adapt them to your production environment
<PARAM name="gaussdbAppPath" value="/opt/gaussdb/install/app" />
<PARAM name="gaussdbLogPath" value="/var/log/omm" />
<PARAM name="tmpMppdbPath" value="/opt/gaussdb/tmp" />
<PARAM name="gaussdbToolPath" value="/opt/gaussdb/install/om" />
<PARAM name="corePath" value="/opt/gaussdb/corefile" />
<PARAM name="dataNode1" value="/opt/gaussdb/install/data/dn,opengauss-db2,/opt/gaussdb/install/data/dn"/>

-- Back up the directory (the -z option gzip-compresses the archive even though its name ends in .tar)
[root@opengauss-dbxxx ~]# cd /opt
[root@opengauss-dbxxx opt]# tar -czf gaussdb_3.1.1.tar ./gaussdb/
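A rollback archive is only useful if it can be read back, so it is worth listing it once before the upgrade begins. This is a minimal sketch with hypothetical helper names; it assumes the archive was created with `tar -czf` as above, so it must be listed with `-z` as well.

```shell
#!/usr/bin/env bash
# archive_is_readable ARCHIVE: succeed if tar can list the whole
# gzip-compressed archive without errors.
archive_is_readable() {
    tar -tzf "$1" > /dev/null 2>&1
}

# archive_entry_count ARCHIVE: print how many files and directories
# the archive contains.
archive_entry_count() {
    tar -tzf "$1" | wc -l
}
```

For the archive in this article, `archive_is_readable /opt/gaussdb_3.1.1.tar && archive_entry_count /opt/gaussdb_3.1.1.tar` gives a quick sanity check on every node.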

2.9 Start the Cluster

-- Start the cluster, run as omm [primary node]
[omm@opengauss-db1 ~]$ gs_om -t start
Starting cluster.
======================================================================
Successfully started primary instance. Wait for standby instance.
======================================================================
.
Successfully started cluster.
======================================================================
cluster_state      : Normal
redistributing     : No
node_count         : 2
Datanode State
    primary           : 1
    standby           : 1
    secondary         : 0
    cascade_standby   : 0
    building          : 0
    abnormal          : 0
    down              : 0

Successfully started cluster.
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[ CMServer State ]

node node_ip instance state
-------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 1 /opt/gaussdb/install/cm/cm_server Primary
2 opengauss-db2 10.110.3.156 2 /opt/gaussdb/install/cm/cm_server Standby

[ Cluster State ]

cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL

[ Datanode State ]

node node_ip instance state
------------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 6001 /opt/gaussdb/install/data/dn P Primary Normal
2 opengauss-db2 10.110.3.156 6002 /opt/gaussdb/install/data/dn S Standby Normal


III. Performing the Upgrade

The cluster is upgraded here with a gray upgrade.

3.1 Pre-upgrade Check

# Run as root [primary node]
[root@opengauss-db1 ~]# python3 /opt/software/openGauss/script/gs_preinstall -U omm -G dbgrp -X /opt/software/openGauss/cluster_config.xml
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Are you sure you want to create trust for root (yes/no)?yes  -- enter yes
Please enter password for root
Password: 
Successfully created SSH trust for the root permission user.
Setting host ip env
Successfully set host ip env.
Distributing package.
Begin to distribute package to tool path.
Successfully distribute package to tool path.
Begin to distribute package to package path.
Successfully distribute package to package path.
Successfully distributed package.
Are you sure you want to create the user[omm] and create trust for it (yes/no)? no  -- enter no
Preparing SSH service.
Successfully prepared SSH service.
Installing the tools in the cluster.
Successfully installed the tools in the cluster.
Checking hostname mapping.
Successfully checked hostname mapping.
Checking OS software.
Successfully check os software.
Checking OS version.
Successfully checked OS version.
Creating cluster's path.
Successfully created cluster's path.
Set and check OS parameter.
Setting OS parameters.
Successfully set OS parameters.
Warning: Installation environment contains some warning messages.
Please get more details by "/opt/software/openGauss/script/gs_checkos -i A -h opengauss-db1,opengauss-db2 --detail".
Set and check OS parameter completed.
Preparing CRON service.
Successfully prepared CRON service.
Setting user environmental variables.
Successfully set user environmental variables.
Setting the dynamic link library.
Successfully set the dynamic link library.
Setting Core file
Successfully set core path.
Setting pssh path
Successfully set pssh path.
Setting Cgroup.
Successfully set Cgroup.
Set ARM Optimization.
No need to set ARM Optimization.
Fixing server package owner.
Setting finish flag.
Successfully set finish flag.
Preinstallation succeeded.

-- Run /opt/software/openGauss/script/gs_checkos -i A -h opengauss-db1,opengauss-db2 --detail to see the detailed pre-check results, and resolve any warnings it reports

3.2 Run the Upgrade

# Run as root [primary node]
[root@opengauss-db1 ~]# chmod -R 755 /opt/software/openGauss/script/
[root@opengauss-db1 ~]# chown -R omm:dbgrp /opt/software/openGauss/script/

-- Gray upgrade, run as omm
[omm@opengauss-db1 ~]$ /opt/software/openGauss/script/gs_upgradectl -t auto-upgrade --grey -X /opt/software/openGauss/cluster_config.xml
Static configuration matched with old static configuration files.
Wait for the cluster status normal or degrade.
Start check CMS parameter.
Old cluster version number less than 92574.
Successfully set upgrade_mode to 0.
Checking upgrade environment.
Successfully checked upgrade environment.
Start to do health check.
Successfully checked cluster status.
Upgrade all nodes.
NOTICE: The directory /opt/gaussdb/install/app_70980198 will be deleted after commit-upgrade, please make sure there is no personal data.
Performing grey rollback.
No need to rollback.
The directory /opt/gaussdb/install/app_70980198 will be deleted after commit-upgrade, please make sure there is no personal data.
Installing new binary.
Wait for the cluster status normal or degrade.
copy certs from /opt/gaussdb/install/app_70980198 to /opt/gaussdb/install/app_a07d57c3.
Successfully copy certs from /opt/gaussdb/install/app_70980198 to /opt/gaussdb/install/app_a07d57c3.
Successfully backup hotpatch config file.
Sync cluster configuration.
Successfully synced cluster configuration.
Switch symbolic link to new binary directory.
Successfully switch symbolic link to new binary directory.
Start check CMS parameter.
Old cluster version number less than 92574.
Switching all db processes.
Check cluster state.
Cluster state: [ Cluster State ]

cluster_state : Normal
redistributing : No
current_az : AZ_ALL

[ Datanode State ]

node node_ip port instance state
-----------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 26000 6001 P Primary Normal
2 opengauss-db2 10.110.3.156 26000 6002 S Standby Normal
Wait for the cluster status normal or degrade.
Wait for the cluster status normal or degrade.
Create checkpoint before switching.
Start to wait for om_monitor.
Switching DN processes.
Switch DN processes for rolling upgrade.
Ready to grey start cluster.
Grey start cluster successfully.
Wait for the cluster status normal or degrade.
Successfully switch all process version
The nodes ['opengauss-db1', 'opengauss-db2'] have been successfully upgraded to new version. Then do health check.
Start to do health check.
Successfully checked cluster status.
Waiting for the cluster status to become normal.
.
The cluster status is normal.
Upgrade main process has been finished, user can do some check now.
Once the check done, please execute following command to commit upgrade:

gs_upgradectl -t commit-upgrade -X /opt/software/openGauss/cluster_config.xml

Successfully upgrade all nodes.

-- Commit the upgrade
[omm@opengauss-db1 ~]$ gs_upgradectl -t commit-upgrade -X /opt/software/openGauss/cluster_config.xml 
Wait for the cluster status normal or degrade.
Start check CMS parameter.
Old cluster version number less than 92574.
Start to do health check.
Successfully checked cluster status.
Wait for the cluster status normal or degrade.
Wait for the cluster status normal or degrade.
Start check CMS parameter.
Old cluster version number less than 92574.
Successfully cleaned old install path.
Commit upgrade succeeded.
Start check CMS parameter.
Old cluster version number less than 92574.


3.3 Post-upgrade Verification

3.3.1 Check Version Information

# Run as omm [any node]
-- Check the version
-- The version is now 5.0.0
[omm@opengauss-db1 ~]$ gs_om -V
gs_om (openGauss OM 5.0.0 build 244a7e05) compiled at 2023-03-29 03:22:22 commit 0 last mr

-- Check the database version on both nodes; both have been upgraded to 5.0.0
[omm@opengauss-db1 ~]$ gs_ssh -c "gsql -V"
Successfully execute command on all nodes.

Output:
[SUCCESS] opengauss-db1:
gsql (openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr
[SUCCESS] opengauss-db2:
gsql (openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr

3.3.2 Check Cluster Status

# Run as omm [any node]
-- Cluster status
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[  CMServer State   ]

node node_ip instance state
-------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 1 /opt/gaussdb/install/cm/cm_server Primary
2 opengauss-db2 10.110.3.156 2 /opt/gaussdb/install/cm/cm_server Standby

[ Cluster State ]

cluster_state : Normal
redistributing : No
balanced : No
current_az : AZ_ALL

[ Datanode State ]

node node_ip instance state
------------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 6001 /opt/gaussdb/install/data/dn P Standby Normal
2 opengauss-db2 10.110.3.156 6002 /opt/gaussdb/install/data/dn S Primary Normal

-- As the output shows, a primary/standby switchover took place during the upgrade
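Because the primary role can move during the upgrade, it helps to ask the status output which host is currently primary before connecting for writes. The following is a sketch under assumptions: `primary_datanode` is a hypothetical helper, and it relies on the column layout shown in this article, where a datanode row ends in a role followed by a state (for example `Primary Normal`) while CMServer rows end in the role alone.

```shell
#!/usr/bin/env bash
# primary_datanode: read `gs_om -t status --detail` output on stdin and
# print the host name of the datanode whose role is Primary and whose
# state is Normal. CMServer rows are skipped because their last field
# is the role itself, not a state.
primary_datanode() {
    awk 'NF >= 2 && $(NF-1) == "Primary" && $NF == "Normal" {print $2}'
}
```

A session can then be pointed at the right host, for example `gsql -d postgres -p 26000 -h "$(gs_om -t status --detail | primary_datanode)"`, provided remote connections to that host are permitted in your environment.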


3.3.3 Check the Database

# Run as omm [any node]
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[  CMServer State   ]

node node_ip instance state
-------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 1 /opt/gaussdb/install/cm/cm_server Primary
2 opengauss-db2 10.110.3.156 2 /opt/gaussdb/install/cm/cm_server Standby

[ Cluster State ]

cluster_state : Normal
redistributing : No
balanced : No
current_az : AZ_ALL

[ Datanode State ]

node node_ip instance state
------------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 6001 /opt/gaussdb/install/data/dn P Standby Normal
2 opengauss-db2 10.110.3.156 6002 /opt/gaussdb/install/data/dn S Primary Normal
[omm@opengauss-db1 ~]$ gsql -d postgres -p 26000
gsql ((openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

openGauss=# CREATE DATABASE gaussdb WITH ENCODING 'UTF8' template = template0;
ERROR: cannot execute CREATE DATABASE in a read-only transaction
-- Because of the switchover, this session is connected to what is now the standby node, where a database cannot be created

IV. Appendix

4.1 The Owner and Group of version.cfg Must Be Changed

Before running the upgrade, change the owner and group of /opt/software/openGauss/version.cfg on both the primary and standby nodes. If they are left unchanged, the upgrade fails with an error.


-- If the owner and group of version.cfg are not changed on both nodes, the upgrade reports the following error
[omm@opengauss-db1 ~]$ /opt/software/openGauss/script/gs_upgradectl -t auto-upgrade --grey -X /opt/software/openGauss/cluster_config.xml
[Errno 13] Permission denied: '/opt/software/openGauss/version.cfg'
[Errno 13] Permission denied: '/opt/software/openGauss/version.cfg'
Start check CMS parameter.
float() argument must be a string or a number, not 'NoneType'
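The ownership requirement can be checked up front on each node instead of being discovered through the error above. This is a sketch only: the helper names are hypothetical, it uses GNU `stat -c` as found on CentOS, and the expected owner `omm:dbgrp` is an assumption based on the `chown` used for the script directory in section 3.2, since the article does not state the target owner explicitly.

```shell
#!/usr/bin/env bash
# owner_of FILE: print the file's owner and group as "user:group".
owner_of() {
    stat -c '%U:%G' "$1"
}

# has_owner FILE USER:GROUP: succeed only if FILE is owned by USER:GROUP.
has_owner() {
    [ "$(owner_of "$1")" = "$2" ]
}
```

For example, `has_owner /opt/software/openGauss/version.cfg omm:dbgrp || echo "fix ownership before upgrading"` could be run on both nodes (assuming omm:dbgrp is the intended owner in your environment).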

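4.2 Changing the NIC MTU May Break SSH Between the Primary and Standby Nodes

If the NIC MTU on the primary and standby nodes is changed during the pre-upgrade check, gs_upgradectl hangs and the upgrade eventually fails with an error. At that point the two nodes can no longer reach each other over SSH, even though they can still ping each other.


The fix is to set the MTU back to the default value of 1500 and restart the SSH service.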


-- 升级预检查提示主备节点MTU值需调整,从1500调整到8192,但修改网卡MTU后执行gs_upgradectl升级卡主,最后报错,从升级日志里可看到如下相关信息:[2023-07-21 22:45:39.414838][20984][gs_sshexkey][DEBUG]:Successfully to add id_rsa in ssh-agent[2023-07-21 22:45:39.415698][20984][gs_sshexkey][DEBUG]:Ssh agent register successfully.[2023-07-21 22:45:39.416461][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step5]:Successfully created the local key files.[2023-07-21 22:45:39.417283][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step6]:Appending local ID to authorized_keys.[2023-07-21 22:45:39.418192][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step6]:Successfully appended local ID to authorized_keys.[2023-07-21 22:45:39.429370][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step7]:Updating the known_hosts file.[2023-07-21 22:45:40.311033][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step7]:Successfully updated the known_hosts file.[2023-07-21 22:45:40.311665][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step8]:Appending authorized_key on the remote node.[2023-07-21 22:45:40.679766][20984][gs_sshexkey][DEBUG]:Send to 10.110.3.156Successfully appended authorized_key on remote node 10.110.3.156.[2023-07-21 22:45:40.864480][20984][gs_sshexkey][DEBUG]:Send to 10.110.3.155Successfully appended authorized_key on remote node 10.110.3.155.[2023-07-21 22:45:40.921407][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step8]:Successfully appended authorized_key on all remote node.[2023-07-21 22:45:40.921956][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step9]:Checking common authentication file content.[2023-07-21 22:45:40.927562][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step9]:Successfully checked common authentication content.[2023-07-21 22:45:40.928391][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step10]:Distributing SSH trust file to all node.[2023-07-21 22:47:41.046988][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 3, retry again.[2023-07-21 
22:47:41.047776][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection, [2023-07-21 22:47:41.089878][20984][gs_sshexkey][DEBUG]:check os info: drwx------   2 root root  4096 Jul 21 22:45 .ssh-rwxr-xr-x   1 root root   885 Dec 12  2022 ssh_key.sh-rw-r--r--   1 root root   521 Jul 21 11:36 sshtrust.sh total 32drwx------   2 root root 4096 Jul 21 22:45 .dr-xr-x---. 11 root root 4096 Jul 21 22:45 ..-rw-------   1 root root  504 Jul 21 22:45 authorized_keys-rw-------   1 root root  464 Jul 21 22:45 id_om-rw-------   1 root root  100 Jul 21 22:45 id_om.pub-rw-------   1 root root 1679 Jul 21 11:35 id_rsa-rw-------   1 root root  400 Jul 21 11:35 id_rsa.pub-rw-------   1 root root 1012 Jul 21 22:45 known_hosts[2023-07-21 22:49:51.205162][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 2, retry again.[2023-07-21 22:49:51.206276][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection, [2023-07-21 22:49:51.240173][20984][gs_sshexkey][DEBUG]:check os info: drwx------   2 root root  4096 Jul 21 22:45 .ssh-rwxr-xr-x   1 root root   885 Dec 12  2022 ssh_key.sh-rw-r--r--   1 root root   521 Jul 21 11:36 sshtrust.sh total 32drwx------   2 root root 4096 Jul 21 22:45 .dr-xr-x---. 
11 root root 4096 Jul 21 22:45 ..-rw-------   1 root root  504 Jul 21 22:45 authorized_keys-rw-------   1 root root  464 Jul 21 22:45 id_om-rw-------   1 root root  100 Jul 21 22:45 id_om.pub-rw-------   1 root root 1679 Jul 21 11:35 id_rsa-rw-------   1 root root  400 Jul 21 11:35 id_rsa.pub-rw-------   1 root root 1012 Jul 21 22:45 known_hosts[2023-07-21 22:52:01.367717][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 1, retry again.[2023-07-21 22:52:01.368465][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection, [2023-07-21 22:52:01.425251][20984][gs_sshexkey][DEBUG]:check os info: drwx------   2 root root  4096 Jul 21 22:45 .ssh-rwxr-xr-x   1 root root   885 Dec 12  2022 ssh_key.sh-rw-r--r--   1 root root   521 Jul 21 11:36 sshtrust.sh total 32drwx------   2 root root 4096 Jul 21 22:45 .dr-xr-x---. 11 root root 4096 Jul 21 22:45 ..-rw-------   1 root root  504 Jul 21 22:45 authorized_keys-rw-------   1 root root  464 Jul 21 22:45 id_om-rw-------   1 root root  100 Jul 21 22:45 id_om.pub-rw-------   1 root root 1679 Jul 21 11:35 id_rsa-rw-------   1 root root  400 Jul 21 11:35 id_rsa.pub-rw-------   1 root root 1012 Jul 21 22:45 known_hosts[2023-07-21 22:54:11.538969][20984][gs_sshexkey][ERROR]:[GAUSS-50223] : Failed to update the authentication files.cmd is source /root/.bashrc;scp -q -o "BatchMode yes" -o "NumberOfPasswordPrompts 0" /root/.ssh/id_om /root/.ssh/id_om.pub 10.110.3.156:.ssh/ && temp_auth=$(grep '#OM' /root/.ssh/authorized_keys) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/authorized_keys; echo *** >> /root/.ssh/authorized_keys" && temp_auth=$(grep '#OM' /root/.ssh/known_hosts) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/known_hosts; echo *** >> /root/.ssh/known_hosts"; Node:10.110.3.156. 
Error: 1, lost connection[2023-07-21 22:54:12.110072][20463][gs_preinstall][DEBUG]:The $GAUSSHOME/bin is exist.[2023-07-21 22:54:12.111040][20463][gs_preinstall][DEBUG]:The $GAUSS_ENV is 2.[2023-07-21 22:54:12.111678][20463][gs_preinstall][DEBUG]:There is the upgrade is in progress.[2023-07-21 22:54:12.112467][20463][gs_preinstall][DEBUG]:In upgrade process, no need to delete /opt/gaussdb/install/om.[2023-07-21 22:54:12.113237][20463][gs_preinstall][ERROR]:[GAUSS-51632] : Failed to do gs_sshexkey.Error: Please enter password for current user[root].Checking network information.All nodes in the network are Normal.Successfully checked network information.Creating SSH trust.Creating the local key file.Successfully created the local key files.Appending local ID to authorized_keys.Successfully appended local ID to authorized_keys.Updating the known_hosts file.Successfully updated the known_hosts file.Appending authorized_key on the remote node.Successfully appended authorized_key on all remote node.Checking common authentication file content.Successfully checked common authentication content.Distributing SSH trust file to all node.[GAUSS-50223] : Failed to update the authentication files.cmd is source /root/.bashrc;scp -q -o "BatchMode yes" -o "NumberOfPasswordPrompts 0" /root/.ssh/id_om /root/.ssh/id_om.pub 10.110.3.156:.ssh/ && temp_auth=$(grep '#OM' /root/.ssh/authorized_keys) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/authorized_keys; echo *** >> /root/.ssh/authorized_keys" && temp_auth=$(grep '#OM' /root/.ssh/known_hosts) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/known_hosts; echo *** >> /root/.ssh/known_hosts"; Node:10.110.3.156. 
Error: 1, lost connection

-- At this point, checking SSH on the primary and standby nodes also shows abnormal behavior
[root@opengauss-db2 ~]# systemctl status sshd.service
● sshd.service - OpenSSH server daemon
   Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2023-07-21 11:03:03 CST; 12h ago
     Docs: man:sshd(8)
           man:sshd_config(5)
 Main PID: 2160 (sshd)
    Tasks: 1
   Memory: 4.2M
   CGroup: /system.slice/sshd.service
           └─2160 /usr/sbin/sshd -D

Jul 21 17:44:20 opengauss-db2 sshd[6374]: Accepted publickey for root from 10.110.3.155 port 63717 ssh2: ED25519 SHA256:hUo4iBgUOVXW5ONlVeD2QMdS+4snKsRs0K1K3jBLO8E
Jul 21 17:44:22 opengauss-db2 sshd[6417]: Accepted publickey for root from 10.110.3.155 port 63721 ssh2: ED25519 SHA256:hUo4iBgUOVXW5ONlVeD2QMdS+4snKsRs0K1K3jBLO8E
Jul 21 17:44:24 opengauss-db2 sshd[6463]: Accepted publickey for root from 10.110.3.155 port 63723 ssh2: ED25519 SHA256:hUo4iBgUOVXW5ONlVeD2QMdS+4snKsRs0K1K3jBLO8E
Jul 21 22:45:32 opengauss-db2 sshd[4829]: Accepted password for root from 10.110.3.155 port 30166 ssh2
Jul 21 22:45:37 opengauss-db2 sshd[4883]: Accepted password for root from 10.110.3.155 port 30172 ssh2
Jul 21 22:45:39 opengauss-db2 sshd[4922]: Connection closed by 10.110.3.155 port 30178 [preauth]
Jul 21 22:45:39 opengauss-db2 sshd[4928]: Connection closed by 10.110.3.155 port 30182 [preauth]
Jul 21 22:45:40 opengauss-db2 sshd[4930]: Accepted password for root from 10.110.3.155 port 30188 ssh2
Jul 21 23:06:46 opengauss-db2 sshd[13949]: Connection closed by 10.110.3.156 port 50810 [preauth]
Jul 21 23:27:22 opengauss-db2 sshd[22723]: Connection closed by 10.110.3.155 port 31050 [preauth]

-- Readjust the MTU, then restart the SSH service on the primary and standby nodes
[root@opengauss-db2 ~]# systemctl restart sshd.service
[root@opengauss-db2 ~]# systemctl status sshd.service
● sshd.service - OpenSSH server daemon
   Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2023-07-21 23:33:29 CST; 1s ago
     Docs: man:sshd(8)
           man:sshd_config(5)
 Main PID: 25303 (sshd)
    Tasks: 1
   Memory: 1.3M
   CGroup: /system.slice/sshd.service
           └─25303 /usr/sbin/sshd -D

Jul 21 23:33:28 opengauss-db2 systemd[1]: Starting OpenSSH server daemon...
Jul 21 23:33:29 opengauss-db2 sshd[25303]: Server listening on 0.0.0.0 port 60002.
Jul 21 23:33:29 opengauss-db2 sshd[25303]: Server listening on :: port 60002.
Jul 21 23:33:29 opengauss-db2 systemd[1]: Started OpenSSH server daemon.
Jul 21 23:33:29 opengauss-db2 sshd[25303]: Server listening on 0.0.0.0 port 22.
Jul 21 23:33:29 opengauss-db2 sshd[25303]: Server listening on :: port 22.
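The transcript above only says that the MTU was readjusted. One way to check whether a given MTU actually passes between the two nodes is to ping with the don't-fragment flag set. The sketch below is an illustration under assumptions, not part of the original procedure: the helper names `icmp_payload_for_mtu` and `mtu_probe_cmd` are hypothetical, and the 1500-byte value in the usage note is just the common Ethernet default, not a value taken from this environment.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of openGauss) for probing the path MTU.

# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    local mtu="$1"
    echo $(( mtu - 28 ))
}

# Print the ping command that probes a host at a given MTU.
# -M do sets the don't-fragment flag (Linux iputils ping),
# -c 3 sends three probes, -s sets the payload size.
mtu_probe_cmd() {
    local host="$1" mtu="$2"
    echo "ping -M do -c 3 -s $(icmp_payload_for_mtu "$mtu") $host"
}
```

For example, `mtu_probe_cmd 10.110.3.156 1500` prints the probe command for the standby node. If running that command from opengauss-db1 fails while a smaller size succeeds, the path MTU is probably lower than the interface setting, which would be consistent with `scp` dropping with "lost connection".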

4.3 A broken python3 prevents the cluster status from being viewed

-- If the installed python3 is broken, gs_om cannot display the cluster status
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
-bash: /opt/gaussdb/install/om/script/gs_om: Permission denied
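When this error appears, it can help to separate two possible causes: the script itself lacking execute permission, and python3 failing to start. The following is a minimal sketch under assumptions: `check_script` and `check_python3` are hypothetical helper names, and the checks are generic shell tests rather than anything provided by openGauss.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of openGauss) for narrowing down the failure.

# Report whether a script exists and is executable by the current user.
check_script() {
    local path="$1"
    if [ ! -e "$path" ]; then
        echo "missing"
    elif [ ! -x "$path" ]; then
        echo "not-executable"
    else
        echo "ok"
    fi
}

# Report whether python3 can start and run a trivial statement.
check_python3() {
    if python3 -c 'import sys' >/dev/null 2>&1; then
        echo "ok"
    else
        echo "broken"
    fi
}
```

Run as the omm user, `check_script /opt/gaussdb/install/om/script/gs_om` together with `check_python3` shows which of the two layers is failing before any reinstall is attempted.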

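4.4 A primary/standby switchover occurs after the cluster upgrade

The upgrade causes the primary and standby nodes to swap roles. Connecting to the database on the original primary then fails for writes, because that node is now a standby and only accepts read-only transactions.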


-- Cluster status before the upgrade
[omm@opengauss-db1 dn]$ gs_om -t status --detail --all
[  CMServer State   ]

node             node_ip         instance                                  state
-------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    1    /opt/gaussdb/install/cm/cm_server Primary
2  opengauss-db2 10.110.3.156    2    /opt/gaussdb/install/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node             node_ip         instance                               state
------------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    6001 /opt/gaussdb/install/data/dn P Standby Normal
2  opengauss-db2 10.110.3.156    6002 /opt/gaussdb/install/data/dn S Primary Normal

-- Cluster status after the upgrade
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[  CMServer State   ]

node             node_ip         instance                                  state
-------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    1    /opt/gaussdb/install/cm/cm_server Primary
2  opengauss-db2 10.110.3.156    2    /opt/gaussdb/install/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node             node_ip         instance                               state
------------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    6001 /opt/gaussdb/install/data/dn P Standby Normal
2  opengauss-db2 10.110.3.156    6002 /opt/gaussdb/install/data/dn S Primary Normal

-- Connecting to the original primary: CREATE DATABASE fails
[omm@opengauss-db1 ~]$ gsql -d postgres -p 26000
gsql ((openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

openGauss=# CREATE DATABASE gaussdb WITH ENCODING 'UTF8' template = template0;
ERROR: cannot execute CREATE DATABASE in a read-only transaction

-- Connecting to the new primary: CREATE DATABASE succeeds
[omm@opengauss-db2 ~]$ gsql -d postgres -p 26000
gsql ((openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

openGauss=# CREATE DATABASE gaussdb WITH ENCODING 'UTF8' template = template0;
CREATE DATABASE
openGauss=# \l
                          List of databases
   Name    | Owner | Encoding  | Collate | Ctype | Access privileges
-----------+-------+-----------+---------+-------+-------------------
 gaussdb   | omm   | UTF8      | C       | C     |
 postgres  | omm   | SQL_ASCII | C       | C     |
 template0 | omm   | SQL_ASCII | C       | C     | =c/omm           +
           |       |           |         |       | omm=CTc/omm
 template1 | omm   | SQL_ASCII | C       | C     | =c/omm           +
           |       |           |         |       | omm=CTc/omm
(4 rows)

[root@opengauss-db1 ~]# python3 /opt/software/openGauss/script/gs_preinstall -U omm -G dbgrp -X /opt/software/openGauss/cluster_config.xml
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Are you sure you want to create trust for root (yes/no)?no
Setting host ip env
[GAUSS-51400] : Failed to execute the command: sed -i '/^export[ ]*HOST_IP=/d' /etc/profile. Result:{'opengauss-db1': 'Success', 'opengauss-db2': 'Failure'}.
Error:
[SUCCESS] opengauss-db1:
[FAILURE] opengauss-db2:

[2023-07-21 22:45:39.414838][20984][gs_sshexkey][DEBUG]:Successfully to add id_rsa in ssh-agent
[2023-07-21 22:45:39.415698][20984][gs_sshexkey][DEBUG]:Ssh agent register successfully.
[2023-07-21 22:45:39.416461][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step5]:Successfully created the local key files.
[2023-07-21 22:45:39.417283][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step6]:Appending local ID to authorized_keys.
[2023-07-21 22:45:39.418192][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step6]:Successfully appended local ID to authorized_keys.
[2023-07-21 22:45:39.429370][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step7]:Updating the known_hosts file.
[2023-07-21 22:45:40.311033][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step7]:Successfully updated the known_hosts file.
[2023-07-21 22:45:40.311665][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step8]:Appending authorized_key on the remote node.
[2023-07-21 22:45:40.679766][20984][gs_sshexkey][DEBUG]:Send to 10.110.3.156
Successfully appended authorized_key on remote node 10.110.3.156.
[2023-07-21 22:45:40.864480][20984][gs_sshexkey][DEBUG]:Send to 10.110.3.155
Successfully appended authorized_key on remote node 10.110.3.155.
[2023-07-21 22:45:40.921407][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step8]:Successfully appended authorized_key on all remote node.
[2023-07-21 22:45:40.921956][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step9]:Checking common authentication file content.
[2023-07-21 22:45:40.927562][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step9]:Successfully checked common authentication content.
[2023-07-21 22:45:40.928391][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step10]:Distributing SSH trust file to all node.
[2023-07-21 22:47:41.046988][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 3, retry again.
[2023-07-21 22:47:41.047776][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection,
[2023-07-21 22:47:41.089878][20984][gs_sshexkey][DEBUG]:check os info: drwx------ 2 root root 4096 Jul 21 22:45 .ssh
-rwxr-xr-x 1 root root 885 Dec 12 2022 ssh_key.sh
-rw-r--r-- 1 root root 521 Jul 21 11:36 sshtrust.sh
 total 32
drwx------ 2 root root 4096 Jul 21 22:45 .
dr-xr-x---. 11 root root 4096 Jul 21 22:45 ..
-rw------- 1 root root 504 Jul 21 22:45 authorized_keys
-rw------- 1 root root 464 Jul 21 22:45 id_om
-rw------- 1 root root 100 Jul 21 22:45 id_om.pub
-rw------- 1 root root 1679 Jul 21 11:35 id_rsa
-rw------- 1 root root 400 Jul 21 11:35 id_rsa.pub
-rw------- 1 root root 1012 Jul 21 22:45 known_hosts
[2023-07-21 22:49:51.205162][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 2, retry again.
[2023-07-21 22:49:51.206276][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection,
[2023-07-21 22:49:51.240173][20984][gs_sshexkey][DEBUG]:check os info: drwx------ 2 root root 4096 Jul 21 22:45 .ssh
-rwxr-xr-x 1 root root 885 Dec 12 2022 ssh_key.sh
-rw-r--r-- 1 root root 521 Jul 21 11:36 sshtrust.sh
 total 32
drwx------ 2 root root 4096 Jul 21 22:45 .
dr-xr-x---. 11 root root 4096 Jul 21 22:45 ..
-rw------- 1 root root 504 Jul 21 22:45 authorized_keys
-rw------- 1 root root 464 Jul 21 22:45 id_om
-rw------- 1 root root 100 Jul 21 22:45 id_om.pub
-rw------- 1 root root 1679 Jul 21 11:35 id_rsa
-rw------- 1 root root 400 Jul 21 11:35 id_rsa.pub
-rw------- 1 root root 1012 Jul 21 22:45 known_hosts
[2023-07-21 22:52:01.367717][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 1, retry again.
[2023-07-21 22:52:01.368465][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection,
[2023-07-21 22:52:01.425251][20984][gs_sshexkey][DEBUG]:check os info: drwx------ 2 root root 4096 Jul 21 22:45 .ssh
-rwxr-xr-x 1 root root 885 Dec 12 2022 ssh_key.sh
-rw-r--r-- 1 root root 521 Jul 21 11:36 sshtrust.sh
 total 32
drwx------ 2 root root 4096 Jul 21 22:45 .
dr-xr-x---. 11 root root 4096 Jul 21 22:45 ..
-rw------- 1 root root 504 Jul 21 22:45 authorized_keys
-rw------- 1 root root 464 Jul 21 22:45 id_om
-rw------- 1 root root 100 Jul 21 22:45 id_om.pub
-rw------- 1 root root 1679 Jul 21 11:35 id_rsa
-rw------- 1 root root 400 Jul 21 11:35 id_rsa.pub
-rw------- 1 root root 1012 Jul 21 22:45 known_hosts
[2023-07-21 22:54:11.538969][20984][gs_sshexkey][ERROR]:[GAUSS-50223] : Failed to update the authentication files.
cmd is source /root/.bashrc;scp -q -o "BatchMode yes" -o "NumberOfPasswordPrompts 0" /root/.ssh/id_om /root/.ssh/id_om.pub 10.110.3.156:.ssh/ && temp_auth=$(grep '#OM' /root/.ssh/authorized_keys) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/authorized_keys; echo *** >> /root/.ssh/authorized_keys" && temp_auth=$(grep '#OM' /root/.ssh/known_hosts) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/known_hosts; echo *** >> /root/.ssh/known_hosts"; Node:10.110.3.156. Error: 1, lost connection
[2023-07-21 22:54:12.110072][20463][gs_preinstall][DEBUG]:The $GAUSSHOME/bin is exist.
[2023-07-21 22:54:12.111040][20463][gs_preinstall][DEBUG]:The $GAUSS_ENV is 2.
[2023-07-21 22:54:12.111678][20463][gs_preinstall][DEBUG]:There is the upgrade is in progress.
[2023-07-21 22:54:12.112467][20463][gs_preinstall][DEBUG]:In upgrade process, no need to delete /opt/gaussdb/install/om.
[2023-07-21 22:54:12.113237][20463][gs_preinstall][ERROR]:[GAUSS-51632] : Failed to do gs_sshexkey.Error: Please enter password for current user[root].
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
[GAUSS-50223] : Failed to update the authentication files.
cmd is source /root/.bashrc;scp -q -o "BatchMode yes" -o "NumberOfPasswordPrompts 0" /root/.ssh/id_om /root/.ssh/id_om.pub 10.110.3.156:.ssh/ && temp_auth=$(grep '#OM' /root/.ssh/authorized_keys) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/authorized_keys; echo *** >> /root/.ssh/authorized_keys" && temp_auth=$(grep '#OM' /root/.ssh/known_hosts) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/known_hosts; echo *** >> /root/.ssh/known_hosts"; Node:10.110.3.156. Error: 1, lost connection

[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[  CMServer State   ]
node             node_ip         instance                                  state
-------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    1    /opt/gaussdb/install/cm/cm_server Down
2  opengauss-db2 10.110.3.156    2    /opt/gaussdb/install/cm/cm_server Down

cm_ctl: can't connect to cm_server.
Maybe cm_server is not running, or timeout expired. Please try again.
[omm@opengauss-db1 ~]$ cm_ctl switchover -a
cm_ctl: send switchover msg to cm_server, connect fail node_id:0, data_path:.
[omm@opengauss-db1 ~]$ cm_ctl query -Cv
[  CMServer State   ]

node             instance state
---------------------------------
1  opengauss-db1 1        Primary
2  opengauss-db2 2        Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node             instance state            | node             instance state
------------------------------------------------------------------------------------------
1  opengauss-db1 6001     P Primary Normal | 2  opengauss-db2 6002     S Standby Normal
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[  CMServer State   ]

node             node_ip         instance                                  state
-------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    1    /opt/gaussdb/install/cm/cm_server Primary
2  opengauss-db2 10.110.3.156    2    /opt/gaussdb/install/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node             node_ip         instance                               state
------------------------------------------------------------------------------------
1  opengauss-db1 10.110.3.155    6001 /opt/gaussdb/install/data/dn P Primary Normal
2  opengauss-db2 10.110.3.156    6002 /opt/gaussdb/install/data/dn S Standby Normal
```
![image20230721172148201.png](https://oss-emcsprod-public.modb.pro/image/editor/20230722-c4e42479-422f-4161-979d-2fba202f7337.png)
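Because the upgrade can leave the primary role on a different node, it may be worth looking up the current primary before running any write statement. The sketch below reads the `gs_om -t status --detail` output shown in this section and prints the host whose datanode role is `Primary`. The function name `current_primary` is hypothetical, and the parsing assumes the column layout seen in the transcripts above, where the role is the second-to-last field of each datanode row.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openGauss): read gs_om status output on
# stdin and print the host name of the datanode currently acting as Primary.
current_primary() {
    awk '
        # Only rows after the "Datanode State" banner describe datanodes;
        # the CMServer rows before it are skipped.
        /Datanode State/ { in_dn = 1; next }
        # Datanode rows end with "<initial role> <current role> <state>",
        # so the current role is the second-to-last field.
        in_dn && NF >= 3 && $(NF-1) == "Primary" { print $2 }
    '
}
```

A typical use would be `gs_om -t status --detail | current_primary`, run as the omm user, followed by connecting with gsql on the node it prints. In the post-upgrade status above this would report opengauss-db2, which matches the node where CREATE DATABASE succeeded.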

