首页 > 其他 > 详细

TBase安装部署与演示

时间:2020-12-07 18:19:24      阅读:78      评论:0      收藏:0      [点我收藏+]

前言

通过本文的学习,您将可以
1. 了解TBase对部署环境的要求
2. 掌握部署TBase前需要对操作系统做的准备性工作
3. 掌握TBase的安装部署操作
4. 掌握TBase实例初始化
5. 了解TBase安装中的其他注意事项

第一章 部署环境介绍

1.1 机器配置
技术分享图片

1.2 管理角色设备分配方案
技术分享图片

1.3.1 操作系统部署要求
技术分享图片

技术分享图片

1.4 TBase安装包与许可证的获取
安装TBase,你需要以下文件:
技术分享图片技术分享图片

——以上请联系腾讯商务代表,或在 https://github.com/Tencent/TBase 获取。

技术分享图片

第二章 服务器环境准备

2.1 测试数据盘的随机同步写入性能

time dd if=/dev/zero of=test bs=1M count=1024 oflag=dsync
1024+0 records in 
1024+0 records out
1073741824 bytes (1.1 GB) copied, 17.4811 s, 61.4 MB/s
real	0m17.482s
user	0m0.002s
sys	0m0.748s

2.2 较准机器的时间

  • 创建定时任务前,确保ntp / chrony 服务关闭

技术分享图片

  • 创建计划任务进行时间同步
crontab –e       (写入以下内容)
 --NTP服务器可自行选择
*/20 * * * * /usr/sbin/ntpdate ntpupdate.tencentyun.com >/dev/null & 

2.3 防火墙与selinux配置

  • 关闭SELinux
    setenforce 0

  • 设置SELinux开机不自启动

vi /etc/sysconfig/selinux
将其中的SELINUX= XXXXXX修改为SELINUX=disabled
  • 关闭防火墙&禁止防火墙开机自启
    技术分享图片

2.4 修改limits.conf参数(生产环境)

[root@tbase01]#cat /etc/security/limits.conf
--准许单个进程打开文件数
* soft    nofile  131072  
* hard    nofile  131072  
--准放运行的进程数
* soft    nproc   131072  
* hard    nproc   131072  
--限制内核文件的大小
* soft    core    unlimited  
* hard    core    unlimited  

2.5 修改内核参数sysctl.conf(生产环境)

[root@tbase01]#  vi /etc/sysctl.conf
--所有在消息队列中的消息总和的最大值(msgmnb=64k)
kernel.msgmnb = 65536
--指定内核中消息队列中消息的最大值(msgmax=64k)
kernel.msgmax = 65536
--共享内存总量,以页为单位(4K/页),默认这个值足够大了
kernel.shmall = 4294967296 
--单个共享内存段的最大值 ,这个值需要大到让应用的内存段不用分成多个创建,否则性能会下降
kernel.shmmax=137438953472  
--可以创建的内存段数 ,默认这个值足够大了
kernel.shmmni = 4096 
--4个数据分别对应:SEMMSL、SEMMNS、SEMOPM、SEMMNI这四个核心参数
--SEMMSL :用于控制每个信号集的最大信号数量,建议最小值为postgresql最大连接数+10
--SEMOPM: 每个 semop 系统调用可以执行的信号操作的数量。建议设置 SEMOPM 等于SEMMSL 。
--SEMMNI :内核参数用于控制整个 Linux 系统中信号集的最大数量,建议128
--SEMMNS:用于控制整个 Linux 系统中信号(而不是信号集)的最大数,为SEMMSL * SEMMNI

kernel.sem = 50100 64128000 50100 1280

--准许系统所有进程一共可以打开的文件数量
fs.file-max = 6553600
--系统允许的最大异步IO队列长度
fs.aio-max-nr = 1048576
--增大locale port数量,默认是从32768开始
net.ipv4.ip_local_port_range = 1024 65535

--keepalive认定探测失效时间,默认是2小时,7200秒
net.ipv4.tcp_keepalive_time = 60   
--在认定连接失效之前,发送多少个TCP的keepalive探测包。缺省值是9
net.ipv4.tcp_keepalive_probes = 6  
--每隔多少秒重新发送keepalive探测包,默认是75秒 
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.ip_forward=1
vm.dirty_background_bytes = 102400000

2.6 主机名与IP配置

  • 修改主机名
    hostnamectl set-hostname XXXXXX

  • IP地址配置
    IP地址无特别要求,可以存在多个网段,只能保证网络是连通的就行。

  • 带宽要求
    同IDC,同城最好是万兆,并且要求是低于3ms的延迟。
    异地视要同步数据量要求,在做备机时需要比较高的带宽,延迟按设计是30ms左右。

2.7 /etc/ssh/sshd_config配置

  • 修改/etc/ssh/sshd_config配置
vi /etc/ssh/sshd_config
UseDNS no
  • 修改完成后重启sshd服务
CentOS 6 版本service sshd restart
CentOS 7 版本systemctl restart sshd

第三章 应用程序部署
3.1 上传安装包并解压

  • 上传到/root目录下
[root@TBASE01~]# pwd
/root
[root@TBASE01~]# unzip package_name.zip
  • 如果安装包是tar.gz包,则需要这样解压
    tar -zxf package_name.tar.gz

3.2 配置安装选项

[root@TBASE01~ ]# vi  tbase_mgr/conf/role.info 
#eth ip idc_name root password role( OssCenterMaster OssCenterSlave OssAgent Confdb Alarm TStudio Zookeeper)
#Notice: The machine that will be installed OssCenterMaster or OssCenterSlave can‘t be installed OssAgent

ens33 192.168.43.129 idc_1 root oracle OssCenterMaster|Confdb|TStudio|Etcd
ens33 192.168.43.130 idc_1 root oracle OssCenterSlave|Confdb|Alarm

--Etcd组件是confdb倒换和center倒换的容灾组件,Etcd部署个数必须是奇数个
--Etcd建议跟center或者confdb部署在一起,OssCenter支持一主多备。如果机器数量有足够多,除部署以上所示组件的两台机器,其他机器可以只部署OssAgent即可
--(建议最少部署3个Etcd组件,可靠性更高)。

3.3 运行安装程序

[root@TBASE01tbase_mgr]# ./tbase_mgr.sh install

Hey, Welcome and now we will install TBase OSS by the flowing steps: 
        0.  Check role configuration read from conf/role.info ... 
        1.  Install some base rpm packages such as dos2unix/createrepo/expect and so on ... 
        2.  Check root password and do some initalization on all machines read from conf/role.info ... 
        3.  Check package requires ... 
        4.  Create OS user tbase specifided in conf/oss/oss_init.conf  ... 
        5.  Create yum repository which store all the rpm packages needed by TBase OSS ... 
        6.  Install and Start ConfdbMaster ... 
        7.  Install and Start all ConfdbSlaves ... 
        8.  Install and Start Alarm Server ... 
        9.  Install and Start Zookeeper If needed ... 
        10. Install and Start OssCenterMaster and all OssCenterSlaves ... 
        11. Install and Start all Agents ... 
        12. Install default tbase_pgxz package... 
        13. Install and Start Tstudio ... 

Ready to continue (Yy|Nn) ? 

3.4 安装完成

       0.  Now start to check role configuration ...


       1.  Now start to install dos2unix/createrepo/expect and so on ...


       2.  Now start to check root password and do some initalization on all machines ...


       3.  Now start to check package requires ...


       4.  Now start to create OS user tbase ...


       5.  Now start to create yum repository ...


       6.  Now start to install Etcd ...


       7.  Now start to install all Confdbs ...


       8.  Now start to install Alarm Server ...


       9. Now start to install all OssCenters ...


       10. Now start to install all OssAgents ...


       11. Now start to install default tbase_pgxz...
--[ 2020-12-01 22:39:21 ] install studio


       12. Now start to install TStudio ...


       13. Now start to write etcd keys ...


Successed to install TBase OSS, visit http://192.168.43.129:8080 to continue ...

Successed to install TStudio, visit http://192.168.43.129:5050

default account: postgres@postgres.com password: postgres

3.4 安装失败解决办法

  • 在安装过程中如果遇到失败,先查看tbase_mgr/logs文件夹下的日志文件
[root@TBASE01logs]# cd /root/tbase_mgr/logs
[root@TBASE01logs]# ll
total 780
-rw------- 1 root root    784 Dec  1 22:30 authorized_keys.root
-rw-r--r-- 1 root root    392 Dec  1 22:30 authorized_keys.root.192.168.43.129
-rw-r--r-- 1 root root    392 Dec  1 22:30 authorized_keys.root.192.168.43.130
-rw-r--r-- 1 root root 524730 Dec  1 22:39 tbase_mgr.sh.info.log.2020-12-01
-rw-r--r-- 1 root root  38726 Dec  4 17:07 tbase_mgr.sh.info.log.2020-12-04
-rw-r--r-- 1 root root  70527 Dec  5 22:10 tbase_mgr.sh.info.log.2020-12-05
-rw-r--r-- 1 root root   4622 Dec  6 00:07 tbase_mgr.sh.info.log.2020-12-06
-rw-r--r-- 1 root root 133901 Dec  7 15:11 tbase_mgr.sh.info.log.2020-12-07
  • 在提示查看的日志文件的末尾,有报错中的具体执行语句,手动执行即可显示详细的错误信息

  • 解决后再次执行安装程序即可
    [root@TBASE01TBase_mgr]# sh tbase_mgr.sh install

  • docker版本与内核冲突
    如果docker版本与内核冲突,我们可以忽略安装docker,这样的Tstudio组件也要放弃安装
    以下是修改方法:
    编辑tbase_mgr.sh文件,修改以下内容:

1)把tstudio_install  > /dev/null 2>&1注释掉
2)把yum install -y dos2unix expect lrzsz net-tools lsof createrepo httpd httpd-tools php docker中“docker”删除

如果一定要安装tstduio,找到当前内核版本下的docker,安装好后,再独立部署tstudio组件。

3.5 yum安装源恢复方法

[root@TBase_mgr]# rm -rf /etc/yum.repos.d/PGXZ.repo
[root@TBASE01TBase_mgr]# mv /etc/yum.repos.d/backup/* /etc/yum.repos.d/
[root@TBASE01TBase_mgr]# yum makecache
[root@TBASE01TBase_mgr]# yum clean all

所有机器都要恢复,如不再需要安装其它软件,可以不需要恢复

第四章 安装失败问题汇总

4.1 IDC参数写为hostname错误

root@tbase01 tbase_mgr]# cat logs/tbase_mgr.sh.info.log.2020-09-25     
[ 2020-09-25 11:08:33 ] [ read_machine_info ] [ERROR]: Agent must be specified, exit
[ 2020-09-25 11:08:33 ] [ check_machine_idc ] [ERROR]: The machine_ip in (tbase01), but the tbase01 is not exist.
[ 2020-09-25 11:08:33 ] [ check_machine_idc ] [ERROR]: Fail to check machine idc in idc_1:local, exit
[ 2020-09-25 11:08:33 ] [ check_progress ] [ERROR]: from [read_machine_info] Abort progress due to an error occurs during step above, check logs/tbase_mgr.sh.info.log.2020-09-25 for more details ...

解决方法

--这里idc_1在OSS配置文件中写死,不建议修改
查看tbase_mgr/conf/role.info配置文件里的idc配置,不是hostname
[root@cdh01 tbase_mgr]# cat conf/role.info
eth0 192.168.43.129 idc_1 root xxxxxxxx OssCenterMaster|Confdb|TStudio|Etcd
eth0 192.168.43.130 idc_1 root xxxxxxxx OssCenterSlave|Confdb|Alarm

2、安装日志报yum源问题

  • httpd权限配置问题
Other repos take up 737 M of disk space (use --verbose for details)
[ 2020-09-25 11:26:09 ] [ conf_yum_repo ] [ERROR]: Failed to exec: ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@192.168.43.199 "yum -y install dos2unix expect readline createrepo net-tools lsof uuid" < /dev/null, retcode: 1, retmsg: Loaded plugins: fastestmirror, langpacks
Determining fastest mirrors, exit
[ 2020-09-25 11:26:09 ] [ check_progress ] [ERROR]: from [conf_yum_repo] Abort progress due to an error occurs during step above, check logs/tbase_mgr.sh.info.log.2020-09-25 for more details ...

解决方法

1、通过执行日志中的命令,发现yum源因权限问题无法访问
http://192.168.43.199/repodata/repomd.xml: [Errno 14] HTTP Error 403 - Forbidde

2、修改/etc/httpd/conf/httpd.conf
将denied改为granted
<Directory />
    AllowOverride none
    Require all granted
</Directory>
  • 安装包冲突问题
Error: Package: policycoreutils-python-2.5-11.el7_3.x86_64 (PGXZ)
           Requires: policycoreutils = 2.5-11.el7_3
           Installed: policycoreutils-2.5-29.el7.x86_64 (@anaconda)
               policycoreutils = 2.5-29.el7
           Available: policycoreutils-2.5-11.el7_3.x86_64 (PGXZ)
               policycoreutils = 2.5-11.el7_3
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest

解决方法

安装包冲突,需要删除原来的包
yum -y remove policycoreutils-2.5-29.el7.x86_64

3、 Etcd数量为偶数的问题

[ 2020-12-01 15:41:59 ]  [parse_machine_role] [Info]: 192.168.43.130, OssCenterSlave|Confdb|Alarm|Etcd r=[Alarm]
[ 2020-12-01 15:41:59 ]  [parse_machine_role] [Info]: 192.168.43.130, OssCenterSlave|Confdb|Alarm|Etcd r=[Etcd]
[ 2020-12-01 15:41:59 ] [ parse_machine_role ] [Info]: XZEtcds 192.168.43.130 r=[Etcd]
[ 2020-12-01 15:41:59 ] [ read_machine_info ] [ERROR]: Agent must be specified, exit
[ 2020-12-01 15:41:59 ] [ read_machine_info ] [ERROR]: The number of Etcd must be a odd number. etcd_cnt=2, etcd_server=192.168.43.129
[ 2020-12-01 15:41:59 ] [ check_progress ] [ERROR]: from [read_machine_info] Abort progress due to an error occurs during step above, check logs/tbase_mgr.sh.info.log.2020-12-01 for more details ...

解决方案

查看role.info,发现Etcd配置了两个,必须为奇数
[root@cdh01 tbase_mgr]# cat conf/role.info
ens33 192.168.43.129 cdh root oracle OssCenterMaster|Confdb|TStudio|Etcd
ens33 192.168.43.130 cdh02 root oracle OssCenterSlave|Confdb|Alarm|Etcd  --删掉Etcd组件

4、root密码错误

[ 2020-12-01 15:51:32 ] [ selinux_policy_off ] [INFO]: spawn_pswd=oracle1
[ 2020-12-01 15:51:32 ] [ selinux_policy_off ] [INFO]: after spawn_pswd=oracle1
[ 2020-12-01 15:51:34 ] [ selinux_policy_off ] [ERROR]: Failed to exec: ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@192.168.43.129 "setenforce 0;iptables -I INPUT -j ACCEPT;service iptables save", retcode: 3, exit

解决方案

查看role.info,发现root密码错误 
[root@cdh01 tbase_mgr]# cat conf/role.info
ens33 192.168.43.129 idc_1 root oracle1 OssCenterMaster|Confdb|TStudio|Etcd
ens33 192.168.43.130 idc_1 root oracle2 OssCenterSlave|Confdb|Alarm

5、网卡参数错误
[ 2020-12-01 16:00:46 ] [ check_net_eth ] [ERROR]: 192.168.43.129 not match 1ens33

解决方案

查看role.info,发现网络端口不存在
[root@cdh01 tbase_mgr]# cat conf/role.info
1ens33 192.168.43.129 idc_1 root oracle1 OssCenterMaster|Confdb|TStudio|Etcd
ens33 192.168.43.130 idc_1 root oracle2 OssCenterSlave|Confdb|Alarm

6、IP错误

测试发现,这步不影响YUM安装软件
[ 2020-12-01 16:04:33 ] [ selinux_policy_off ] [ERROR]: Failed to exec: ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@192.168.43.131 "sed -i ‘s/SELINUX=enforcing/SELINUX=disabled/‘ /etc/selinux/config", retcode: 5, exit  

解决方案

查看role.info,发现IP
[root@cdh01 tbase_mgr]# cat conf/role.info
ens33 192.168.43.129 idc_1 root oracle OssCenterMaster|Confdb|TStudio|Etcd
ens33 192.168.43.131 idc_1 root oracle OssCenterSlave|Confdb|Alarm

7、不进入 tbase_mgr目录执行脚本

sh tbase_mgr/tbase_mgr.sh install日志报错:This system is not registered with an entitlement server. You can use subscription-manager to register.Determining fastest mirrors, exit

解决方法

cd  tbase_mgr
sh tbase_mgr.sh install

8、docker安装失败


[ 2020-12-01 20:28:02 ] [ tstudio_install ] [ERROR]: Failed to exec: ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@192.168.43.129 "service docker start && sleep 10 && docker info" < /dev/null, retcode: 1, retmsg: , exit
[ 2020-12-01 20:28:02 ] [ check_progress ] [ERROR]: from [tstudio_install] Abort progress due to an error occurs during step above, check logs/tbase_mgr.sh.info.log.2020-12-01 for more details ...

解决

测试上述命令,发现docker进程启动失败

ils ...
[root@cdh tbase_mgr]# ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@192.168.43.129 "service docker start && sleep 10 && docker info" < /dev/null
Redirecting to /bin/systemctl start docker.service
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
[root@cdh tbase_mgr]# systemctl status docker.service
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2020-12-01 20:30:19 CST; 11s ago
     Docs: http://docs.docker.com
  Process: 5490 ExecStart=/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIONS $ADD_REGISTRY $BLOCK_REGISTRY $INSECURE_REGISTRY (code=exited, status=1/FAILURE)
Main PID: 5490 (code=exited, status=1/FAILURE)


Dec 01 20:30:17 cdh systemd[1]: Starting Docker Application Container Engine...
Dec 01 20:30:17 cdh dockerd-current[5490]: time="2020-12-01T20:30:17.953932215+08:00" level=info msg="libcontainerd: new containerd process, pid: 5497"
Dec 01 20:30:18 cdh dockerd-current[5490]: time="2020-12-01T20:30:18.965695547+08:00" level=warning msg="devmapper: Usage of loopback devices ...section."
Dec 01 20:30:19 cdh dockerd-current[5490]: time="2020-12-01T20:30:19.006156809+08:00" level=warning msg="devmapper: Base device already exists...ignored."
Dec 01 20:30:19 cdh dockerd-current[5490]: time="2020-12-01T20:30:19.021358030+08:00" level=fatal msg="Error starting daemon: error initializi...DRIVER>)"
Dec 01 20:30:19 cdh systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Dec 01 20:30:19 cdh systemd[1]: Failed to start Docker Application Container Engine.
Dec 01 20:30:19 cdh systemd[1]: Unit docker.service entered failed state.
Dec 01 20:30:19 cdh systemd[1]: docker.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

journalctl -xe

--详细日志
-- Unit docker-storage-setup.service has begun starting up.
Dec 01 20:38:48 cdh docker-storage-setup[14921]: ERROR: Docker has been previously configured for use with devicemapper graph driver. Not creating a new t
Dec 01 20:38:48 cdh systemd[1]: docker-storage-setup.service: main process exited, code=exited, status=1/FAILURE
Dec 01 20:38:48 cdh docker-storage-setup[14921]: INFO: Docker state can be reset by stopping docker and by removing /var/lib/docker directory. This will d
Dec 01 20:38:48 cdh systemd[1]: Failed to start Docker Storage Setup.
-- Subject: Unit docker-storage-setup.service has failed

Dec  1 20:28:01 cdh docker-storage-setup: ERROR: Docker has been previously configured for use with devicemapper graph driver. Not creating a new thin pool as existing docker metadata will fail to work with it. Manual cleanup is required before this will succeed.
Dec  1 20:28:01 cdh docker-storage-setup: INFO: Docker state can be reset by stopping docker and by removing /var/lib/docker directory. This will destroy existing docker images and containers and all the docker metadata.
Dec  1 20:28:01 cdh systemd: docker-storage-setup.service: main process exited, code=exited, status=1/FAILURE
Dec  1 20:28:01 cdh systemd: Failed to start Docker Storage Setup.
Dec  1 20:28:01 cdh systemd: Unit docker-storage-setup.service entered failed state.

根据上述描述,发现是/var/lib/docker/已经存在,删除就可以。这个情况是我之前安装使用过DOCKER导致的

[root@cdh tbase_mgr]# rm -rf /var/lib/docker/
[root@cdh tbase_mgr]#

[root@cdh tbase_mgr]# ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@192.168.43.129 "service docker start && sleep 10 && docker info" < /dev/null
Redirecting to /bin/systemctl start docker.service
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 1.12.6
Storage Driver: devicemapper
Pool Name: docker-253:0-69112218-pool
Pool Blocksize: 65.54 kB

TBase安装部署与演示

原文:https://www.cnblogs.com/dotalf/p/14098083.html

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!