DC: Designated Coordinator, introduced starting with V2
1. Structure of the cluster
HA Cluster:
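Messaging and Infrastructure Layer | Heartbeat Layer: cluster messaging layer
Membership Layer: cluster membership layer
CCM (Consensus Cluster Membership): the voting system
Resource Allocation Layer: resource allocation layer
CRM (Cluster Resource Manager):
DC: LRM (Local Resource Manager), PE (Policy Engine), TE (Transition Engine), CIB (Cluster Information Base)
Other nodes: LRM, CIB
Resource Layer: resource agents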
RA
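Shared storage:
NAS: Network Attached Storage, shared at the file-system level
SAN: Storage Area Network, shared at the block level
Cluster file systems (support only a small number of nodes, at most 16):
GFS2, OCFS2, cLVM2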
corosync:
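AIS: Application Interface Specification
SA Forum (Service Availability Forum): its open-source AIS implementation is OpenAIS
OpenAIS: provides a cluster model, including the cluster framework, cluster membership management, communication, and cluster monitoring, but no cluster resource management;
its components include AMF, CLM, CKPT, EVT, etc.; the set of components differs slightly between branches;
branches: picacho, whitetank, wilson.
corosync (cluster engine):
originally just a sub-component of OpenAIS;
OpenAIS was later split into two projects:
corosync, and wilson (the AIS interface standard)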
CentOS 5:
cman + rgmanager (RHCS, shipped with the system)
CentOS 6:
cman + rgmanager
corosync + pacemaker
Command-line management tools:
crmsh: SUSE; included with CentOS up to 6.4
pcs: Red Hat; included with CentOS 6.5 and later
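1. Install corosync + pacemaker
Note: make sure the nodes' clocks are synchronized; cluster nodes communicate using the hostname printed by the hostname command, so every node must be able to resolve every other node's name; root on each node should be able to log in to the other nodes with key-based authentication; and decide whether a quorum (arbitration) device is needed.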
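Because nodes are identified by the name the hostname command prints, a quick sanity check is that every node name resolves through /etc/hosts on every node. The following is a minimal sketch with hypothetical helper names (parse_hosts, missing_nodes); it is not part of corosync or pacemaker, and the sample hosts text is assumed from the node names and IPs used later in these notes.

```python
# Minimal sketch (hypothetical helpers, not part of corosync/pacemaker):
# check that every cluster node name resolves through /etc/hosts-style text.


def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname or alias in /etc/hosts-style text to its IP."""
    mapping: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # the first matching line wins
    return mapping


def missing_nodes(hosts_text: str, nodes: list[str]) -> list[str]:
    """Return the node names that the hosts text cannot resolve."""
    known = parse_hosts(hosts_text)
    return [node for node in nodes if node not in known]


if __name__ == "__main__":
    # Assumed sample; on a real node you would read /etc/hosts instead.
    sample = """
    127.0.0.1        localhost
    192.168.100.173  BAIYU_173
    192.168.100.175  BAIYU_175
    """
    print(missing_nodes(sample, ["BAIYU_173", "BAIYU_175"]))  # prints []
```

An empty list means every listed node name resolves; run the same check on each node, since each node resolves names from its own /etc/hosts.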
yum install corosync pacemaker -y
[root@BAIYU_175 ~]# rpm -ql corosync
/etc/corosync
/etc/corosync/corosync.conf.example
/etc/corosync/corosync.conf.example.udpu
/etc/corosync/service.d
/etc/corosync/uidgid.d
/etc/dbus-1/system.d/corosync-signals.conf
/etc/rc.d/init.d/corosync
/etc/rc.d/init.d/corosync-notifyd
/etc/sysconfig/corosync-notifyd
/usr/bin/corosync-blackbox
/usr/libexec/lcrso
/usr/libexec/lcrso/coroparse.lcrso
/usr/libexec/lcrso/objdb.lcrso
/usr/libexec/lcrso/quorum_testquorum.lcrso
/usr/libexec/lcrso/quorum_votequorum.lcrso
/usr/libexec/lcrso/service_cfg.lcrso
/usr/libexec/lcrso/service_confdb.lcrso
/usr/libexec/lcrso/service_cpg.lcrso
/usr/libexec/lcrso/service_evs.lcrso
/usr/libexec/lcrso/service_pload.lcrso
/usr/libexec/lcrso/vsf_quorum.lcrso
/usr/libexec/lcrso/vsf_ykd.lcrso
/usr/sbin/corosync
/usr/sbin/corosync-cfgtool
/usr/sbin/corosync-cpgtool
/usr/sbin/corosync-fplay
/usr/sbin/corosync-keygen
/usr/sbin/corosync-notifyd
/usr/sbin/corosync-objctl
/usr/sbin/corosync-pload
/usr/sbin/corosync-quorumtool
/usr/share/doc/corosync-1.4.7
/usr/share/doc/corosync-1.4.7/LICENSE
/usr/share/doc/corosync-1.4.7/SECURITY
/usr/share/man/man5/corosync.conf.5.gz
/usr/share/man/man8/confdb_keys.8.gz
/usr/share/man/man8/corosync-blackbox.8.gz
/usr/share/man/man8/corosync-cfgtool.8.gz
/usr/share/man/man8/corosync-cpgtool.8.gz
/usr/share/man/man8/corosync-fplay.8.gz
/usr/share/man/man8/corosync-keygen.8.gz
/usr/share/man/man8/corosync-notifyd.8.gz
/usr/share/man/man8/corosync-objctl.8.gz
/usr/share/man/man8/corosync-pload.8.gz
/usr/share/man/man8/corosync-quorumtool.8.gz
/usr/share/man/man8/corosync.8.gz
/usr/share/man/man8/corosync_overview.8.gz
/usr/share/snmp/mibs/COROSYNC-MIB.txt
/var/lib/corosync
/var/log/cluster
2. Configure corosync
[root@BAIYU_173 ~]# cd /etc/corosync/
[root@BAIYU_173 corosync]# ls
corosync.conf  corosync.conf.example  corosync.conf.example.udpu  service.d  uidgid.d
[root@BAIYU_173 corosync]# cp corosync.conf.example corosync.conf
The main configuration file /etc/corosync/corosync.conf, explained:
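compatibility: whitetank    stay compatible with whitetank; comment out this line to disable compatibility
secauth: off    cluster security authentication is disabled; turning it on is recommended, with a key generated by corosync-keygen
threads: 0    number of threads used to encrypt and send multicast messages (only relevant with secauth: on); 0 disables threaded sending
ringnumber: 0    ring number of this interface (distinct numbers are only needed with redundant rings); the default is fine
bindnetaddr: 192.168.1.0    network address the multicast traffic binds to; set it to your own network address, here 192.168.100.0
mcastaddr: 239.255.1.1    the multicast address; here 239.165.17.91
mcastport: 5405    UDP port used for multicast
to_logfile: yes
to_syslog: yes    logging to one destination is enough, so set this to off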
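The bindnetaddr rule above (use the network address, not the node's own address) can be derived mechanically. Below is a minimal sketch with a hypothetical helper, totem_interface, that renders the interface stanza from a node address in IP/prefix form (an assumed input format) and rejects a non-multicast mcastaddr; it uses only Python's standard ipaddress module and is not a corosync tool.

```python
import ipaddress


def totem_interface(node_cidr: str, mcastaddr: str,
                    mcastport: int = 5405, ringnumber: int = 0) -> str:
    """Render a totem `interface { ... }` stanza for one node (hypothetical helper).

    node_cidr is the node's own address with its prefix length,
    e.g. "192.168.100.173/24" (an assumed input format).
    """
    # bindnetaddr takes the network address, not the host address:
    # 192.168.100.173/24 -> 192.168.100.0
    network = ipaddress.ip_interface(node_cidr).network
    group = ipaddress.ip_address(mcastaddr)
    if not group.is_multicast:  # IPv4 multicast range is 224.0.0.0/4
        raise ValueError(f"{mcastaddr} is not a multicast address")
    return "\n".join([
        "interface {",
        f"    ringnumber: {ringnumber}",
        f"    bindnetaddr: {network.network_address}",
        f"    mcastaddr: {group}",
        f"    mcastport: {mcastport}",
        "}",
    ])


if __name__ == "__main__":
    print(totem_interface("192.168.100.173/24", "239.165.17.91"))
```

Because bindnetaddr is a network address, the rendered stanza is the same for 192.168.100.173 and 192.168.100.175, which is what lets one corosync.conf be copied unchanged to every node.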
Add the following to run pacemaker as a corosync plugin:
service {
    ver: 0
    name: pacemaker
    use_mgmtd: yes
}
aisexec {
    user: root
    group: root
}
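corosync.conf is a nested format of "name {" sections and "key: value" directives, and a missing closing brace is an easy mistake when appending blocks like the ones above. The sketch below is a hypothetical, simplified parser for illustration only; it is not corosync's own configuration parser, and it assumes one directive or brace per line as in corosync.conf.example.

```python
def parse_corosync_conf(text: str) -> dict:
    """Parse corosync.conf-style text into nested dicts (simplified sketch).

    Sections ("name {") may repeat, e.g. several `service` blocks, so each
    section name maps to a list of dicts; directives map to strings.
    """
    root: dict = {}
    stack = [root]
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        if line.endswith("{"):  # opening a section such as "totem {"
            name = line[:-1].strip()
            section: dict = {}
            stack[-1].setdefault(name, []).append(section)
            stack.append(section)
        elif line == "}":  # closing the innermost open section
            if len(stack) == 1:
                raise ValueError(f"line {lineno}: unmatched '}}'")
            stack.pop()
        elif ":" in line:  # "key: value" directive
            key, value = line.split(":", 1)
            stack[-1][key.strip()] = value.strip()
        else:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
    if len(stack) != 1:
        raise ValueError("unclosed section at end of file")
    return root


if __name__ == "__main__":
    with open("/etc/corosync/corosync.conf") as f:
        print(parse_corosync_conf(f.read()).get("service"))
```

Raising on an unclosed section is the point of the sketch: it reports structural mistakes before corosync is restarted.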
Note: the log will contain warnings about this, because corosync V2 no longer allows pacemaker to run as a plugin.
[root@BAIYU_173 corosync]# corosync-keygen    # generate the key used for cluster authentication
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /etc/corosync/authkey.
[root@BAIYU_173 corosync]# ls
authkey        corosync.conf.example       corosync.conf.orig  uidgid.d
corosync.conf  corosync.conf.example.udpu  service.d
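The keygen output says it gathers 1024 bits, so the resulting authkey should be 128 bytes, and as a shared secret it should not be readable by group or others. The sketch below checks both; authkey_problems is a hypothetical helper, and the expected size is an assumption taken from the "1024 bits" line above rather than from corosync documentation.

```python
import os
import stat


def authkey_problems(path: str = "/etc/corosync/authkey",
                     expected_size: int = 128) -> list[str]:
    """Return problems found with a corosync authkey file (empty list = OK).

    expected_size assumes the 1024-bit key reported by corosync-keygen above.
    """
    problems = []
    st = os.stat(path)
    if st.st_size != expected_size:
        problems.append(f"size is {st.st_size} bytes, expected {expected_size}")
    # Any group/other permission bit means someone besides the owner may read it.
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"mode {stat.S_IMODE(st.st_mode):o} is open to group/others")
    return problems


if __name__ == "__main__":
    key = "/etc/corosync/authkey"
    if os.path.exists(key):
        print(authkey_problems(key) or "authkey looks fine")
```

Since scp -p preserves file modes, a key that passes this check on the node where it was generated keeps the same permissions when copied in the next step.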
3. Copy the key and the corosync configuration file to the other cluster nodes:
[root@BAIYU_173 corosync]# scp -p authkey corosync.conf 192.168.100.175:/etc/corosync
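authkey and corosync.conf are meant to be identical on every node, so after copying it is worth confirming that the copies really match. A minimal sketch, assuming you have already gathered each node's file into a local path (for example with scp in the reverse direction); differing_nodes is a hypothetical helper that compares SHA-256 digests with Python's hashlib.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def differing_nodes(copies: dict[str, Path]) -> list[str]:
    """Given {node name: local copy of the same file from that node}, return
    the nodes whose copy differs from the first node's copy."""
    if not copies:
        return []
    (_, ref_path), *rest = copies.items()
    reference = sha256_of(ref_path)
    return [node for node, path in rest if sha256_of(path) != reference]


if __name__ == "__main__":
    # Hypothetical local paths where each node's authkey was fetched to.
    fetched = {
        "BAIYU_173": Path("/tmp/authkey.BAIYU_173"),
        "BAIYU_175": Path("/tmp/authkey.BAIYU_175"),
    }
    if all(p.exists() for p in fetched.values()):
        print(differing_nodes(fetched))
```

Comparing digests rather than the raw bytes keeps the secret key out of any output.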
4. Start corosync and verify it:
[root@BAIYU_173 corosync]# service corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]
[root@BAIYU_173 corosync]# netstat -nlptu
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address          Foreign Address   State    PID/Program name
tcp        0      0 0.0.0.0:22             0.0.0.0:*         LISTEN   1455/sshd
tcp        0      0 127.0.0.1:25           0.0.0.0:*         LISTEN   1552/master
udp        0      0 192.168.100.173:5404   0.0.0.0:*                  25324/corosync
udp        0      0 192.168.100.173:5405   0.0.0.0:*                  25324/corosync
udp        0      0 239.255.1.1:5405       0.0.0.0:*                  25324/corosync
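corosync uses two UDP ports: mcastport (5405) for multicast receives and mcastport - 1 (5404) for sends, which is why both appear in the netstat output above, plus the multicast group itself. The sketch below is a hypothetical helper that extracts corosync's sockets from netstat -nlptu output so the check can be scripted; it assumes the IPv4 column layout shown above and skips IPv6 rows.

```python
def corosync_sockets(netstat_output: str) -> set[tuple[str, int]]:
    """Return the (address, port) pairs bound by corosync in `netstat -nlptu` output."""
    sockets = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip header lines and anything that is not an IPv4 tcp/udp row.
        if len(fields) < 6 or fields[0] not in ("tcp", "udp"):
            continue
        if not fields[-1].endswith("/corosync"):  # last column is PID/Program name
            continue
        addr, _, port = fields[3].rpartition(":")  # Local Address column
        sockets.add((addr, int(port)))
    return sockets


if __name__ == "__main__":
    sample = """\
udp        0      0 192.168.100.173:5404   0.0.0.0:*     25324/corosync
udp        0      0 192.168.100.173:5405   0.0.0.0:*     25324/corosync
udp        0      0 239.255.1.1:5405       0.0.0.0:*     25324/corosync
tcp        0      0 0.0.0.0:22             0.0.0.0:*     LISTEN   1455/sshd"""
    print(sorted(corosync_sockets(sample)))
```

If the multicast group is missing from the result, the mcastaddr in corosync.conf is the first thing to check.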
Verify that the corosync engine started normally:
[root@BAIYU_173 corosync]# grep -e 'Corosync Cluster Engine' -e 'configuration file' /var/log/cluster/corosync.log
Oct 24 17:55:00 corosync [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Oct 24 17:55:00 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Verify that the initial membership notifications were sent out:
[root@BAIYU_173 corosync]# grep TOTEM /var/log/cluster/corosync.log
Oct 24 17:55:00 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Oct 24 17:55:00 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 24 17:55:00 corosync [TOTEM ] The network interface [192.168.100.173] is now up.
Oct 24 17:55:00 corosync [TOTEM ] Process pause detected for 613 ms, flushing membership messages.
Oct 24 17:55:00 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 24 17:55:23 corosync [TOTEM ] A processor failed, forming new configuration.
Oct 24 17:55:23 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 24 17:55:27 corosync [TOTEM ] A processor failed, forming new configuration.
Oct 24 17:55:27 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
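Each "A processor failed, forming new configuration" line above is followed by a newly formed membership. A handful of these while nodes are starting is normal, but a steady stream on a settled cluster is usually worth investigating (for example, multicast traffic not passing between nodes). This hypothetical helper counts both kinds of TOTEM events from the log text; the matched phrases are copied from the output above.

```python
import re

TOTEM = re.compile(r"\[TOTEM\s*\]")  # tolerate varying padding inside the brackets


def membership_events(log_text: str) -> dict[str, int]:
    """Count membership changes reported by the TOTEM layer in corosync.log."""
    counts = {"formed": 0, "failed": 0}
    for line in log_text.splitlines():
        if not TOTEM.search(line):
            continue
        if "a new membership was formed" in line:
            counts["formed"] += 1
        elif "A processor failed" in line:
            counts["failed"] += 1
    return counts


if __name__ == "__main__":
    import os
    log = "/var/log/cluster/corosync.log"
    if os.path.exists(log):
        with open(log, errors="replace") as f:
            print(membership_events(f.read()))
```

For the nine TOTEM lines shown above, the counts would be three formed memberships and two processor failures.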
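Check whether any errors occurred during startup. The errors below say that pacemaker will soon no longer run as a corosync plugin and recommend CMAN as the cluster infrastructure service; they can be safely ignored here: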
[root@BAIYU_173 corosync]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Oct 24 17:55:00 corosync [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Oct 24 17:55:00 corosync [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
Oct 24 17:55:02 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process mgmtd exited (pid=25335, rc=100)    # can be ignored
Oct 24 17:55:03 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 6 (pid=25329, core=true)    # there may be many of these
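The grep pipeline above can be extended into a small filter that drops only the messages already identified as harmless, so anything left over stands out. A minimal sketch: unexpected_errors and the IGNORABLE list are hypothetical, and the ignore patterns are exactly the ones the notes above call safe to ignore.

```python
IGNORABLE = (
    "unpack_resources",            # already excluded by grep -v above
    "process_ais_conf",            # plugin deprecation notice, safe to ignore here
    "Child process mgmtd exited",  # mgmtd exit, noted above as ignorable
)


def unexpected_errors(log_text: str, ignore: tuple[str, ...] = IGNORABLE) -> list[str]:
    """Return ERROR lines from corosync.log that match none of the ignore patterns."""
    return [
        line for line in log_text.splitlines()
        if "ERROR:" in line and not any(pattern in line for pattern in ignore)
    ]


if __name__ == "__main__":
    import os
    log = "/var/log/cluster/corosync.log"
    if os.path.exists(log):
        with open(log, errors="replace") as f:
            for line in unexpected_errors(f.read()):
                print(line)
```

Applied to the four lines above, only the "Child process cib terminated with signal 6" entry would remain, which matches the note that such lines are not in the ignorable group.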
Check whether pacemaker started normally:
[root@BAIYU_173 corosync]# grep pcmk_startup /var/log/cluster/corosync.log
Oct 24 17:55:00 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Oct 24 17:55:00 corosync [pcmk ] Logging: Initialized pcmk_startup
Oct 24 17:55:00 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Oct 24 17:55:00 corosync [pcmk ] info: pcmk_startup: Service: 9
Oct 24 17:55:00 corosync [pcmk ] info: pcmk_startup: Local hostname: BAIYU_173
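Install crmsh:
crmsh can be installed on a single node only: the configuration made there is sent to the DC, which then synchronizes it to the other nodes.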
[root@BAIYU_173 ~]# ls
111       anaconda-ks.cfg               install.log         pssh-2.3.1-2.el6.x86_64.rpm.bak
111.orig  crmsh-1.2.6-4.el6.x86_64.rpm  install.log.syslog  trash.sh
[root@BAIYU_173 ~]# yum install pssh-2.3.1-2.el6.x86_64.rpm crmsh-1.2.6-4.el6.x86_64.rpm    # install local rpm packages with yum, which resolves dependencies automatically
Dependencies Resolved
================================================================================================
 Package            Arch      Version          Repository                      Size
================================================================================================
Installing:
 crmsh              x86_64    1.2.6-4.el6      /crmsh-1.2.6-4.el6.x86_64       1.7 M
 pssh               x86_64    2.3.1-2.el6      /pssh-2.3.1-2.el6.x86_64        119 k
Installing for dependencies:
 python-dateutil    noarch    1.4.1-6.el6      base                            84 k
 python-lxml        x86_64    2.2.3-1.1.el6    base                            2.0 M

Transaction Summary
================================================================================================
Install       4 Package(s)
Common crm subcommands:
status
node
configure
ra
resource
Common configure subcommands:
primitive
group
clone
ms
location
colocation
order
show
property
primitive <rsc_id> class:provider:ra params param1=value1 param2=value2 op op1 param1=value1 op op2 param1=value1
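The primitive syntax above is regular enough to generate. Below is a minimal sketch with a hypothetical helper, crm_primitive, that assembles the line from a resource ID, an agent, params and ops. The usage example assumes the ocf:heartbeat:IPaddr agent and illustrative monitor settings for the webip address from the case study below; those choices are assumptions, not taken from the original notes. The provider part only applies to OCF agents, so classes such as lsb are written as class:type.

```python
from typing import Optional


def crm_primitive(rsc_id: str, agent: str,
                  params: Optional[dict[str, str]] = None,
                  ops: Optional[dict[str, dict[str, str]]] = None) -> str:
    """Build a crmsh `primitive` line (hypothetical helper).

    agent is "class:provider:type" for OCF agents (e.g. ocf:heartbeat:IPaddr)
    or "class:type" for classes without providers (e.g. lsb:httpd).
    ops maps an operation name (monitor, start, ...) to its attributes.
    """
    words = ["primitive", rsc_id, agent]
    if params:
        words.append("params")
        words += [f"{k}={v}" for k, v in params.items()]
    for op_name, attrs in (ops or {}).items():
        words += ["op", op_name]
        words += [f"{k}={v}" for k, v in attrs.items()]
    # Values containing whitespace would need quoting, which this sketch skips.
    for word in words:
        if any(ch.isspace() for ch in word):
            raise ValueError(f"value needs quoting, not handled here: {word!r}")
    return " ".join(words)


if __name__ == "__main__":
    print(crm_primitive("webip", "ocf:heartbeat:IPaddr",
                        {"ip": "172.16.100.23"},
                        {"monitor": {"interval": "30s", "timeout": "20s"}}))
```

The printed line would be entered at the crm configure prompt (or prefixed with crm configure on the shell), then committed with commit.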
Case study: HA web service
webip: 172.16.100.23
Configure a two-node corosync/pacemaker cluster and set two global properties:
stonith-enabled=false
no-quorum-policy=ignore
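Why these two properties matter for a two-node cluster: quorum requires a strict majority of votes, so when one of two nodes fails the survivor holds 1 of 2 votes and loses quorum. Under Pacemaker's default no-quorum-policy (stop) it would then stop its resources, which defeats failover; ignore keeps them running. stonith-enabled=false is needed because this lab has no fencing device, and Pacemaker will not start resources while STONITH is enabled but no STONITH resource is defined. The sketch assumes one vote per node; has_quorum and property_commands are hypothetical helpers.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Simple majority: strictly more than half of all expected votes."""
    return 2 * votes_present > total_votes


def property_commands(props: dict[str, str]) -> list[str]:
    """crmsh commands that set cluster-wide properties."""
    return [f"crm configure property {k}={v}" for k, v in props.items()]


if __name__ == "__main__":
    # Two-node cluster with one node lost: 1 of 2 votes is not a majority.
    print(has_quorum(1, 2))  # False
    for cmd in property_commands({"stonith-enabled": "false",
                                  "no-quorum-policy": "ignore"}):
        print(cmd)
```

With three nodes, a single failure leaves 2 of 3 votes, which is still a majority; that is why ignoring quorum is mainly a two-node workaround.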
Original article: http://xiexiaojun.blog.51cto.com/2305291/1705941