Oracle Grid不止能提供自身Oracle Database高可用,还可以为第三方应用提供高可用。
可以为OGG、SharePlex等逻辑复制,Apache等应用提供高可用。
使用Oracle Grid代理第三方应用主要有以下两种方式:
官方文档位置:Clusterware Administration and Deployment GuideThird-Party Applications Using the Script AgentMos文档参考:Oracle_GoldenGate_Best_Practices_-_Oracle_GoldenGate_high_availability_using_Oracle_Clusterware_v8_6_ID1313703_1_.pdfOracle Grid 11.2如果使用oracle添加资源,则日志位置:$GRID_HOME/log/{node_name}/agent/crsd/scriptagent_oracle12c以后GRID日志也变为标准ADR目录$GRID_BASE/diag/crs/crs/agent/scriptagent_oracle.trc# 如果为GRID添加资源,路径或日志名称scriptagent_grid即可。下面测试利用Grid代理第三方脚本形式提供高可用,XAG方式参考官方文档即可。
部署步骤概述:
(1) login as root# appvipcfg create -network=1 \-ip=192.168.204.242 \-vipname=czhvip \-user=root(2) 查看配置vip# crsctl stat res -p |grep -ie .network -ie subnet |grep -ie name -ie subnet(3) login as root# crsctl setperm resource czhvip -u user:oracle:r-x--配置资源使用权限用户,IP资源属主一定必须是root,其他用户无法配置IP,会导致无法启动VIP资源。ogg安装部署不在此赘述,可按照以下几种方式:1. 使用ACFS作为共享磁盘,OGG软件本身以及dir*相关目录均存放于ACFS文件系统。ACFS相应版本以及补丁参考下面文档:ACFS Support On OS Platforms (Certification Matrix). (Doc ID 1369107.1).pdf2.使用ACFS存放goldengate的trail文件等,OGG软件本身存放于操作挂载点即可,通过在操作系统相应路径下建立软链接方式指向ACFS中dir*相应目录$ ln –s /acfs_mount_point/dirdat dirdat3.使用例如ocfs2、gpfs等集群文件系统存放下面脚本仅仅用做示例,实际脚本可以根据不同应用加入相应模块脚本,比如check脚本就需要判断进程状态等等。
Grid 第三方脚本模块说明
1. Grid 11.2脚本需要包含start/stop/clean/check/abort--示例脚本#!/bin/shcase $1 in‘start‘) echo $(date)‘ start‘>>/tmp/crs.log exit 0 ;;‘stop‘) echo $(date)‘ stop‘>>/tmp/crs.log exit 0 ;;‘clean‘) echo $(date)‘ clean‘>>/tmp/crs.log echo $?‘clean‘ >>/tmp/crs.log exit 0 ;;‘check‘) echo "CHECK entry point has been called.." echo $(date)‘ check‘>>/tmp/crs.log exit 0 ;;‘abort‘) echo $(date)‘ abort‘>>/tmp/crs.log exit 0 ;;esac2. 模块说明--主要介绍11gR2引入的两个新模块--12c以后版本引入了更多模块,这点可以从启动日志中看到。CLEANClean was introduced with Oracle Clusterware 11g Release 2. It will not be used for OracleClusterware 10g Release 2 or 11g Release 1. Clean is called when there is a need to clean up theresource. It is a non-graceful operation.ABORTAbort was introduced with Oracle Clusterware 11g Release 2. It will not be used for OracleClusterware 10g Release 2 or 11g Release 1. Abort is called if any of the resource componentshang to abort the ongoing action. Abort is not required to be included.3.关于脚本中变量说明如果start/stop/clean/check/abort对应脚本中启动程序脚本需要依赖环境变量,例如(1)ogg如果extract配置使用本地ORACLE_SID连接数据库进行捕获,不是使用tnsalias方式连接数据库,则ggsci> start extract时,依赖于环境变量ORACLE_SID,这种情况下,需要在上面脚本中定义好依赖的ORACLE_SID以及ORACLE_HOME变量,因为Grid启动时由于vip属主为root,所以如果vip与ogg资源强依赖时,只能获取到root的用户环境变量,无法获得oracle用户环境变量,会导致资源无法正常启动。(2)所以环境变量一定要在脚本中完全定义,不要依赖于外部变量,否则将会发生问题后很难排查以及遇到无法启动资源或启动资源无法启动程序中相应进程。下面为OGG连接ASM与版本关系
如果 Redo Log 存储在 ASM 中,设置 Catpure ASM 连接方式如下:Oracle 10.2.0.5 或 11.2.0.2 之前版本:TRANLOGOPTIONS ASMUSER sys@asminst, asmpassword oracleOracle 10.2.0.5、11.2.0.2 或之以后版本,GoldenGate 为 11g 或以后版本:TRANLOGOPTIONS DBLOGREADER如果在 AIX 平台数据库的 redo log 使用的是 RAW,则可能需要设置参数:TRANLOGOPTIONSRAWDEVICEOFFSET,设置此参数:TRANLOGOPTIONS RAWDEVICEOFFSET 0其他平台不需要设置此参数。下面脚本为未使用ASM或Oracle 10.2.0.5、11.2.0.2 或之以后版本,如果为早期需要调取ASM实例ORACLE_SID,则需要特殊处理
完整示例详细可以参考OracleGoldenGate_Best_Practices-Oracle_GoldenGate_high_availability_using_Oracle_Clusterware_v8_6_ID1313703_1.pdf
#!/bin/sh# goldengate_action.scr# 生效oracle用户下环境变量,oracle下环境变量一定要配置相关变量,防止下面启动ogg无法读取相关ORACLE_SID导致启动extract失败. ~oracle/.bash_profile# 判断调用脚本是否有选项,如果第一个选项为空,则报错,提示使用选项[ -z "$1" ]&& echo "ERROR!! Usage $0 <start|stop|abort|clean>"&& exit 99# 指定goldengate安装目录GGS_HOME=<set the path here>#specify delay after start before checking for successful startstart_delay_secs=5#Include the Oracle GoldenGate home in the library path to start GGSCI,AIX variable is LIBPATHexport LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${GGS_HOME}#set the oracle home to the database to ensure Oracle GoldenGate will get#the right environment settings to be able to connect to the databaseexport ORACLE_HOME=<set the ORACLE_HOME path here>export CRS_HOME=<set the CRS_HOME path here>#Set NLS_LANG otherwise it will default to US7ASCIIexport NLS_LANG=American_America.US7ASCIIlogfile=/tmp/crs_gg_start.log\rm ${logfile}# define function log.function log (){DATETIME=`date +%d/%m/%y-%H:%M:%S`echo $DATETIME "goldengate_action.scr>>" $1echo $DATETIME "goldengate_action.scr>>" $1 >> $logfile}# define function check_process to check goldengate MGR process is runing or not.#check_process validates that a manager process is running at the PID#that Oracle GoldenGate specifies.check_process (){if ( [ -f "${GGS_HOME}/dirpcs/MGR.pcm" ] )then pid=`cut -f8 "${GGS_HOME}/dirpcs/MGR.pcm"` if [ ${pid} = `ps -e |grep ${pid} |grep mgr |awk ‘{ print $1 }‘` ] then #manager process is running on the PID . exit success echo "manager process is running on the PID . exit success">> /tmp/check.out exit 0 else #manager process is not running on the PID echo "manager process is not running on the PID" >> /tmp/check.out exit 1 fielse #manager is not running because there is no PID file echo "manager is not running because there is no PID file" >> /tmp/check.outexit 1fi}# call_ggsci is a generic routine that executes a ggsci commandcall_ggsci () {log "entering call_ggsci"ggsci_command=$1#log "about to execute $ggsci_command"log "id= $USER"cd ${GGS_HOME}ggsci_output=`${GGS_HOME}/ggsci << EOF${ggsci_command}exitEOF`log "got output of : $ggsci_output"}case $1 in‘start‘)#Updated by Sourav B (02/10/2011)# During failover if the “mgr.pcm” file is not deleted at the node crash# then Oracle clusterware won’t start the manager on the new node assuming the# manager process is still running on the failed node. To get around this issue# we will delete the “mgr.prm” file before starting up the manager on the new# node. We will also delete the other process files with pc* extension and to# avoid any file locking issue we will first backup the checkpoint files and then# delete them from the dirchk directory.After that we will restore the checkpoint# files from backup to the original location (dirchk directory).log "removing *.pc* files from dirpcs directory..."rm -f $GGS_HOME/dirpcs/*.pc*log "creating tmp directory to backup checkpoint file...."mkdir $GGS_HOME/dirchk/tmplog "backing up checkpoint files..."cp $GGS_HOME/dirchk/*.cp* $GGS_HOME/dirchk/tmplog "Deleting checkpoint files under dirchk......"rm -f $GGS_HOME/dirchk/*.cp*log "Restore checkpoint files from backup to dirchk directory...."cp $GGS_HOME/dirchk/tmp/*.cp* $GGS_HOME/dirchklog "Deleting tmp directory...."rm -r $GGS_HOME/dirchk/tmplog "starting manager"call_ggsci ‘start manager‘#there is a small delay between issuing the start manager command#and the process being spawned on the OS . wait before checkinglog "sleeping for start_delay_secs"sleep ${start_delay_secs}#check whether manager is running and exit accordinglycheck_process;;‘stop‘)#attempt a clean stop for all non-manager processescall_ggsci ‘stop er *‘#ensure everything is stoppedcall_ggsci ‘stop er *!‘#stop manager without (y/n) confirmationcall_ggsci ‘stop manager!‘#exit successexit 0;;‘check‘)check_processexit 0;;‘clean‘)#attempt a clean stop for all non-manager processescall_ggsci ‘stop er *‘#ensure everything is stoppedcall_ggsci ‘stop er *!‘#in case there are lingering processescall_ggsci ‘kill er *‘#stop manager without (y/n) confirmationcall_ggsci ‘stop manager!‘#exit successexit 0;;‘abort‘)#ensure everything is stoppedcall_ggsci ‘stop er *!‘#in case there are lingering processescall_ggsci ‘kill er *‘#stop manager without (y/n) confirmationcall_ggsci ‘stop manager!‘#exit successexit 0;;esac# login as oracle:$ /u01/app/11.2/grid/bin/crsctl add resource oggapp -type cluster_resource -attr "ACTION_SCRIPT=‘/acfs_mount_point/ogg.sh‘,CHECK_INTERVAL=30,START_DEPENDENCIES=‘hard(czhvip) pullup(czhvip)‘,STOP_DEPENDENCIES=‘hard(mvggatevip)‘"--脚本位置可以存放于本地oracle用户有读取执行权限的目录,如果存放于本地,则Grid各个节点都需要备份该文件--如果ogg安装使用acfs,则START_DEPENDENCIES可以配置与ASM强依赖。上述步骤即已完成第三方应用使用Grid托管,还是非常方便实用的。
1. 无法启动$ crsctl start res czhappCRS-2672: Attempting to start ‘czhapp‘ on ‘db-oracle-node1‘CRS-2674: Start of ‘czhapp‘ on ‘db-oracle-node1‘ failedCRS-2679: Attempting to clean ‘czhapp‘ on ‘db-oracle-node1‘CRS-2678: ‘czhapp‘ on ‘db-oracle-node1‘ has experienced an unrecoverable failureCRS-0267: Human intervention required to resume its availability.CRS-4000: Command Start failed, or completed with errors.# 如果配置资源属于Oracle,则日志目录为:$GRID_HOME/log/{node_name}/agent/crsd/scriptagent_oracle--关键内容如下2021-04-26 11:35:07.342: [czhapp][156428032]{1:39006:13462} [clean] Executing action script: /software/crs.sh[clean]2021-04-26 11:35:07.397: [ AGFW][156428032]{1:39006:13462} Command: clean for resource: czhapp 1 1 completed with invalid status: 2092021-04-26 11:35:07.397: [czhapp][156428032]{1:39006:13462} [check] Executing action script: /software/crs.sh[check]2021-04-26 11:35:07.397: [ AGFW][158529280]{1:39006:13462} Agent sending reply for: RESOURCE_CLEAN[czhapp 1 1] ID 4100:7175902021-04-26 11:35:07.454: [ AGFW][156428032]{1:39006:13462} Received unknown resource status code: 2092021-04-26 11:35:07.455: [ AGFW][158529280]{1:39006:13462} czhapp 1 1 state changed from: CLEANING to: UNKNOWN2. 分析可以从日志输出看到,识别到了脚本,但是通过在脚本中指定位置配置输出,发现脚本并未真正执行。最终排查原因主要为脚本开头未声明脚本类型导致该问题。3. 解决#!/bin/sh--写脚本还是要规范,以前写脚本偶尔拉下声明部分,并不影响,这次Oracle Grid代理脚本没有声明部分无法启动还是挺意外的,也说明还是要规范。1. 现象OCI相关报错,无法连接数据库2.分析AIX:ps -ef|grep goldengateps eauwww <pid>查看进程环境变量发现,变量中无ORACLE_SID。由于goldengate extract中配置,未配置使用tnsalias方式连接数据库,所以依赖于启动extract时用户操作系统环境变量ORACLE_SID,但是由于appvipcfg配置的vip资源未给oracle足够权限,导致使用oracle用户无法启动vip资源,进而导致使用root启动vip资源之后,环境变量无法取到ORACLE_SID,导致未能启动extract。3.解决--login as root# crsctl setperm resource oggvip -u user:oracle:rwx--login as oracle 测试$ crsctl start resource oggvip--如果上述命令依然无法使oracle启动资源,则继续修改oggvip权限--login as root--将other组权限设置为rwx即可解决# crsctl getperm resource oggvip# crsctl setperm resource oggvip -u other::rwx1. 现象# ./appvipcfg create -network=1 \-ip=192.168.204.245 \-vipname=czhvip \-user=root/bin/ls: cannot access /ade/ade_88979932/perl/lib: No such file or directory2. 原因由于opatch打补丁导致appvipcfg内容发生改变,appvipcfg本身为$GRID_HOME/bin/下的一个脚本文件,不是一个二进制文件,脚本中定义了ORACLE_HOME与ORA_CRS_HOME,由于打补丁导致该文件两个变量不正确,修改为正确路径即可解决。$ cat /u01/app/11.2/grid/bin/appvipcfg#!/bin/sh## This script is used for managing # user mode vip resource.## Do not change the line below for ORACLE_HOME setting#ORACLE_HOME=/u01/app/11.2/gridORACLE_HOME=/ade/ade_19289128/11.2/gridexport ORACLE_HOME#ORA_CRS_HOME=/u01/app/11.2/gridORA_CRS_HOME=/ade/ade_19289128/11.2/gridexport ORA_CRS_HOME使用Oracle Grid配置Goldengate或其他第三方应用高可用
原文:https://www.cnblogs.com/chenguopa/p/15228660.html