昨天去某客户那里做巡检,顺便看一下上次搭建的RAC-DG环境是否正常,不看不知道,一看吓一跳,上次的DG是8月20日运行的,而DG备库从8月31日之后实例就没有开启过,后来询问后才得知,原来那天断过一次电,后来重启了机器。直到今天我过去了,才把实例启动起来。也就是说,从8月31日到今天快1个月的时间,备库一致处于未使用状态。
接着查看备库归档,显然已经缺失了很多了,tnread1 最后一个日志为1661,tnread2 最后一个日志为1324,而此时主库中还保留的最早的日志是9月8日的,thread
1 最早为2055,thread 2 最早为1555。主备之间归档足足差了有好几百个(正常,都快一个月没开实例了)
ASMCMD> ls
2014_09_08/
2014_09_09/
2014_09_10/
2014_09_11/
2014_09_12/
2014_09_13/
2014_09_14/
2014_09_15/
2014_09_16/
2014_09_17/
2014_09_18/
2014_09_19/
2014_09_20/
2014_09_21/
2014_09_22/
2014_09_23/
2014_09_24/
2014_09_25/
ASMCMD> cd 2014_09_08
ASMCMD> ls
thread_1_seq_2055.500.857723297
thread_1_seq_2056.494.857725223
thread_1_seq_2057.493.857728031
thread_1_seq_2058.490.857729849
(略)……
thread_2_seq_1555.502.857723297
thread_2_seq_1556.499.857723301
thread_2_seq_1557.497.857723305
thread_2_seq_1558.496.857725225
(略)……
ASMCMD>
尽管在脚本中配置了备份完归档后用delete input来删除归档,以减小归档占用的磁盘空间,可以在RMAN脚本的备份日志中看到,从8月31日起,陆续有报RMAN-08137,提示由于备库还未获得归档,导致无法删除:
归档日志文件名=+DATA/sis/archivelog/2014_08_31/thread_1_seq_1661.1208.857041081 RECID=3930 STAMP=857041081
RMAN-08137: 警告: 归档日志未删除, 因为备用或上游捕获进程需要它
归档日志文件名=+DATA/sis/archivelog/2014_08_31/thread_1_seq_1662.1212.857042297 线程=1 序列=1662
RMAN-08137: 警告: 归档日志未删除, 因为备用或上游捕获进程需要它
归档日志文件名=+DATA/sis/archivelog/2014_09_01/thread_2_seq_1325.1204.857122335 线程=2 序列=1325
RMAN-08137: 警告: 归档日志未删除, 因为备用或上游捕获进程需要它
但是,由于FRA磁盘空间是有限的,使用到一定的百分比(有参数可调整),Oracle会自动清空其中的内容,以释放空间,因此在FRA中的归档日志大约保留了18天,从9月8日到9月25日,而8月31日的归档肯定是没有的了,最近的备份集只有到9月15日的。
于是决定重新搭建一下DG,关闭备库实例,删除全部数据库文件(数据文件、控制文件、日志文件),只保留密码文件、参数文件、tnsnames.ora、listener.ora即可,重建很方便,用11g的duplicate重新同步一下就可以了,命令如下:
rman target / auxiliary sys/oracle@sisdg
RMAN> run{
allocate channel c1 device type disk;
allocate auxiliary channel c2 device type disk;
set newname for tempfile 1 to ‘D:\app\administrator\oradata\sis\temp.269.852648395‘;
duplicate target database for standby from active database dorecover;
release channel c1;
release channel c2;
}
执行完以上操作后,备库与主库的归档就同步了
主库:
SQL> archive log list
数据库日志模式 存档模式
自动存档 启用
存档终点 USE_DB_RECOVERY_FILE_DEST
最早的联机日志序列 2617
下一个存档日志序列 2619
当前日志序列 2619
SQL> select thread#,max(sequence#) from v$archived_log group by thread#;
THREAD# MAX(SEQUENCE#)
---------- --------------
1 2619
2 2556
备库:
SQL> archive log list
数据库日志模式 存档模式
自动存档 启用
存档终点 D:\archivelog
最早的联机日志序列 0
下一个存档日志序列 0
当前日志序列 0
SQL> select thread#,max(sequence#) from v$archived_log group by thread#;
THREAD# MAX(SEQUENCE#)
---------- --------------
1 2619
2 2556
主备库查看了一下v$archive_dest_status,两边都是的status列都是valid的,因此开启备库的redo apply,看到日志也开始已经应用了
SQL> select thread#,sequence#,applied from v$archived_log;
THREAD# SEQUENCE# APPLIED
---------- ---------- ------------------
1 2617 YES
1 2618 YES
2 2556 YES
1 2619 NO
1 2620 NO
由于采用的是LGWR ASYNC模式传递日志,再重新创建一次standby redo logfile,主库每个thread有3组日志,所以备库创建了7组日志
此外,备库的alertlog里还报了个错误,因为是从主库duplicate过来的,RMAN的配置信息还保留着主库的一些参数:
Starting control autobackup
Got error: 19624
******************** WARNING ***************************
The errors during Server autobackup are not fatal, as it
is attempted after sucessful completion of the command.
However, it is recomended to take an RMAN control file
backup as soon as possible because the Autobackup failed
with the following error:
ORA-19624: operation failed, retry possible
ORA-19504: failed to create file "C:\ORABACKUP\BACKUPSETS\SIS1-C-3160648191-20140925-02.CTL"
ORA-27040: file create error, unable to create file
OSD-04002: 无法打开文件
O/S-Error: (OS 3) 系统找不到指定的路径。
******************** END OF WARNING *******************
RMAN> show all;
使用目标数据库控制文件替代恢复目录
db_unique_name 为 SISDG 的数据库的 RMAN 配置参数为:
CONFIGURE RETENTION POLICY TO REDUNDANCY 7;
CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK;
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO ‘c:\orabackup\backupsets\sis1-%F.ctl‘;
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET;
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM ‘AES128‘; # default
CONFIGURE COMPRESSION ALGORITHM ‘BASIC‘ AS OF RELEASE ‘DEFAULT‘ OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO ‘D:\APP\ADMINISTRATOR\PRODUCT\11.2.0\DBHOME_1\DATABASE\SNCFSIS.ORA‘; # default
备库并没有“ ‘c:\orabackup\backupsets\ ”这个路径,所以报错,clear掉就可以了
RMAN> CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK CLEAR;
问题到这里似乎是圆满地解决了,但事实并非如此,当我在节点1连续切换了几次日志后发现,虽然归档都能顺利传到备库,但后续的日志始终应用不了,查看节点1和备库的alertlog,也没有发现有什么异常,直到我意识到应该看看节点2的情况时,才发现了点端倪。
照理说,正常情况下,在v$archive_dest_status视图中查询到的远程归档路径的status必须是要valid的,并且error列的状态为空,而节点2的查询结果是这样的:
SQL> select dest_id,status,error from v$archive_dest where dest_id<3;
DEST_ID STATUS ERROR
---------- --------- ------------------------------
1 VALID
2 ERROR ORA-21561: 生成 OID 失败
此时再去查看节点2的alertlog,发现除了报ORA-21561,伴随着的还有ORA-15055、ORA-25253等
Thu Sep 25 15:05:06 2014
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
WARNING: ASM communication error: op 0 state 0x0 (15055)
ERROR: direct connection failure with ASM
Thu Sep 25 15:05:37 2014
Error 21561 received logging on to the standby
Suppressing further error logging of LOG_ARCHIVE_DEST_2.
Thu Sep 25 15:05:46 2014
ORA-25253 encountered when generating server alert SMG-3503
Thu Sep 25 15:05:58 2014
Thread 2 advanced to log sequence 2561 (LGWR switch)
Current log# 6 seq# 2561 mem# 0: +DATA/tc/onlinelog/group_6.1523.854793175
Current log# 6 seq# 2561 mem# 1: +DATA/tc/onlinelog/group_6.1524.854793175
Thu Sep 25 15:05:58 2014
Archived Log entry 6148 added for thread 2 sequence 2560 ID 0xbc63a5fc dest 1:
Thu Sep 25 15:05:58 2014
Error 21561 received logging on to the standby
Suppressing further error logging of LOG_ARCHIVE_DEST_2.
FAL[server, ARC3]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
Thu Sep 25 15:07:49 2014
ORA-25253 encountered when generating server alert SMG-3503
Thu Sep 25 15:08:07 2014
ARCH: Possible network disconnect with primary database
Thu Sep 25 15:11:38 2014
Error 21561 received logging on to the standby
在节点2执行lsnrctl status查看监听状态以及用tnsping NET SERVICE NAME,都会hang在那边,不显示结果
于是到MOS上面查阅了一下关于ORA-21561和ORA-15055的文档,有好几篇相关文档都有提到这一现象,主要描述如下:
1. 通常在10.2.0.x升级后发生,登录SQLPLUS时直接提示:ORA - 21561 : OID GENERATION FAILED,需要在.../hosts中加入localhost的完整主机名,我可以正常登录SQLPLUS,似乎这个不太符合我碰到的情形。
2. 由bug引起,Bug 14324057 and Bug 12529945有潜在消耗desktop heap memory的可能,需要修改注册表,来增加缺省desktop heap size来解决。
文档中还提到,bug针对的是11.2.0.2 patch 19——11.2.0.3 patch 7,但这个库已经打到patch 27了,如果是bug也应该已经修复了。于是尝试修改了注册表,修改\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session
Manager\SubSystems\下面的
SharedSection=1024,20480,1024,修改前先备份一下注册表
修改前:
%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,20480,768 Windows=On SubSystemType=Windows ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=winsrv:ConServerDllInitialization,2
ServerDll=sxssrv,4 ProfileControl=Off MaxRequestThreads=16
修改后:
%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,20480,1024 Windows=On SubSystemType=Windows ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=winsrv:ConServerDllInitialization,2
ServerDll=sxssrv,4 ProfileControl=Off MaxRequestThreads=16
注意,改完注册表后需要重启该节点主机后才能生效。
ORA-21561、ORA-15055、ORA-25253 导致DG备库无法应用归档
原文:http://blog.csdn.net/aaron8219/article/details/39567185