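1. I recently handled a case in which poor disk I/O performance on a standby database degraded the performance of the primary.
The primary was mostly waiting on "log file switch completion". Analysis of an ASH dump showed that the underlying wait event was "LGWR-LNS wait on channel". This event generally points to network performance or to the standby's I/O performance, and the customer was running in MAXIMUM AVAILABILITY mode.
In the end I proposed two solutions:
(1). Replace the standby's storage with better-performing storage.
(2). Change the protection mode to MAXIMUM PERFORMANCE and ship redo with LGWR ASYNC.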
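As a rough illustration of option (2), the sketch below only assembles the two statements involved; it does not connect to a database. The helper name `max_performance_statements` and the service name `stby3` are hypothetical, `SCOPE=BOTH` assumes the instance uses an spfile, and a real destination string would also carry whatever other attributes the existing setting already has.

```python
from __future__ import annotations


def max_performance_statements(dest_n: int, service: str) -> list[str]:
    """Build the statements for option (2) as plain strings (nothing is executed).

    dest_n  -- the n in LOG_ARCHIVE_DEST_n
    service -- Oracle Net service name of the standby (hypothetical in the example)
    """
    if dest_n < 1:
        raise ValueError("dest_n must be a positive destination number")
    # Ship redo with the log writer, asynchronously.
    dest_value = f"SERVICE={service} LGWR ASYNC"
    return [
        f"ALTER SYSTEM SET LOG_ARCHIVE_DEST_{dest_n}='{dest_value}' SCOPE=BOTH;",
        "ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;",
    ]


if __name__ == "__main__":
    for statement in max_performance_statements(3, "stby3"):
        print(statement)
```

The statements would be reviewed by a DBA and run on the primary; the sketch makes no claim about the order or the database state they require.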
While we are here, it is worth reviewing the three standby protection modes and the redo transport settings each one allows:
Item | Maximum protection | Maximum availability | Maximum performance |
Redo write/transport process | LGWR | LGWR | LGWR or ARCH |
Network transmission mode | SYNC | SYNC | SYNC or ASYNC |
Disk write acknowledgement | AFFIRM | AFFIRM | AFFIRM or NOAFFIRM |
Standby redo logs | Required | Required | Required with LGWR, not with ARCH |
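The table can be read as a simple rule set. The following sketch encodes exactly the first three rows of the table and reports which settings of a destination fall outside what a protection mode allows. The names `ALLOWED` and `violations` are made up for this illustration and are not part of any Oracle tool.

```python
from __future__ import annotations

# Rows of the table above, keyed by protection mode.
ALLOWED = {
    "MAXIMUM PROTECTION": {
        "process": {"LGWR"}, "network": {"SYNC"}, "disk": {"AFFIRM"},
    },
    "MAXIMUM AVAILABILITY": {
        "process": {"LGWR"}, "network": {"SYNC"}, "disk": {"AFFIRM"},
    },
    "MAXIMUM PERFORMANCE": {
        "process": {"LGWR", "ARCH"},
        "network": {"SYNC", "ASYNC"},
        "disk": {"AFFIRM", "NOAFFIRM"},
    },
}


def violations(mode: str, process: str, network: str, disk: str) -> list[str]:
    """Return one message per setting that the table does not allow for `mode`."""
    mode = mode.upper()
    allowed = ALLOWED[mode]
    chosen = {"process": process, "network": network, "disk": disk}
    return [
        f"{key}={value.upper()} is not allowed in {mode}"
        for key, value in chosen.items()
        if value.upper() not in allowed[key]
    ]


if __name__ == "__main__":
    # The customer's setup: LGWR SYNC AFFIRM under maximum availability is valid.
    print(violations("Maximum availability", "LGWR", "SYNC", "AFFIRM"))
    # Switching only the transport to ASYNC would not be valid in that mode.
    print(violations("Maximum availability", "LGWR", "ASYNC", "AFFIRM"))
```

This is why solution (2) changes the protection mode as well: according to the table, ASYNC is only available under Maximum performance.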
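The root cause is the standby's poor I/O performance. In MAXIMUM AVAILABILITY mode redo is shipped with SYNC and AFFIRM, so the primary has to wait for confirmation that the redo was successfully written to the standby's disk, and that wait drags down the primary's performance.
2. Here is how the Oracle documentation describes SYNC and ASYNC: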
http://docs.oracle.com/cd/B10501_01/server.920/a96653/log_arch_dest_param.htm#77394
SYNC=PARALLEL
SYNC=NOPARALLEL
The SYNC attribute specifies that network I/O is to be performed synchronously for the destination, which means that once the I/O is initiated, the archiving process waits for the I/O to complete before continuing. The SYNC attribute is one requirement for setting up a no-data-loss environment, because it ensures that the redo records were successfully transmitted to the standby site before continuing.
If the log writer process is defined to be the transmitter to multiple standby destinations that use the SYNC attribute, the user has the option of specifying SYNC=PARALLEL or SYNC=NOPARALLEL for each of those destinations.
- If SYNC=NOPARALLEL is used, the log writer process performs the network I/O to each destination in series. In other words, the log writer process initiates an I/O to the first destination and waits until it completes before initiating the I/O to the next destination. Specifying the SYNC=NOPARALLEL attribute is the same as specifying the ASYNC=0 attribute.
- If SYNC=PARALLEL is used, the network I/O is initiated asynchronously, so that I/O to multiple destinations can be initiated in parallel. However, once the I/O is initiated, the log writer process waits for each I/O operation to complete before continuing. This is, in effect, the same as performing multiple, synchronous I/O operations simultaneously. The use of SYNC=PARALLEL is likely to perform better than SYNC=NOPARALLEL.
Because the PARALLEL and NOPARALLEL qualifiers only make a difference if multiple destinations are involved, Oracle Corporation recommends that all destinations use the same value.
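The difference between the two qualifiers can be sketched as a toy model: with NOPARALLEL the log writer's wait is the sum of the per-destination I/O times, and with PARALLEL it is the slowest single destination. The function name and the latency figures below are invented for illustration; they are not measurements from the customer's system.

```python
from __future__ import annotations


def lgwr_wait_ms(dest_latencies_ms: list[float], parallel: bool) -> float:
    """Toy model of how long the log writer waits for all SYNC destinations.

    parallel=False models SYNC=NOPARALLEL: one destination after another.
    parallel=True  models SYNC=PARALLEL: all I/Os in flight at once, then
    wait for every one of them to complete.
    """
    if not dest_latencies_ms:
        return 0.0
    if parallel:
        return max(dest_latencies_ms)
    return sum(dest_latencies_ms)


if __name__ == "__main__":
    latencies = [2.0, 3.0, 40.0]  # hypothetical: two fast standbys, one slow
    print(lgwr_wait_ms(latencies, parallel=False))
    print(lgwr_wait_ms(latencies, parallel=True))
```

In both variants of this model the slow destination dominates the wait, which matches the documentation's point that PARALLEL helps but the log writer still waits for each I/O to complete.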
ASYNC[=blocks]
The ASYNC attribute specifies that network I/O is to be performed asynchronously for the destination. Once the I/O is initiated, the log writer continues processing the next request without waiting for the I/O to complete and without checking the completion status of the I/O. Use of the ASYNC attribute allows standby environments to be maintained with little or no performance effect on the primary database. The optional block count determines the size of the SGA network buffer to be used. In general, the slower the network connection, the larger the block count should be. Also, specifying the ASYNC=0 attribute is the same as specifying the SYNC=NOPARALLEL attribute.
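A careful reading of the documentation boils down to the following points:
SYNC: after the network I/O has been initiated, the primary can move on only once the standby has reported that the I/O completed successfully. If the standby's I/O is slow, the primary's performance suffers.
ASYNC: no I/O confirmation is required. Once the primary has initiated the I/O it carries on with its next piece of work, so how fast or slow the standby writes does not affect the primary.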
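These two points can be put into a small back-of-the-envelope model. It assumes, as a simplification of my own, that under SYNC the local redo write and the remote round trip overlap, so the primary waits for whichever finishes last; under ASYNC only the local write counts. The function name and all numbers are hypothetical.

```python
def primary_wait_ms(local_write_ms: float,
                    network_rtt_ms: float,
                    standby_write_ms: float,
                    sync: bool) -> float:
    """Toy estimate of how long the primary waits on one redo write."""
    if not sync:
        # ASYNC: the primary does not wait for the standby at all.
        return local_write_ms
    # SYNC: the acknowledgement arrives only after the network round trip
    # plus the standby's own disk write.
    remote_ms = network_rtt_ms + standby_write_ms
    # Assumption: local write and remote shipping proceed concurrently.
    return max(local_write_ms, remote_ms)


if __name__ == "__main__":
    # Hypothetical figures: fast local disk, fast network, slow standby disk.
    print(primary_wait_ms(1.0, 0.5, 30.0, sync=True))
    print(primary_wait_ms(1.0, 0.5, 30.0, sync=False))
```

With a slow standby disk the SYNC figure is dominated by `standby_write_ms`, while the ASYNC figure does not depend on it.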
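3. With these two concepts clear, let us go back to the customer's problem:
The customer has three standbys, but the standby server behind LOG_ARCHIVE_DEST_3 is comparatively underpowered. During the system's busier periods, the OSWatcher logs show that standby's I/O utilization at 100%.
That confirms the problem: there is a large performance gap between this standby server and the primary, and with LGWR SYNC transport the standby's I/O is under heavy pressure.
On top of that, the primary cannot continue until the standby confirms that the redo transfer has completed, so the primary's performance suffers badly.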
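4. To sum up: a standby should not be much weaker than its primary. It should deliver at least 70~80% of the primary's performance; otherwise, after a switchover or failover the standby simply cannot take over the primary's workload.
A weak standby can also hurt the primary's performance during routine redo transport.
Having read this far, you may be wondering: isn't Maximum availability supposed to fall back to Maximum performance automatically? How can it still hurt performance?
5. With that question in mind, let us start with the definition: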
Maximum availability: This protection mode provides the highest level of data protection that is possible without compromising the availability of the primary database. Like maximum protection mode, a transaction will not commit until the redo needed to recover that transaction is written to the local online redo log and to the standby redo log of at least one transactionally consistent standby database. Unlike maximum protection mode, the primary database does not shut down if a fault prevents it from writing its redo stream to a remote standby redo log. Instead, the primary database operates in maximum performance mode until the fault is corrected, and all gaps in redo log files are resolved. When all gaps are resolved, the primary database automatically resumes operating in maximum availability mode.
This mode ensures that no data loss will occur if the primary database fails, but only if a second fault does not prevent a complete set of redo data from being sent from the primary database to at least one standby database.
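In Maximum availability mode, as long as the connection to the standby is healthy, the primary behaves as it does in Maximum protection mode: a transaction commits only once its redo has been written on both the primary and the standby. If the standby loses contact with the primary, the primary automatically switches to Maximum performance behavior, which keeps the primary as available as possible.
Notice the condition: "if the standby loses contact with the primary". "Loses contact" is the crucial part. In the case described here the connection was perfectly healthy; the standby's I/O was merely slow, and the standby had not stopped serving altogether.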
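The distinction can be written down as a tiny decision function. This is only a model of the behavior described in the definition above, with state names I made up (`HEALTHY`, `SLOW`, `UNREACHABLE`); it says nothing about how Oracle actually detects a fault.

```python
from enum import Enum


class StandbyState(Enum):
    HEALTHY = "healthy"          # reachable and acknowledging quickly
    SLOW = "slow"                # reachable, but acknowledgements are late
    UNREACHABLE = "unreachable"  # contact with the primary is lost


def primary_waits_for_standby(state: StandbyState) -> bool:
    """In Maximum availability mode, does a commit wait for the standby?

    Only a lost connection makes the primary fall back to Maximum
    performance behavior. A slow standby still counts as connected,
    so the primary keeps waiting for it.
    """
    return state is not StandbyState.UNREACHABLE


if __name__ == "__main__":
    for state in StandbyState:
        print(state.value, primary_waits_for_standby(state))
```

The customer's standby was in the `SLOW` state of this model, which is why the automatic fallback never kicked in.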
This article comes from the “小小狗窝” blog; please keep this attribution: http://hsbxxl.blog.51cto.com/181620/1846499