首页 > 编程语言 > 详细

一致性算法探寻(扩展版)11

时间:2015-08-11 19:32:40      阅读:269      评论:0      收藏:0      [点我收藏+]

9 Implementation and evaluation

We have implemented Raft as part of a replicated state machine that stores configuration information for RAMCloud [33] and assists in failover of the RAMCloud coordinator. The Raft implementation contains roughly 2000 lines of C++ code, not including tests, comments, or blank lines. The source code is freely available [23]. There are also about 25 independent third-party open source implementations [34] of Raft in various stages of development, based on drafts of this paper. Also, various companies are deploying Raft-based systems [34]. 

The remainder of this section evaluates Raft using three criteria: understandability, correctness, and performance.

9.1 Understandability

To measure Raft’s understandability relative to Paxos, we conducted an experimental study using upper-level undergraduate and graduate students in an Advanced Operating Systems course at Stanford University and a Distributed Computing course at U.C. Berkeley. We recorded a video lecture of Raft and another of Paxos, and created corresponding quizzes. The Raft lecture covered the content of this paper except for log compaction; the Paxos lecture covered enough material to create an equivalent replicated state machine, including single-decree Paxos, multi-decree Paxos, reconfiguration, and a few optimizations needed in practice (such as leader election). The quizzes tested basic understanding of the algorithms and also required students to reason about corner cases. Each student watched one video, took the corresponding quiz, watched the second video, and took the second quiz. About half of the participants did the Paxos portion first and the other half did the Raft portion first in order to account for both individual differences in performance and experience gained from the first portion of the study. We compared participants’ scores on each quiz to determine whether participants showed a better understanding of Raft.

We tried to make the comparison between Paxos and Raft as fair as possible. The experiment favored Paxos in two ways: 15 of the 43 participants reported having some prior experience with Paxos, and the Paxos video is 14% longer than the Raft video. As summarized in Table 1, we have taken steps to mitigate potential sources of bias. All of our materials are available for review [28, 31].

On average, participants scored 4.9 points higher on the Raft quiz than on the Paxos quiz (out of a possible 60 points, the mean Raft score was 25.7 and the mean Paxos score was 20.8); Figure 14 shows their individual scores. A paired t-test states that, with 95% confidence, the true distribution of Raft scores has a mean at least 2.5 points larger than the true distribution of Paxos scores.

We also created a linear regression model that predicts a new student’s quiz scores based on three factors: which quiz they took, their degree of prior Paxos experience, and the order in which they learned the algorithms. The model predicts that the choice of quiz produces a 12.5-point difference in favor of Raft. This is significantly higher than the observed difference of 4.9 points, because many of the actual students had prior Paxos experience, which helped Paxos considerably, whereas it helped Raft slightly less. Curiously, the model also predicts scores 6.3 points lower on Raft for people that have already taken the Paxos quiz; although we don’t know why, this does appear to be statistically significant.

We also surveyed participants after their quizzes to see which algorithm they felt would be easier to implement or explain; these results are shown in Figure 15. An overwhelming majority of participants reported Raft would be easier to implement and explain (33 of 41 for each question). However, these self-reported feelings may be less reliable than participants’ quiz scores, and participants may have been biased by knowledge of our hypothesis that Raft is easier to understand.

A detailed discussion of the Raft user study is available at [31].

9.2 Correctness

We have developed a formal specification and a proof of safety for the consensus mechanism described in Section 5. The formal specification [31] makes the information summarized in Figure 2 completely precise using the TLA+ specification language [17]. It is about 400 lines long and serves as the subject of the proof. It is also useful on its own for anyone implementing Raft. We have mechanically proven the Log Completeness Property using the TLA proof system [7]. However, this proof relies on invariants that have not been mechanically checked (for example, we have not proven the type safety of the specification). Furthermore, we have written an informal proof [31] of the State Machine Safety property which is complete (it relies on the specification alone) and relatively precise (it is about 3500 words long).

9.3 Performance

Raft’s performance is similar to other consensus algorithms such as Paxos. The most important case for performance is when an established leader is replicating new log entries. Raft achieves this using the minimal number of messages (a single round-trip from the leader to half the cluster). It is also possible to further improve Raft’s performance. For example, it easily supports batching and pipelining requests for higher throughput and lower latency. Various optimizations have been proposed in the literature for other algorithms; many of these could be applied to Raft, but we leave this to future work.

We used our Raft implementation to measure the performance of Raft’s leader election algorithm and answer two questions. First, does the election process converge quickly? Second, what is the minimum downtime that can be achieved after leader crashes?

To measure leader election, we repeatedly crashed the leader of a cluster of five servers and timed how long it took to detect the crash and elect a new leader (see Figure 16). To generate a worst-case scenario, the servers in each trial had different log lengths, so some candidates were not eligible to become leader. Furthermore, to encourage split votes, our test script triggered a synchronized broadcast of heartbeat RPCs from the leader before terminating its process (this approximates the behavior of the leader replicating a new log entry prior to crashing). The leader was crashed uniformly randomly within its heartbeat interval, which was half of the minimum election timeout for all tests. Thus, the smallest possible downtime was about half of the minimum election timeout.

The top graph in Figure 16 shows that a small amount of randomization in the election timeout is enough to avoid split votes in elections. In the absence of randomness, leader election consistently took longer than 10 seconds in our tests due to many split votes. Adding just 5ms of randomness helps significantly, resulting in a median downtime of 287ms. Using more randomness improves worst-case behavior: with 50ms of randomness the worstcase completion time (over 1000 trials) was 513ms.

The bottom graph in Figure 16 shows that downtime can be reduced by reducing the election timeout. With an election timeout of 12–24ms, it takes only 35ms on average to elect a leader (the longest trial took 152ms). However, lowering the timeouts beyond this point violates Raft’s timing requirement: leaders have difficulty broadcasting heartbeats before other servers start new elections. This can cause unnecessary leader changes and lower overall system availability. We recommend using a conservative election timeout such as 150–300ms; such timeouts are unlikely to cause unnecessary leader changes and will still provide good availability.

9 实现和评估

我们已经实现了Raft作为存有RAMCloud配置和RAMCloud缓存失效协作者信息的复制状态机部分。Raft实现包含大约2000行C++代码,不包括测试、注释和空行。这些源码是免费的[23]。还有大概25个Raft不同发展阶段的第三方开源实现[34],都基于本论文。此外,很多公司部署了基于Raft的系统。

余下章节将使用三个标准来评估Raft:易懂性、正确性和性能。

9.1 易懂性

为了相对Paxos评测Raft的易懂性,我们对来自斯坦福大学攻读Advanced Operating Systems课程和加州大学伯克利分校攻读Distributed Computing课程的高年级在校生和毕业生进行了一项实验。我们分别录了关于Raft和Paxos的视频讲座,并建立了相应的测验。Raft讲座覆盖了除日志压缩外的本论文;Paxos讲座覆盖了相当于创建一个复制状态机的足够材料,包括single-decree Paxos,multi-decree Paxos,重新配置和一些在实际中需要的优化(比如leader选举)。测验测试了算法基础的理解同时要求学生推出极端情况。每个学生只看一个视频,做相应的测验,再看第二个视频和做相应的测验。为了体现本研究中两者在第一次获得的经验和表现的差异,差不多半数参与者第一次做Paxos部分,另一半第一次做Raft部分。我们比较了参与者每次测验的分数来确定参与者是否表现出了Raft更好理解。

我们试图让Paxos和Raft的对比尽可能的公平。实验中偏爱Paxos的有两种:43个参与者中的15个具有一些Paxos的现有经验,Paxos的视频比Raft的长14%。如表1所示,我们已经采取措施来减少参与来源的偏差。我们所有的材料都可以进行审查[28,31]。

平均而言,参与者在Raft测验中得4.9分,要比Paxos的高(换算成60分制,Raft得分25.7,Paxos得分20.8);图14显示了他们各自的得分。配对t-test表明,我们有95%的信心说Raft的得分分布要比Paxos高至少2.5分。

我们还建立了一个基于因素的线性回归模型来推测一个新的参与者的测验得分:他们采用了哪个测试,他们之前的Paxos经验,以及他们学习算法的顺序。模型预测测验的选择会让12.5个点的人更爱Raft。这明显比4.9分更说明问题,因为实际上很多学生之前有Paxos的经验,这相当的帮助了Paxos,而能帮助Raft就很少了。奇怪的是,该模型还预测了做过Paxos测试的人得分6.3分比只做Raft的要低;虽然我们不知道为什么,但是它确实在统计上出现了。

我们还调查了他们参加完测试后,他们认为接触到的算法哪个更容易实现和解释;图15显示了这些结果。绝大多数参与者表示Raft更容易实现和解释(41个问题的33个)。然而这种自我报告的感受比起参与者的测验得分不可靠,而且这些参与者可能在我们设定的知识点即Raft更易懂有偏差。

Raft用户研究的详细讨论在[31]。

9.2 正确性







一致性算法探寻(扩展版)11

原文:http://my.oschina.net/daidetian/blog/490779

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!