Partitioning: how to split data among multiple Redis instances.
Partitioning is the process of splitting your data across multiple Redis instances, so that every instance contains only a subset of your keys. The first part of this document introduces the concept of partitioning; the second part presents the alternatives for Redis partitioning.
Partitioning in Redis serves two main goals:
Partitioning basics
There are different partitioning criteria. Imagine we have four Redis instances R0, R1, R2, R3, and many keys representing users like user:1, user:2, and so forth. We can find different ways to select in which instance we store a given key. In other words, there are different systems to map a given key to a given Redis server.
One of the simplest ways to perform partitioning is called range partitioning, and is accomplished by mapping ranges of objects onto specific Redis instances. For example, I could say that users from ID 0 to ID 10000 will go into instance R0, while users from ID 10001 to ID 20000 will go into instance R1, and so forth.
This system works and is actually used in practice. However, it has the disadvantage of requiring a table that maps ranges to instances. This table needs to be managed, and we need a table for every kind of object we have. With Redis this is usually not a good idea.
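The range table described above can be sketched in a few lines of Python. This is only an illustration: the bounds for R2 and R3 and the function name instance_for_user are assumptions that extend the "and so forth" of the example, not part of any Redis client.

```python
import bisect

# Hypothetical range table for user IDs: each entry is the highest ID
# (inclusive) handled by the instance at the same position in INSTANCES.
# Only the R0 and R1 bounds come from the text; the rest are assumed.
UPPER_BOUNDS = [10000, 20000, 30000, 40000]
INSTANCES = ["R0", "R1", "R2", "R3"]


def instance_for_user(user_id):
    """Return the name of the instance that holds the given user ID."""
    if user_id < 0 or user_id > UPPER_BOUNDS[-1]:
        raise ValueError("no instance is configured for user ID %d" % user_id)
    # bisect_left finds the first range whose upper bound is >= user_id.
    return INSTANCES[bisect.bisect_left(UPPER_BOUNDS, user_id)]
```

Note that this table covers only users. Every other kind of object would need its own table, which is the maintenance cost mentioned above.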
An alternative to range partitioning is hash partitioning. This scheme works with any key, with no need for a key in the form object_name:<id>, and is as simple as this: take the key name and turn it into a number with a hash function, for instance the crc32 hash function. So if the key is foobar I do crc32(foobar), which will output something like 93024922. I then take this number modulo 4, the number of instances. 93024922 modulo 4 equals 2, so I know my key foobar should be stored in the R2 instance. Note: the modulo operation is just the remainder of the division; it is usually implemented by the % operator in many programming languages.
There are many other ways to perform partitioning, but with these two examples you should get the idea. One advanced form of hash partitioning is called consistent hashing and is implemented by a few Redis clients and proxies.
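As a minimal sketch of the hash-and-modulo scheme, the following Python uses zlib.crc32 from the standard library. The function name instance_for_key is an assumption for illustration, and the actual CRC32 value of foobar will differ from the "something like" number quoted in the text.

```python
import zlib

INSTANCES = ["R0", "R1", "R2", "R3"]


def instance_for_key(key):
    """Map any key to one of the instances with crc32 and modulo."""
    # zlib.crc32 works on bytes and returns an unsigned 32-bit integer.
    key_hash = zlib.crc32(key.encode("utf-8"))
    # The remainder is always between 0 and len(INSTANCES) - 1.
    return INSTANCES[key_hash % len(INSTANCES)]
```

The same key always lands on the same instance as long as the number of instances does not change. Changing the number of instances remaps most keys, which is the problem consistent hashing tries to reduce.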
Different implementations of partitioning
Partitioning can be the responsibility of different parts of a software stack.
Some features of Redis don't play very well with partitioning:
Partitioning when using Redis as a data store or as a cache is conceptually the same; however, there is a huge difference. When Redis is used as a data store you need to be sure that a given key always maps to the same instance. When Redis is used as a cache, it is not a big problem if a given node is unavailable and we start using a different node, altering the key-instance map as we wish in order to improve the availability of the system (that is, the ability of the system to reply to our queries).
Consistent hashing implementations are often able to switch to other nodes if the preferred node for a given key is not available. Similarly, if you add a new node, part of the new keys will start to be stored on the new node.
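To make the two behaviors above concrete, here is a toy consistent-hash ring in Python. It is a sketch under stated assumptions, not the algorithm used by any particular Redis client or proxy: the class name HashRing, the use of MD5 for ring positions, and the count of 100 virtual points per node are all choices made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring; not the algorithm of any specific client."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # Use the first 8 bytes of MD5 as a 64-bit position on the ring.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        # Each node is placed on the ring at several virtual points.
        for i in range(self.replicas):
            point = self._hash("%s#%d" % (node, i))
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key, unavailable=()):
        """Walk clockwise from the key's position to the first usable node."""
        if not self._points:
            raise LookupError("the ring has no nodes")
        start = bisect.bisect(self._points, self._hash(key))
        for offset in range(len(self._points)):
            point = self._points[(start + offset) % len(self._points)]
            node = self._owner[point]
            if node not in unavailable:
                return node
        raise LookupError("no available node")
```

For example, ring.node_for("foobar", unavailable={"R2"}) skips R2 and returns the next node on the ring. After ring.add_node("R4"), the only keys whose node changes are those that now map to R4.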
The main concept here is the following:
Presharding
We learned that a problem with partitioning is that, unless we are using Redis as a cache, adding and removing nodes can be tricky, and it is much simpler to use a fixed keys-instances map.
However, data storage needs may vary over time. Today I can live with 10 Redis nodes (instances), but tomorrow I may need 50 nodes.
Since Redis has an extremely small footprint and is lightweight (a spare instance uses 1 MB of memory), a simple approach to this problem is to start with a lot of instances from the beginning. Even if you start with just one server, you can decide to live in a distributed world from your first day and run multiple Redis instances on your single server, using partitioning.
You can select this number of instances to be quite big from the start. For example, 32 or 64 instances could do the trick for most users and will provide enough room for growth.
In this way, as your data storage needs increase and you need more Redis servers, all you have to do is move instances from one server to another. Once you add the first additional server, you will need to move half of the Redis instances from the first server to the second, and so forth.
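The idea of a fixed keys-instances map combined with a movable instances-servers layout can be sketched as follows. Everything specific here is an assumption for illustration: the count of 64 instances is one of the values suggested above, while the host addresses, the base port 7000, and the function names are hypothetical.

```python
import zlib

NUM_INSTANCES = 64  # fixed for the lifetime of the data set


def instance_for_key(key):
    """Fixed keys-instances map: never changes when servers are added."""
    return zlib.crc32(key.encode("utf-8")) % NUM_INSTANCES


def initial_layout(host, base_port=7000):
    """Run every instance on a single server, one port per instance."""
    return {i: (host, base_port + i) for i in range(NUM_INSTANCES)}


def move_half(layout, old_host, new_host):
    """Return a new layout with half of old_host's instances on new_host."""
    on_old = sorted(i for i, (host, _) in layout.items() if host == old_host)
    to_move = on_old[len(on_old) // 2:]
    new_layout = dict(layout)
    for i in to_move:
        # Only the host changes; the instance keeps its number and port.
        new_layout[i] = (new_host, layout[i][1])
    return new_layout
```

Starting from initial_layout("10.0.0.1"), calling move_half(layout, "10.0.0.1", "10.0.0.2") leaves 32 instances on each server. Clients only need the updated address table, because instance_for_key returns the same instance number for every key as before.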
Using Redis replication you will likely be able to do the move with minimal or no downtime for your users:
Send the SLAVEOF NO ONE command to the slaves in the new server.
So far we covered Redis partitioning in theory, but what about practice? What system should you use?
Unfortunately Redis Cluster is currently not production ready; however, you can get more information about it by reading the specification or checking the partial implementation in the unstable branch of the Redis GitHub repository.
Once Redis Cluster is available, and if a Redis Cluster compliant client is available for your language, Redis Cluster will be the de facto standard for Redis partitioning.
Redis Cluster is a mix between query routing and client side partitioning.
Twemproxy
Twemproxy is a proxy developed at Twitter for the Memcached ASCII protocol and the Redis protocol. It is single threaded, written in C, and extremely fast. It is open source software released under the terms of the Apache 2.0 license.
Twemproxy supports automatic partitioning among multiple Redis instances, with optional node ejection if a node is not available (this will change the keys-instances map, so you should use this feature only if you are using Redis as a cache).
It is not a single point of failure, since you can start multiple proxies and instruct your clients to connect to the first one that accepts the connection.
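The "connect to the first proxy that accepts the connection" behavior could be sketched on the client side like this. It is a rough illustration using only Python's standard socket module: the function name, the timeout, and the proxy addresses in the usage note are assumptions, and a real Redis client library would normally handle this for you.

```python
import socket


def connect_to_first_available(proxies, timeout=0.5):
    """Try each (host, port) in order; return the first socket that connects."""
    last_error = None
    for host, port in proxies:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            # This proxy is down or unreachable; try the next one.
            last_error = exc
    raise ConnectionError("no proxy accepted the connection") from last_error
```

A client might call connect_to_first_available([("10.0.0.1", 22121), ("10.0.0.2", 22121)]), where both addresses are hypothetical proxy endpoints, and then speak the Redis protocol over the returned socket.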
Basically, Twemproxy is an intermediate layer between clients and Redis instances that will reliably handle partitioning for us with minimal additional complexity. Currently it is the suggested way to handle partitioning with Redis.
You can read more about Twemproxy in this antirez blog post.
An alternative to Twemproxy is to use a client that implements client side partitioning via consistent hashing or other similar algorithms. There are multiple Redis clients with support for consistent hashing, notably Redis-rb and Predis.
Please check the full list of Redis clients to see if there is a mature client with a consistent hashing implementation for your language.
Original: http://www.cnblogs.com/eric-z/p/3995502.html