比如你可以把上面说到的sorted
set的score值设置成过期时间的时间戳,那么就可以简单地通过过期时间排序,定时清除过期数据了,不仅是清除Redis中的过期数据,你完全可以把Redis里这个过期时间当成是对数据库中数据的索引,用Redis来找出哪些数据需要过期删除,然后再精准地从数据库中删除相应的记录。
Redis is different than other database solutions in
many ways: it uses memory as main storage support and disk only for persistence,
the data model is pretty unique, it is single threaded and so forth. I think
that another big difference is that in order to take advantage of Redis in your
production environment you don‘t need to
switch to Redis. You
can just use it in order to do new things that were not possible before, or in
order to fix old problems.
Switching to Redis is of course an
option, and many users are using Redis as primary database since they need
features or write speed or latency or some other feature, but as you can guess
switching is a big step if you have an already running application in
production. Also for some other kind of applications Redis may not be the right
database: for instance a Redis data set can‘t be bigger than available memory,
so if you have some
big dataapplication and a mostly-reads access
pattern, Redis is not the right pick.
However one thing I like
about Redis is that it can solve a lot of problems just
adding it
to your stack to do things that were too slow or impossible with
your existing database. This way you start to take confidence with Redis in an
incremental way, starting to use it just to optimize or to create new features
in your application. This blog post explores a few use cases showing how people
added Redis to existing environments to take advantage of Redis set of features.
I‘ll not report specific use cases with site names and exact configurations,
I‘ll just try to show you class of problems that Redis can solve without being
your primary database.
Slow latest items listings in your home page
Can I have a penny for
every instance of the following query that is running too slow please?
SELECT * FROM foo WHERE ... ORDER BY time DESC LIMIT 10
To have listings like "latest items added by our users" or "latest something
else" in web applications is very common, and often a scalability problem. It is
pretty counter intuitive that you need to sort stuff if you just want to list
items in the same order they were created.
Similar problems can be
fixed using a Redis pattern that I‘ll show you with an example. We have a web
application where we want to show the latest 20 comments posted by our users.
Near to the latest comments box we also have a link "show all" that links to a
page where it is possible to show more than the latest 20 comments, and there is
also pagination so that I can see the whole comments "time
line".
We also assume that every comment is stored in our database,
and has an unique incremental ID field.
We can make both the home
page box and the comments time line page with pagination fast using a simple
Redis pattern:
- Every time a new comment is added, we add its ID into a Redis
list:LPUSH latest.comments <ID>.
- We also trim the list to a given length, so that Redis will just hold
the latest 5000 items: LTRIM latest.comments 0
5000.
- Every time we need to get a range of items for our latest comments
usages, we call a function that will do the following (in pseudo
code):
FUNCTION get_latest_comments(start,num_items):
id_list = redis.lrange("latest.comments",start,start+num_items-1)
IF id_list.length < num_items
id_list = SQL_DB("SELECT ... ORDER BY time LIMIT ...")
END
RETURN id_list
END
What we are doing here is simple. In Redis we are taking a
live
cache, always updated, of the latest IDs. But we are limited to 5000 IDs,
and after the system is started the first time those IDs can be even zero as the
list did not existed. So our new function to get the IDs of the latest comments
will try to always ask Redis. If our start/count parameters are out of range, we
fall back to the database.
We never need to "refresh" the cache
with this system, and the SQL database (or other type of on-disk data store)
will only be pinged if the user is paginating "far" intervals. So never for the
home page, and never for the first pages of our comments time
line.
As you can see here Redis is working as a new element. It is
not working as a traditional cache, there are no cache refreshes and the info in
the Redis instance is always coherent. It is not either working as a database as
you can flush the key and everything will continue working. I call it just a
"live cache" but there are better names I bet.
Deletion and
filtering Note that it is possible to handle comments
deletion using LREM. If deletions are pretty rare another option is to just skip
the entry when rendering the specific comment, since our DB query to fetch the
comment by ID will report us that the comment is no longe
there.
Also many times you want to have different listings with
different filters. When this filters are limited in number (for example
categories) you can simply use a different Redis list for every different filter
you have. After all you are just taking 5000 items per list, and Redis can hold
millions of items with little memory. As usually is a compromise, use your
creativity!
Leaderboards and related problems
Another very common need that is hard
to model with good performances in DBs that are not in-memory is to take a list
of items, sorted by a score, updated in real time, with many updates arriving
every second.
The classical example is the leaderboard in an online
game, for instance a Facebook game, but this pattern can be applied to a number
of different scenarios. In the online game example you receive a very high
number of score updates by different users. WIth this scores you usually want
to:
- Show a leaderboard with the top #100 scores.
- Show the user its current global rank.
This operations are trivial
using a Redis sorted set, even if you have millions of users and millions of new
scores per minute.
This is how mount this pattern: every time a new
score is received by an user, we do:
ZADD leaderboard <score> <username>
Note: you may want to use the user ID instead of the username, it is up to
your design To get the top 100 users by score is as easy
as
ZREVRANGE leaderboard 0 99.
Similarly to
tell the user its global rank you just do
ZRANK leaderboard
<username>.
Note that you can do more than this, for
instance it is trivial to show the user the scores of users "near" his position,
that is, to show the portion of the leaderboard that includes the score of our
user.
Order by user votes and time
A notable variation of the above
leaderboard pattern is the implementation of a site like Reddit or Hacker News,
where news are ordered accordingly to a forumla similar to:
score = points / time^alpha
So user votes will raise the
news in a proportional way, but time will take the news down exponentially. Well
the actual algorithm is up to you, this will not change our
pattern.
This pattern works in this way, starting from the
observation that probably only the latest, for instance, 1000 news are good
candidates to stay in the home page, so we can ignore all the others. The
implementation is simple:
- Every time a new news is posted we add the ID into a list, with LPUSH +
LTRIM in order to take only the latest 1000 items.
- There is a worker that gets this list and continually computes the final
score of every news in this set of 1000 news. The result is used to populate a
sorted set with ZADD. Old news are removed from the sorted set in the mean
time as a cleanup operation.
At this point we have a sorted set
composed of 1000 news sorted by our score. This sorted set can be queried 100k
times per second for the top news, so it will be easy to scale the site this
way.
The key idea here is that our sorting, made by the background
worker, is not a work proportional to the number of users watching the news
site.
For the "just posted" section the list of IDs can be used
raw, or using the first pattern proposed in this blog post.
Implement expires on items
Another way to use sorted sets is to index
stuff by time. We just use the unix time as score. This can be used in general
to index things by time, but a notable usage is to expire things in our main
database when a given amount of time has elapsed.
This is the
pattern:
- Every time a new item is added to our (non Redis) database we add it into
the sorted set. As score we use the time at which this item should expire, in
other words the current_time+time_to_live.
- There is a background worker doing queries in the sorted set using for
instance ZRANGE ... WITHSCORES to take the latest 10 items. If there are
scores representing unix times already in the past, we delete this items from
the database.
Counting stuff
Redis is a good counter, thanks to INCRBY and other
similar commands.
How many times you wanted to add new counters in
your database, to take statistics or to show new informations to your users, but
avoided it since it is a write-intensive task for your database? This happened
to me many times in the past.
Well, just use Redis and don‘t care!
With atomic increments you can take all your counts, reset them atomically with
GETSET if needed, put expires in your counters, so that you can take the count
of events only if the time difference between those events is less then a given
amount of seconds.
For instance using just:
INCR user:<id>
EXPIRE user:<id> 60
You can
take the count of how many page views the user did recently, without a pause
greater than 60 seconds between page views. When this count reaches, for
instance, 20, it is time to show some banner, or reminder, or tip, or what you
want.
Unique N items in a given amount of time
Another interesting example of
statistic that is trivial to do using Redis but is very hard with other kind of
databases is to see how many unique users visited a given resource in a given
amount of time. For instance I want to know the number of unique registered
users, or IP addresses, that accessed a given article in an online
newspaper.
Every time I get a new pageview I just do the following:
SADD page:day1:<page_id> <user_id>
Of course
instead of day1 you may want to use the first second of today, as unix time,
like: time()-(time()%3600*24), or something like that.
Want to know
the number of unique users? Just do
SCARD
page:day1:<page_id>.
Need to test if a specific user
already accessed that page? Just do
SISMEMBER
page:day1:<page_id>.
Real time analysis of what is happening, for stats, anti spam, or
whatever
We did just a few examples, but if you study the Redis command set
and combine the data structures in an interesting way you can model an huge
number of real time stats with little efforts, in order to power your anti spam
systems, or the quality of service you can provide to user thanks to the new
information.
Pub/Sub
Do you know that Redis includes a fairly high performance
implementation of Pub/Sub?
Redis Pub/Sub is very very simple to
use, stable, and fast, with support for pattern matching, ability to
subscribe/unsubscribe to channels on the run, and so forth. You can read more
about it in the
Redis PubSub
official documentation.
Queues
You probably already noticed how Redis commands like list push
and list pop make it suitable to implement queues, but you can do more than
that: Redis has
blocking variants
of list pop commands that will block if a list is empty.
A
common usage of Redis as a queue is the
Resque library,
implemented and popularized by Github‘s folks.
With our
http://redis.io/commands/rpoplpush list
rotation commands it is possible to implement queues with interesting semantics
that will make your background workers happier! (For instance you can implement
a rotating list to fetch RSS feeds again and again, so that every worker will
pick the RSS that was fetched more in the past, and thus needs to be updated
ASAP). Similarly using sorted sets it is possible to implement priority queues
easily.
Caching
This section alone would deserve a specific blog post... so in
short here I‘ll say that Redis can be used as a replacement for memcached in
order to turn your cache into something able to store data in an simpler to
update way, so that there is no need to regenerate the data every time. See for
reference the first pattern published in this article.
Redis can fix your problems now!
You can use Redis
right
now to do things that will make your users happier, your systems
less complex, your site more responsive. You don‘t need to replace your current
setup in order to use it, just start using Redis to do new things that were
otherwise not possible, or hard, or too costly.
Have
fun!
You can discuss this entry here or
into Hacker
News.