http://blog.csdn.net/liangyuannao/article/details/7776057
1. A single process can open a large number of socket descriptors (FDs)
What is hardest to tolerate about select is that the number of FDs a single process can open is limited, set by FD_SETSIZE, which defaults to 1024 on Linux. For an IM server that has to support tens of thousands of connections this is clearly far too few. You can either modify this macro and rebuild, although the available material also points out that this costs some network efficiency, or choose a multi-process solution (the traditional Apache approach); process creation on Linux is relatively cheap but still not negligible, and data synchronization between processes is far less efficient than synchronization between threads, so this is not a perfect solution either. epoll has no such limit: the ceiling on the FDs it supports is the maximum number of files that can be opened, which is generally far larger than the select limit. On a machine with 1 GB of memory it is around 100,000; the exact figure can be read with cat /proc/sys/fs/file-max, and in general it depends heavily on how much memory the system has.
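As a hedged aside on the per-process limit above: the sketch below uses getrlimit()/setrlimit() with RLIMIT_NOFILE to inspect and raise a process's own descriptor ceiling (the target value of 100000 is only an example); the system-wide /proc/sys/fs/file-max cap mentioned above is a separate knob.

/* Sketch: inspect and (attempt to) raise the per-process FD limit. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1) {
        perror("getrlimit");
        return 1;
    }
    printf("soft limit: %llu, hard limit: %llu\n",
           (unsigned long long)rl.rlim_cur, (unsigned long long)rl.rlim_max);

    rl.rlim_cur = 100000;              /* example target, not a recommendation */
    if (rl.rlim_cur > rl.rlim_max)
        rl.rlim_cur = rl.rlim_max;     /* an unprivileged process cannot exceed the hard limit */

    if (setrlimit(RLIMIT_NOFILE, &rl) == -1)
        perror("setrlimit");           /* may fail without CAP_SYS_RESOURCE */

    return 0;
}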
2. I/O efficiency does not drop linearly as the number of FDs grows
Another fatal weakness of the traditional select/poll is that when you hold a very large set of sockets, network latency means only some of them are "active" at any given moment, yet every select/poll call scans the entire set linearly, so efficiency falls off linearly. epoll does not have this problem, because it only operates on the "active" sockets: in the kernel implementation, epoll is built on a callback attached to each fd, so only "active" sockets go and invoke the callback while idle sockets do not. In this sense epoll implements a kind of "pseudo" AIO, with the driving force in the OS kernel. In benchmarks where essentially all sockets are active, for example a high-speed LAN environment, epoll is no more efficient than select/poll; on the contrary, if epoll_ctl is called too often, efficiency even drops slightly. But once idle connections are used to simulate a WAN environment, epoll's efficiency is far above that of select/poll.
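To make the "scan everything versus get only the ready list" difference concrete, here is a hedged, self-contained sketch (a single pipe stands in for "many sockets"; the function names are my own): with select() the fd_set must be rebuilt and every descriptor re-tested after each call, while epoll_wait() returns only the descriptors that actually fired.

/* Sketch: the same readiness check written with select() and with epoll. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/epoll.h>

/* select: the fd_set must be rebuilt and every fd re-tested on each call. */
static void wait_with_select(const int *fds, int nfds)
{
    fd_set rset;
    int i, maxfd = -1;

    FD_ZERO(&rset);
    for (i = 0; i < nfds; i++) {            /* O(n) setup ... */
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0) {
        perror("select");
        return;
    }
    for (i = 0; i < nfds; i++)              /* ... and an O(n) scan afterwards */
        if (FD_ISSET(fds[i], &rset))
            printf("select: fd %d is readable\n", fds[i]);
}

/* epoll: interest is registered once; epoll_wait returns only ready fds. */
static void wait_with_epoll(const int *fds, int nfds)
{
    int epfd = epoll_create(nfds);          /* size hint, ignored since 2.6.8 */
    struct epoll_event ev, events[16];
    int i, n;

    for (i = 0; i < nfds; i++) {
        ev.events = EPOLLIN;
        ev.data.fd = fds[i];
        epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev);
    }
    n = epoll_wait(epfd, events, 16, -1);
    for (i = 0; i < n; i++)                 /* loop only over what is ready */
        printf("epoll: fd %d is readable\n", events[i].data.fd);
    close(epfd);
}

int main(void)
{
    int p[2];
    if (pipe(p) < 0) { perror("pipe"); exit(1); }
    write(p[1], "x", 1);   /* the byte is never read, so the pipe stays readable for both demos */

    int fds[1] = { p[0] };
    wait_with_select(fds, 1);
    wait_with_epoll(fds, 1);
    return 0;
}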
3. Using mmap to accelerate message passing between kernel and user space
This point touches on epoll's concrete implementation. Whether it is select, poll, or epoll, the kernel has to notify user space of FD events, so avoiding unnecessary memory copies matters. Here epoll does it by having the kernel and user space mmap the same block of memory. And if, like me, you have followed epoll since the 2.5 kernels, you will certainly not have forgotten the manual mmap step.
4. Kernel tuning
This one is not really an advantage of epoll itself but of the Linux platform as a whole. You may have doubts about Linux, but you cannot get around the ability it gives you to fine-tune the kernel. For example, the kernel TCP/IP stack manages sk_buff structures with a memory pool, and the size of that pool (skb_head_pool) can be adjusted at run time via echo XXXX > /proc/sys/net/core/hot_list_length. Similarly, the second argument of listen() (the length of the queue of packets that have completed the TCP three-way handshake) can be tuned according to the memory of your platform. You can even try the newer NAPI network driver architecture on an unusual system where the number of packets is huge but each individual packet is small.
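As a hedged illustration of the listen() backlog tuning mentioned above, the sketch below derives the backlog from /proc/sys/net/core/somaxconn, the knob that caps it on current kernels; the helper name read_somaxconn is my own.

/* Sketch: pick listen()'s backlog from the kernel-wide cap in
 * /proc/sys/net/core/somaxconn, falling back to SOMAXCONN. */
#include <stdio.h>
#include <sys/socket.h>

static int read_somaxconn(void)
{
    FILE *f = fopen("/proc/sys/net/core/somaxconn", "r");
    int v = SOMAXCONN;                 /* fallback if /proc is unavailable */

    if (f) {
        if (fscanf(f, "%d", &v) != 1)
            v = SOMAXCONN;
        fclose(f);
    }
    return v;
}

int main(void)
{
    /* a backlog passed to listen() above this value is silently capped by the kernel */
    printf("effective listen() backlog cap: %d\n", read_somaxconn());
    return 0;
}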
Using epoll() For Asynchronous Network Programming
http://kovyrin.net/2006/04/13/epoll-asynchronous-network-programming/
The general way to implement TCP servers is "one thread/process per connection", but under high load this approach may not be efficient enough, and we need other patterns of connection handling. In this article I will describe how to implement a TCP server with asynchronous connection handling using the epoll() system call of the Linux 2.6 kernel.
epoll is a new system call introduced in Linux 2.6. It is designed to replace the older select (and also poll). Unlike these earlier system calls, which are O(n), epoll is an O(1) algorithm, which means it scales well as the number of watched file descriptors increases. select uses a linear search through the list of watched file descriptors, which causes its O(n) behaviour, whereas epoll uses callbacks in the kernel file structure.
Another fundamental difference of epoll is that it can be used in an edge-triggered, as opposed to level-triggered, fashion. This means that you receive “hints” when the kernel believes the file descriptor has become ready for I/O, as opposed to being told “I/O can be carried out on this file descriptor”. This has a couple of minor advantages: kernel space doesn’t need to keep track of the state of the file descriptor, although it might just push that problem into user space, and user space programs can be more flexible (e.g. the readiness change notification can just be ignored).
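A hedged fragment (not a complete program) of the edge-triggered pattern described above: the descriptor is made non-blocking, registered with EPOLLIN | EPOLLET, and each readiness hint is drained with read() until EAGAIN, because the kernel will not repeat the hint until a new state change occurs. The names epfd and client_sock match the snippets below.

/* Fragment: registering a socket for edge-triggered notification (EPOLLET). */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/epoll.h>

static void add_edge_triggered(int epfd, int client_sock)
{
    struct epoll_event ev;

    /* Edge-triggered fds must be non-blocking, otherwise the drain loop
     * below would stall on the last read(). */
    fcntl(client_sock, F_SETFL, fcntl(client_sock, F_GETFL, 0) | O_NONBLOCK);

    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = client_sock;
    epoll_ctl(epfd, EPOLL_CTL_ADD, client_sock, &ev);
}

/* On each EPOLLIN hint, read until EAGAIN: the kernel will not repeat the
 * hint until new data arrives. */
static void drain(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            continue;                          /* process buf[0..n) here */
        if (n == 0 || errno != EAGAIN)
            close(fd);                         /* peer closed or real error */
        break;
    }
}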
To use epoll, you need to take the following steps in your application. First, create the epoll descriptor:
epfd = epoll_create(EPOLL_QUEUE_LEN);
where EPOLL_QUEUE_LEN is the maximum number of connection descriptors you expect to manage at one time (since kernel 2.6.8 this size hint is ignored, but it must be greater than zero). The return value is a file descriptor that will be used in later epoll calls; it can be closed with close() when you no longer need it. Next, register each socket you want to watch:
static struct epoll_event ev;
int client_sock;
...
ev.events = EPOLLIN | EPOLLPRI | EPOLLERR | EPOLLHUP;
ev.data.fd = client_sock;
int res = epoll_ctl(epfd, EPOLL_CTL_ADD, client_sock, &ev);
where ev is the epoll event configuration structure and EPOLL_CTL_ADD is the predefined command constant for adding sockets to epoll. A detailed description of the epoll_ctl flags can be found in the epoll_ctl(2) man page. When the client_sock descriptor is closed, it is automatically removed from the epoll descriptor.
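For completeness, a hedged fragment showing the other two epoll_ctl commands next to EPOLL_CTL_ADD: EPOLL_CTL_MOD re-arms a descriptor with a new event mask, and EPOLL_CTL_DEL removes it explicitly (the event pointer is ignored for DEL but had to be non-NULL on kernels before 2.6.9).

/* Fragment: modifying and removing a descriptor that was added earlier. */
struct epoll_event ev;

/* switch the socket to write-readiness notification */
ev.events = EPOLLOUT | EPOLLERR | EPOLLHUP;
ev.data.fd = client_sock;
if (epoll_ctl(epfd, EPOLL_CTL_MOD, client_sock, &ev) < 0)
    perror("epoll_ctl: mod");

/* remove it explicitly (the event pointer is ignored for DEL,
 * but must be non-NULL on kernels before 2.6.9) */
if (epoll_ctl(epfd, EPOLL_CTL_DEL, client_sock, &ev) < 0)
    perror("epoll_ctl: del");

Finally, the main loop waits for events and dispatches each ready descriptor: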
while (1) {
    // wait for something to do...
    int nfds = epoll_wait(epfd, events, MAX_EPOLL_EVENTS_PER_RUN, EPOLL_RUN_TIMEOUT);
    if (nfds < 0)
        die("Error in epoll_wait!");

    // for each ready socket
    for (int i = 0; i < nfds; i++) {
        int fd = events[i].data.fd;
        handle_io_on_socket(fd);
    }
}
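The loop above leaves die() and handle_io_on_socket() undefined; here is a hedged sketch of what they might look like for a plain echo-style server (the names come from the loop, the bodies are assumptions):

/* Sketch: possible definitions for the helpers used in the loop above. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void die(const char *msg)
{
    perror(msg);
    exit(1);
}

static void handle_io_on_socket(int fd)
{
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));

    if (n <= 0) {
        /* error or peer close: closing the fd also removes it from epoll */
        close(fd);
        return;
    }
    write(fd, buf, n);    /* echo the data back */
}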
The typical architecture of the networking part of your application is described below. This architecture allows almost unlimited scalability of your application on single-processor and multi-processor systems:
As you can see, the epoll() API is very simple, but believe me, it is very powerful. Linear scalability lets you manage huge numbers of parallel connections with a small number of worker processes compared to the classical one-thread-per-connection approach.
If you want to read more about epoll or look at some benchmarks, you can visit the epoll Scalability Web Page at SourceForge.
That is all for select() and poll() for now. For more details, see this post: http://www.cppblog.com/just51living/archive/2011/07/28/151995.html
Reference code
This section uses basic Linux/Unix system calls plus the poll call to write a complete server and client example that runs on Linux (Ubuntu) and Unix (FreeBSD). The client and server work as follows:
The client reads a line from standard input and sends it to the server
The server reads a line from the network and writes it back to the client
The client receives the server's response and writes that line to standard output
The code is as follows:
#include <unistd.h>
#include <sys/types.h>   /* basic system data types */
#include <sys/socket.h>  /* basic socket definitions */
#include <netinet/in.h>  /* sockaddr_in{} and other Internet defns */
#include <arpa/inet.h>   /* inet(3) functions */
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>     /* bzero */
#include <poll.h>        /* poll function */
#include <limits.h>

#define MAXLINE 10240

#ifndef OPEN_MAX
#define OPEN_MAX 40960
#endif

void handle(struct pollfd* clients, int maxClient, int readyClient);

int main(int argc, char **argv)
{
    int servPort = 6888;
    int listenq = 1024;
    int listenfd, connfd;
    struct pollfd clients[OPEN_MAX];
    int maxi;
    socklen_t socklen = sizeof(struct sockaddr_in);
    struct sockaddr_in cliaddr, servaddr;
    int nready;

    bzero(&servaddr, socklen);
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(servPort);

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0) {
        perror("socket error");
        exit(-1);
    }

    int opt = 1;
    if (setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) < 0) {
        perror("setsockopt error");
    }

    if (bind(listenfd, (struct sockaddr *) &servaddr, socklen) == -1) {
        perror("bind error");
        exit(-1);
    }

    if (listen(listenfd, listenq) < 0) {
        perror("listen error");
        exit(-1);
    }

    /* slot 0 is the listening socket; the remaining slots start out unused (fd == -1) */
    clients[0].fd = listenfd;
    clients[0].events = POLLIN;
    int i;
    for (i = 1; i < OPEN_MAX; i++)
        clients[i].fd = -1;
    maxi = 0;    /* highest index currently in use in clients[] */

    printf("pollechoserver startup, listen on port:%d\n", servPort);
    printf("max connection is %d\n", OPEN_MAX);

    for ( ; ; ) {
        nready = poll(clients, maxi + 1, -1);
        if (nready == -1) {
            perror("poll error");
            continue;
        }

        /* a new connection is pending on the listening socket */
        if (clients[0].revents & POLLIN) {
            connfd = accept(listenfd, (struct sockaddr *) &cliaddr, &socklen);
            printf("accept from %s:%d\n",
                   inet_ntoa(cliaddr.sin_addr), ntohs(cliaddr.sin_port));

            /* store the new descriptor in the first free slot */
            for (i = 1; i < OPEN_MAX; i++) {
                if (clients[i].fd == -1) {
                    clients[i].fd = connfd;
                    clients[i].events = POLLIN;
                    break;
                }
            }
            if (i == OPEN_MAX) {
                fprintf(stderr, "too many connections, more than %d\n", OPEN_MAX);
                close(connfd);
                continue;
            }
            if (i > maxi)
                maxi = i;
            --nready;
        }

        handle(clients, maxi, nready);
    }
}

void handle(struct pollfd* clients, int maxClient, int nready)
{
    int connfd;
    int i, nread;
    char buf[MAXLINE];

    if (nready == 0)
        return;

    for (i = 1; i <= maxClient; i++) {
        connfd = clients[i].fd;
        if (connfd == -1)
            continue;
        if (clients[i].revents & (POLLIN | POLLERR)) {
            nread = read(connfd, buf, MAXLINE);   /* read from the client socket */
            if (nread < 0) {
                perror("read error");
                close(connfd);
                clients[i].fd = -1;
                continue;
            }
            if (nread == 0) {
                printf("client closed the connection\n");
                close(connfd);
                clients[i].fd = -1;
                continue;
            }
            write(connfd, buf, nread);            /* echo back to the client */
            if (--nready <= 0)                    /* nothing left to handle, exit the loop */
                break;
        }
    }
}
Compile and start the server
gcc pollechoserver.c -o pollechoserver
./pollechoserver
As for the client, you can refer to the echoclient example from this article's Linux/Unix server and client socket programming introduction; download and compile it.
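If you just want a quick smoke test without the full echoclient, here is a minimal hedged client sketch that matches the server above; it assumes the server is running locally on port 6888 and relays a single line from stdin.

/* Minimal test client for pollechoserver: sends one line from stdin to
 * 127.0.0.1:6888 and prints the echoed reply. Assumes a local server. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define SERV_PORT 6888
#define MAXLINE   10240

int main(void)
{
    char line[MAXLINE];
    struct sockaddr_in servaddr;

    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) { perror("socket"); exit(1); }

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERV_PORT);
    inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);

    if (connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0) {
        perror("connect");
        exit(1);
    }

    if (fgets(line, sizeof(line), stdin) == NULL)
        exit(0);

    write(sockfd, line, strlen(line));           /* send one line to the server */

    ssize_t n = read(sockfd, line, sizeof(line) - 1);
    if (n > 0) {
        line[n] = '\0';
        fputs(line, stdout);                     /* print the echoed line */
    }
    close(sockfd);
    return 0;
}

Compile it the same way as the server (for example gcc testclient.c -o testclient, the file name is arbitrary), start pollechoserver first, then run the client and type a line.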
Original article: http://www.cnblogs.com/virusolf/p/4951481.html