
How OVS DPDK and QEMU communicate via the vhost-user protocol: the control and data channels of vhost-user

2020-11-10 14:03:48

A detailed view of the vhost-user protocol and its implementation in OVS DPDK, QEMU and virtio-net.

 

All control information is exchanged over a UNIX socket (the control channel). This includes the memory-mapping information that is exchanged to enable direct memory access, as well as the kick events and interrupts that must be triggered once data has been placed into the virtio queues. In Neutron this UNIX socket is named vhuxxxxxxxx-xx.

The data channel is in fact implemented by direct memory access. The virtio-net driver in the guest allocates part of the guest memory for the virtio queues; the virtio standard defines the layout of these queues. QEMU shares the address of this memory with OVS DPDK over the control channel. DPDK maps a virtio queue structure of the same layout onto that memory and uses it to read and write the virtio queues that live in the guest's hugepage memory. Direct memory access therefore requires hugepage memory shared between OVS DPDK and QEMU: if QEMU is otherwise configured correctly but huge pages are not set up, OVS DPDK cannot access QEMU's memory and the two cannot exchange packets. If the user forgets to request huge pages for the guest, nova reports this through its metadata.

When OVS DPDK sends packets to the guest, they show up in the OVS DPDK statistics as Tx (transmit) traffic on interface vhuxxxxxxxx-xx. Inside the guest they show up as Rx (receive) traffic.

When the guest sends packets to OVS DPDK, they show up as Tx traffic inside the guest and as Rx traffic on interface vhuxxxxxxxx-xx in OVS DPDK.

The guest has no hardware statistics counters, and the ethtool -S statistics option is not implemented. All low-level counters can only be displayed with the OVS command (ovs-vsctl get interface vhuxxxxxxxx-xx statistics), so the numbers shown are always from the OVS DPDK point of view.
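For example, for a port named vhuc26fd3c6-4b (the example port used later in this article), the counters are read on the host with:

[root@overcloud-0 ~]# ovs-vsctl get interface vhuc26fd3c6-4b statistics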

Although packets can be transferred through shared memory, a mechanism is still needed to tell the other side that a packet has been copied into the virtio queue. The control channel implemented by the vhost-user socket vhuxxxxxxxx-xx can be used to notify ("kick") the peer. Notification has a cost: first a system call is needed to write to the socket, and then the peer has to handle an interrupt. Both ends therefore spend time on the control channel.

To avoid this notification overhead on the control channel, both Open vSwitch and QEMU can set special flags to tell the other side that they do not want to receive interrupts. However, refusing interrupts is only possible when the virtio queues are being polled, either temporarily or permanently.

For performance, the guest itself can use DPDK to process packets. Even though the Linux kernel's NAPI mechanism combines polling with interrupts, the number of interrupts generated is still high. OVS DPDK sends packets to the guest at a very high rate, while the depth of QEMU's virtio rx/tx queues is limited to a default of 256 and a maximum of 1024 entries. As a result, the guest has to process packets very quickly. The ideal implementation is therefore a DPDK PMD inside the guest that continuously polls the guest ports.
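A minimal sketch of such a poll loop inside the guest (port and queue ids are illustrative; EAL setup and error handling are omitted):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Poll one rx queue of a virtio port forever; this is the PMD model:
 * no interrupts, the core simply spins on the queue. */
static void guest_poll_loop(uint16_t port_id, uint16_t queue_id)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);
        for (uint16_t i = 0; i < nb_rx; i++) {
            /* ... process the packet ... */
            rte_pktmbuf_free(bufs[i]);
        }
    }
}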

The vhost-user protocol specification

See the document in the QEMU code base: https://github.com/qemu/qemu/blob/master/docs/interop/vhost-user.txt

    Vhost-user Protocol
    ===================

    Copyright (c) 2014 Virtual Open Systems Sarl.

    This work is licensed under the terms of the GNU GPL, version 2 or later.
    See the COPYING file in the top-level directory.
    ===================

    This protocol complements the ioctl interface that controls the vhost implementation in
    the Linux kernel. It implements the control plane needed to establish the virtqueues with
    a user-space process on the same host. File descriptors are shared through the ancillary
    data of UNIX-socket messages.

    The protocol defines two sides of the communication: master and slave. The master is the
    process that shares its virtqueues, i.e. QEMU; the slave is the consumer of the virtqueues.

    In the current implementation QEMU acts as the master and the slave is a software switch
    running in user space, such as Snabbswitch.

    Master and slave can each act as a client (actively connecting) or as a server (listening)
    on the socket.

The vhost-user protocol has two parties:

  • Master - QEMU
  • Slave - Open vSwitch or another software switch

Each party can run in one of two modes:

  • vhostuser-client - QEMU is the server, the software switch is the client
  • vhostuser - the software switch is the server, QEMU is the client.

vhost-user builds on the kernel vhost architecture, but implements all of its features in user space.

When a QEMU guest starts, it allocates all of the guest memory as shared huge pages. The paravirtualized virtio driver in the guest OS reserves part of these huge pages for the virtio ring buffers. OVS DPDK can then read from and write to the guest's virtio rings directly, and OVS DPDK and QEMU exchange network packets through this reserved memory.

Once the user-space process has received the file descriptors for the guest's pre-allocated shared memory, it can directly access the vrings located in the associated guest memory. (http://www.virtualopensystems.com/en/solutions/guides/snabbswitch-qemu/).

Consider the following VM, running in vhostuser mode:

$ /usr/libexec/qemu-kvm -name guest=instance-00000028,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-58-instance-00000028/master-key.aes -machine pc-i440fx-rhel7.4.0,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Client,ss=on,hypervisor=on,tsc_adjust=on,pdpe1gb=on,mpx=off,xsavec=off,xgetbv1=off -m 2048 -realtime mlock=off \
-smp 8,sockets=4,cores=1,threads=2 \
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/58-instance-00000028,share=yes,size=1073741824,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 \
-object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/58-instance-00000028,share=yes,size=1073741824,host-nodes=1,policy=bind \
-numa node,nodeid=1,cpus=4-7,memdev=ram-node1 -uuid 48888226-7b6b-415c-bcf7-b278ba0bca62 -smbios type=1,manufacturer=Red Hat,product=OpenStack Compute,version=14.1.0-3.el7ost,serial=3d5e138a-8193-41e4-ac95-de9bfc1a3ef1,uuid=48888226-7b6b-415c-bcf7-b278ba0bca62,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-58-instance-00000028/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay \
-no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/48888226-7b6b-415c-bcf7-b278ba0bca62/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charnet0,path=/var/run/openvswitch/vhuc26fd3c6-4b \
-netdev vhost-user,chardev=charnet0,queues=8,id=hostnet0 \
-device virtio-net-pci,mq=on,vectors=18,netdev=hostnet0,id=net0,mac=fa:16:3e:52:30:73,bus=pci.0,addr=0x3 -add-fd set=0,fd=33 -chardev file,id=charserial0,path=/dev/fdset/0,append=on \
-device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 \
-device isa-serial,chardev=charserial1,id=serial1 \
-device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 172.16.2.10:1 -k en-us \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on

These options tell QEMU to allocate the guest memory from the hugepage pool and to make it shared memory:

-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/58-instance-00000028,share=yes,size=1073741824,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 \
-object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/58-instance-00000028,share=yes,size=1073741824,host-nodes=1,policy=bind

Simply copying packets into the other side's buffers is not enough, however. The vhost-user protocol also uses a UNIX socket (vhu[a-f0-9-]) for the communication between the vswitch and QEMU, both during initialization and whenever one side needs to notify the other that packets have been copied into the shared-memory virtio ring. The interaction therefore consists of setup and notification over the control channel (vhu) plus packet copies over the data channel (direct memory access).

    For the virtio mechanism to work, we need an interface to initialize the shared memory
    regions and to exchange the event file descriptors. The API offered by a UNIX socket
    fulfils this requirement. The socket can be used to initialize the user-space virtio
    transport (vhost-user), in particular:

    * to determine the Vrings at initialization time and place them in memory shared between
      the two processes;
    * to map eventfds to the Vring events. This keeps compatibility with the QEMU/KVM
      implementation, where KVM can associate the events triggered by the guest's virtio_pci
      driver with eventfds on the host (ioeventfd and irqfd).

    Sharing file descriptors between two processes is different from sharing them between a
    process and the kernel: the former requires the SCM_RIGHTS flag in the sendmsg system
    call on the UNIX socket.

(http://www.virtualopensystems.com/en/solutions/guides/snabbswitch-qemu/)
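A minimal standalone sketch of passing a file descriptor over a UNIX socket with SCM_RIGHTS (an illustration of the mechanism, not the QEMU or DPDK code):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one byte of payload plus the file descriptor 'fd' as ancillary data. */
static ssize_t send_fd(int sock, int fd)
{
    char payload = 0;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    char control[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = control, .msg_controllen = sizeof(control),
    };

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;           /* pass a file descriptor */
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0);
}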

In vhostuser mode, OVS creates the vhu socket and QEMU connects to it. In vhostuser-client mode, QEMU creates the vhu socket and OVS connects to it.
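For reference, the two port types are created in OVS roughly as follows (bridge and port names are illustrative):

# Server mode: OVS creates the socket, QEMU connects (type dpdkvhostuser)
ovs-vsctl add-port br-int vhu-example -- set Interface vhu-example type=dpdkvhostuser

# Client mode: QEMU creates the socket, OVS connects (type dpdkvhostuserclient)
ovs-vsctl add-port br-int vhu-example -- set Interface vhu-example \
    type=dpdkvhostuserclient options:vhost-server-path=/var/run/openvswitch/vhu-example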

In the vhostuser-mode guest created above, QEMU is instructed to connect a netdev of type vhost-user to the socket /var/run/openvswitch/vhuc26fd3c6-4b:

-chardev socket,id=charnet0,path=/var/run/openvswitch/vhuc26fd3c6-4b \
-netdev vhost-user,chardev=charnet0,queues=8,id=hostnet0 \
-device virtio-net-pci,mq=on,vectors=18,netdev=hostnet0,id=net0,mac=fa:16:3e:52:30:73,bus=pci.0,addr=0x3

The lsof command shows that this socket was created by OVS:

[root@overcloud-0 ~]# lsof -nn | grep vhuc26fd3c6-4b | awk '{print $1}' | uniq
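Assuming ovs-vswitchd owns the socket, the command is expected to print a single value along the lines of (lsof truncates the command name by default):

ovs-vswit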

When one side copies a packet into the shared-memory virtio ring, the other side has two options:

  • poll the queue (e.g. the Linux kernel's NAPI or DPDK's PMD), in which case it picks up new packets without any notification;
  • not poll the queue, in which case it must be notified that new packets have arrived.

In the second case, notifications are sent to the guest over the separate vhu socket control channel. By exchanging eventfd file descriptors, the control channel can deliver interrupts in user space. Writing to the socket requires a system call, which forces the PMDs to spend time in kernel space. The guest can turn off these control-channel interrupt notifications by setting the VRING_AVAIL_F_NO_INTERRUPT flag; otherwise an interrupt notification is sent to the guest whenever Open vSwitch places new packets into the virtio ring.
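The eventfd mechanism behind these user-space interrupts is a plain kernel primitive; a minimal standalone sketch (not the DPDK or QEMU code itself):

#include <sys/eventfd.h>

int main(void)
{
    /* One side writes to the eventfd to "kick" or "call" the other side. */
    int efd = eventfd(0, 0);

    eventfd_write(efd, 1);        /* notify: increments the counter (a system call) */

    eventfd_t value;
    eventfd_read(efd, &value);    /* the peer (or KVM via ioeventfd/irqfd) consumes the event */
    return 0;
}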

For more details see this blog post: http://blog.vmsplice.net/2011/09/qemu-internals-vhost-architecture.html

    The vhost user-space interface

    One surprising aspect of the vhost architecture is that it is not tied to KVM in any way.
    It is just a user-space interface with no dependency on the KVM kernel module. This means
    that other user-space programs, for example libpcap, could in theory use vhost devices if
    they need a high-performance I/O interface.

    When the guest notifies the host that it has placed data in a virtqueue, the vhost worker
    needs to be told that there is work to do (for the kernel virtio-net driver the vhost
    worker is a kernel thread named vhost-$pid, where pid is the QEMU process id). Since vhost
    does not depend on the KVM kernel module, the two cannot communicate directly. Instead,
    the vhost instance creates an eventfd file descriptor which the vhost worker watches. The
    ioeventfd feature of the KVM kernel module can attach an eventfd to a particular guest I/O
    operation. QEMU user space registers a virtqueue-notify ioeventfd for I/O accesses to the
    VIRTIO_PCI_QUEUE_NOTIFY hardware register. When the guest writes VIRTIO_PCI_QUEUE_NOTIFY
    it kicks the virtqueue, and the vhost worker receives the notification from the KVM kernel
    module through the ioeventfd.

    The reverse path, where the vhost worker needs to interrupt the guest, works the same way.
    vhost notifies the guest by writing to a "call" file descriptor. Another KVM kernel module
    feature, irqfd, allows an eventfd to trigger a guest interrupt. QEMU user space registers
    an irqfd for the virtio PCI device interrupt and hands it to the vhost instance, and the
    vhost worker can then interrupt the guest through this "call" descriptor.

    In the end, the vhost instance only knows about the guest memory mapping, the kick eventfd
    and the call (interrupt) file descriptor.

    For more details, see the relevant kernel code:

    drivers/vhost/vhost.c - common vhost driver code
    drivers/vhost/net.c - vhost-net network device driver
    virt/kvm/eventfd.c - ioeventfd and irqfd implementation

    The QEMU user-space code that initializes the vhost instance:

    hw/vhost.c - common vhost initialization code
    hw/vhost_net.c - vhost-net network device initialization

The data channel - direct memory access

Memory mapping of the virtqueues

The official virtio specification defines the structure of a virtqueue.

    2.4 Virtqueues

    The mechanism for bulk data transport on virtio devices is called a virtqueue. Each device
    can have several virtqueues, or none at all. A 16-bit queue-size parameter specifies the
    number of entries in the queue, which also bounds its total size.

    Each virtqueue consists of three parts:

    Descriptor Table
    Available Ring
    Used Ring

http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html

The virtio specification defines the exact layout of the descriptor table, the available ring and the used ring. For example, the available ring:

    2.4.6 The Virtqueue Available Ring

    struct virtq_avail {
    #define VIRTQ_AVAIL_F_NO_INTERRUPT 1
        le16 flags;
        le16 idx;
        le16 ring[ /* Queue Size */ ];
        le16 used_event; /* Only if VIRTIO_F_EVENT_IDX */
    };

    The driver uses the available ring to offer buffers to the device: each ring entry refers
    to the head of a descriptor chain. The available ring is only written by the driver and
    only read by the device.

    The idx field indicates where the driver would put the next descriptor entry in the ring
    (modulo the queue size). It starts at 0 and increases.

    The legacy specification [Virtio PCI Draft] referred to this structure as vring_avail and
    to the constant as VRING_AVAIL_F_NO_INTERRUPT, but the layout is essentially the same.

http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html

DPDK's implementation of the virtio standard uses the structure definitions from the legacy virtio specification:
dpdk-18.08/drivers/net/virtio/virtio_ring.h

/* The Host uses this in used->flags to advise the Guest: don't kick me
 * when you add a buffer. It's unreliable, so it's simply an
 * optimization. Guest will still kick if it's out of buffers. */
#define VRING_USED_F_NO_NOTIFY 1
/* The Guest uses this in avail->flags to advise the Host: don't
 * interrupt me when you consume a buffer. It's unreliable, so it's
 * simply an optimization. */
#define VRING_AVAIL_F_NO_INTERRUPT 1

/* VirtIO ring descriptors: 16 bytes.
 * These can chain together via "next". */
struct vring_desc {
    uint64_t addr;  /* Address (guest-physical). */
    uint32_t len;   /* Length. */
    uint16_t flags; /* The flags as indicated above. */
    uint16_t next;  /* We chain unused descriptors via this. */
};

struct vring_avail {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[0];
};

/* id is a 16bit index. uint32_t is used here for ids for padding reasons. */
struct vring_used_elem {
    /* Index of start of used descriptor chain. */
    uint32_t id;
    /* Total length of the descriptor chain which was written to. */
    uint32_t len;
};

struct vring_used {
    uint16_t flags;
    volatile uint16_t idx;
    struct vring_used_elem ring[0];
};

struct vring {
    unsigned int num;
    struct vring_desc *desc;
    struct vring_avail *avail;
    struct vring_used *used;
};

 dpdk-18.08/lib/librte_vhost/vhost.h

/**
 * Structure contains variables relevant to RX/TX virtqueues.
 */
struct vhost_virtqueue {
    union {
        struct vring_desc        *desc;
        struct vring_packed_desc *desc_packed;
    };
    union {
        struct vring_avail             *avail;
        struct vring_packed_desc_event *driver_event;
    };
    union {
        struct vring_used              *used;
        struct vring_packed_desc_event *device_event;
    };
    uint32_t size;

    uint16_t last_avail_idx;
    uint16_t last_used_idx;
    /* Last used index we notify to front end. */
    uint16_t signalled_used;
    bool     signalled_used_valid;
#define VIRTIO_INVALID_EVENTFD       (-1)
#define VIRTIO_UNINITIALIZED_EVENTFD (-2)

    /* Backend value to determine if device should started/stopped */
    int backend;
    int enabled;
    int access_ok;
    rte_spinlock_t access_lock;

    /* Used to notify the guest (trigger interrupt) */
    int callfd;
    /* Currently unused as polling mode is enabled */
    int kickfd;

    /* Physical address of used ring, for logging */
    uint64_t log_guest_addr;

    uint16_t nr_zmbuf;
    uint16_t zmbuf_size;
    uint16_t last_zmbuf_idx;
    struct zcopy_mbuf      *zmbufs;
    struct zcopy_mbuf_list  zmbuf_list;

    union {
        struct vring_used_elem        *shadow_used_split;
        struct vring_used_elem_packed *shadow_used_packed;
    };
    uint16_t shadow_used_idx;
    struct vhost_vring_addr ring_addrs;

    struct batch_copy_elem *batch_copy_elems;
    uint16_t batch_copy_nb_elems;
    bool     used_wrap_counter;
    bool     avail_wrap_counter;

    struct log_cache_entry log_cache[VHOST_LOG_CACHE_NR];
    uint16_t log_cache_nb_elem;

    rte_rwlock_t iotlb_lock;
    rte_rwlock_t iotlb_pending_lock;
    struct rte_mempool *iotlb_pool;
    TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
    int iotlb_cache_nr;
    TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
} __rte_cache_aligned;

Once the memory mapping is complete, DPDK can operate on the very same structures in shared memory, just like the guest's virtio-net driver does.
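Conceptually, the backend turns a guest-physical address found in a descriptor into a pointer in its own address space using the memory regions announced in VHOST_USER_SET_MEM_TABLE; a simplified sketch follows (the structure and names are illustrative, the real logic lives in DPDK's vhost library):

#include <stdint.h>
#include <stddef.h>

/* One mmap()ed guest memory region, as announced via VHOST_USER_SET_MEM_TABLE
 * (illustrative layout, not the exact DPDK structure). */
struct mem_region {
    uint64_t  guest_phys_addr;  /* start of the region in guest-physical space */
    uint64_t  size;             /* region length */
    uintptr_t host_user_addr;   /* where the backend mmap()ed the region */
};

/* Translate a guest-physical address (e.g. vring_desc.addr) into a backend pointer. */
static void *gpa_to_host(struct mem_region *regions, size_t nregions, uint64_t gpa)
{
    for (size_t i = 0; i < nregions; i++) {
        struct mem_region *r = &regions[i];
        if (gpa >= r->guest_phys_addr && gpa < r->guest_phys_addr + r->size)
            return (void *)(r->host_user_addr + (gpa - r->guest_phys_addr));
    }
    return NULL;    /* address not covered by any shared region */
}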

The control channel - UNIX socket

QEMU and DPDK exchange messages over the vhost-user socket.

This communication follows the standard vhost-user protocol.

The message types are as follows:

dpdk-18.08/lib/librte_vhost/vhost_user.h

typedef enum VhostUserRequest {
    VHOST_USER_NONE = 0,
    VHOST_USER_GET_FEATURES = 1,
    VHOST_USER_SET_FEATURES = 2,
    VHOST_USER_SET_OWNER = 3,
    VHOST_USER_RESET_OWNER = 4,
    VHOST_USER_SET_MEM_TABLE = 5,
    VHOST_USER_SET_LOG_BASE = 6,
    VHOST_USER_SET_LOG_FD = 7,
    VHOST_USER_SET_VRING_NUM = 8,
    VHOST_USER_SET_VRING_ADDR = 9,
    VHOST_USER_SET_VRING_BASE = 10,
    VHOST_USER_GET_VRING_BASE = 11,
    VHOST_USER_SET_VRING_KICK = 12,
    VHOST_USER_SET_VRING_CALL = 13,
    VHOST_USER_SET_VRING_ERR = 14,
    VHOST_USER_GET_PROTOCOL_FEATURES = 15,
    VHOST_USER_SET_PROTOCOL_FEATURES = 16,
    VHOST_USER_GET_QUEUE_NUM = 17,
    VHOST_USER_SET_VRING_ENABLE = 18,
    VHOST_USER_SEND_RARP = 19,
    VHOST_USER_NET_SET_MTU = 20,
    VHOST_USER_SET_SLAVE_REQ_FD = 21,
    VHOST_USER_IOTLB_MSG = 22,
    VHOST_USER_CRYPTO_CREATE_SESS = 26,
    VHOST_USER_CRYPTO_CLOSE_SESS = 27,
    VHOST_USER_MAX = 28
} VhostUserRequest;

More detailed information about the message types can be found in the QEMU source tree:
https://github.com/qemu/qemu/blob/master/docs/interop/vhost-user.txt
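Every message starts with a small fixed header followed by a request-specific payload; the layout below is a sketch based on the vhost-user specification (the concrete structs in DPDK and QEMU differ slightly in naming):

#include <stdint.h>

/* vhost-user message header as described in the specification (sketch). */
struct vhost_user_msg_hdr {
    uint32_t request;   /* one of the VhostUserRequest values above */
    uint32_t flags;     /* lower 2 bits: version; bit 2: reply; bit 3: need_reply */
    uint32_t size;      /* size of the payload that follows the header */
};
/* The payload (u64, vring state, vring address, memory regions, ...) follows the
 * header, and file descriptors, when required, travel as SCM_RIGHTS ancillary data. */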

DPDK handles incoming messages with the following function:
dpdk-18.08/lib/librte_vhost/vhost_user.c

int
vhost_user_msg_handler(int vid, int fd)
{

Also in dpdk-18.08/lib/librte_vhost/vhost_user.c:

/* return bytes# of read on success or negative val on failure. */
static int
read_vhost_message(int sockfd, struct VhostUserMsg *msg)
{

DPDK sends messages with the following function:
dpdk-18.08/lib/librte_vhost/vhost_user.c

static int
send_vhost_message(int sockfd, struct VhostUserMsg *msg, int *fds, int fd_num)
{
    if (!msg)
        return 0;

    return send_fd_message(sockfd, (char *)msg,
        VHOST_USER_HDR_SIZE + msg->size, fds, fd_num);
}

The corresponding receive function in QEMU is:
qemu-3.0.0/contrib/libvhost-user/libvhost-user.c

static bool
vu_process_message(VuDev *dev, VhostUserMsg *vmsg)

And QEMU's message send function:
qemu-3.0.0/hw/virtio/vhost-user.c

/* most non-init callers ignore the error */
static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
                            int *fds, int fd_num)
{

Registration of the DPDK UNIX socket and message exchange

Neutron instructs Open vSwitch to create an interface named vhuxxxxxxxx-xx. Inside OVS this name is stored in the netdev structure's name member (netdev->name).

When the vhost-user interface is created, Open vSwitch has DPDK register a new vhost-user UNIX socket. The socket path is vhost_sock_dir plus netdev->name, and is stored in the device's dev->vhost_id.

By setting the RTE_VHOST_USER_CLIENT flag, OVS can request that the vhost-user socket be created in client mode.
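In terms of the DPDK API this is simply a flag passed at registration time (an illustrative snippet, not the OVS code itself; paths are made up):

#include <rte_vhost.h>

static int register_vhost_ports(void)
{
    /* Server mode (dpdkvhostuser): DPDK creates and listens on the socket. */
    int err = rte_vhost_driver_register("/var/run/openvswitch/vhu-example", 0);
    if (err)
        return err;

    /* Client mode (dpdkvhostuserclient): QEMU owns the socket, DPDK connects to it. */
    return rte_vhost_driver_register("/path/to/qemu-created.sock",
                                     RTE_VHOST_USER_CLIENT);
}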

The OVS function netdev_dpdk_vhost_construct calls DPDK's rte_vhost_driver_register, which creates the UNIX socket; rte_vhost_driver_start then calls either vhost_user_start_server or vhost_user_start_client. By default the socket is created in server mode; if the RTE_VHOST_USER_CLIENT flag is set, it is created in client mode.

The related call chain is as follows:

OVS:   netdev_dpdk_vhost_construct(struct netdev *netdev)
         |
         v
DPDK:  rte_vhost_driver_register(const char *path, uint64_t flags)
         |
         v
       create_unix_socket(struct vhost_user_socket *vsocket)


OVS:   netdev_dpdk_vhost_construct(struct netdev *netdev)
         |
         v
DPDK:  rte_vhost_driver_start(const char *path)
         |
         +--> vhost_user_start_server(struct vhost_user_socket *vsocket)
         |       -> vhost_user_server_new_connection(int fd, void *dat, int *remove __rte_unused)
         |
         +--> vhost_user_start_client(struct vhost_user_socket *vsocket)
         |
         +--> vhost_user_client_reconnect(void *arg __rte_unused)

       All three connection paths converge on:
         vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
           -> vhost_user_read_cb(int connfd, void *dat, int *remove)
                -> vhost_user_msg_handler(int vid, int fd)

netdev_dpdk_vhost_construct is defined in openvswitch-2.9.2/lib/netdev-dpdk.c:

static int
netdev_dpdk_vhost_construct(struct netdev *netdev)
{
    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
    const char *name = netdev->name;
    int err;

    /* 'name' is appended to 'vhost_sock_dir' and used to create a socket in
     * the file system. '/' or '\' would traverse directories, so they're not
     * acceptable in 'name'. */
    if (strchr(name, '/') || strchr(name, '\\')) {
        VLOG_ERR("\"%s\" is not a valid name for a vhost-user port. "
                 "A valid name must not include '/' or '\\'",
                 name);
        return EINVAL;
    }

    ovs_mutex_lock(&dpdk_mutex);
    /* Take the name of the vhost-user port and append it to the location where
     * the socket is to be created, then register the socket.
     */
    snprintf(dev->vhost_id, sizeof dev->vhost_id, "%s/%s",
             dpdk_get_vhost_sock_dir(), name);

    dev->vhost_driver_flags &= ~RTE_VHOST_USER_CLIENT;
    err = rte_vhost_driver_register(dev->vhost_id, dev->vhost_driver_flags);
    if (err) {
        VLOG_ERR("vhost-user socket device setup failure for socket %s\n",
                 dev->vhost_id);
        goto out;
    } else {
        fatal_signal_add_file_to_unlink(dev->vhost_id);
        VLOG_INFO("Socket %s created for vhost-user port %s\n",
                  dev->vhost_id, name);
    }

    err = rte_vhost_driver_callback_register(dev->vhost_id,
                                             &virtio_net_device_ops);
    if (err) {
        VLOG_ERR("rte_vhost_driver_callback_register failed for vhost user "
                 "port: %s\n", name);
        goto out;
    }

    err = rte_vhost_driver_disable_features(dev->vhost_id,
                                            1ULL << VIRTIO_NET_F_HOST_TSO4
                                            | 1ULL << VIRTIO_NET_F_HOST_TSO6
                                            | 1ULL << VIRTIO_NET_F_CSUM);
    if (err) {
        VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user "
                 "port: %s\n", name);
        goto out;
    }

    err = rte_vhost_driver_start(dev->vhost_id);
    if (err) {
        VLOG_ERR("rte_vhost_driver_start failed for vhost user "
                 "port: %s\n", name);
        goto out;
    }

    err = vhost_common_construct(netdev);
    if (err) {
        VLOG_ERR("vhost_common_construct failed for vhost user "
                 "port: %s\n", name);
    }

out:
    ovs_mutex_unlock(&dpdk_mutex);
    VLOG_WARN_ONCE("dpdkvhostuser ports are considered deprecated; "
                   "please migrate to dpdkvhostuserclient ports.");
    return err;
}

netdev_dpdk_vhost_construct calls rte_vhost_driver_register. The following code is all from dpdk-18.08/lib/librte_vhost/socket.c:

/*
 * Register a new vhost-user socket; here we could act as server
 * (the default case), or client (when RTE_VHOST_USER_CLIENT) flag
 * is set.
 */
int
rte_vhost_driver_register(const char *path, uint64_t flags)
{
(...)
    if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
        vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
        if (vsocket->reconnect && reconn_tid == 0) {
            if (vhost_user_reconnect_init() != 0)
                goto out_mutex;
        }
    } else {
        vsocket->is_server = true;
    }
    ret = create_unix_socket(vsocket);
    if (ret < 0) {
        goto out_mutex;
    }

netdev_dpdk_vhost_construct also calls rte_vhost_driver_start, defined in dpdk-18.08/lib/librte_vhost/socket.c:

int
rte_vhost_driver_start(const char *path)
{
(...)
    if (vsocket->is_server)
        return vhost_user_start_server(vsocket);
    else
        return vhost_user_start_client(vsocket);
}

vhost_user_start_server leads to vhost_user_server_new_connection.

The following three functions all call vhost_user_add_connection:

/* call back when there is new vhost-user connection from client */
static void
vhost_user_server_new_connection(int fd, void *dat, int *remove __rte_unused)
{
(...)
static void *
vhost_user_client_reconnect(void *arg __rte_unused)
{
(...)
static int
vhost_user_start_client(struct vhost_user_socket *vsocket)
{
(...)

static void
vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
{

vhost_user_add_connection then arranges for vhost_user_read_cb to run, which in turn calls vhost_user_msg_handler to process the received messages.

static void
vhost_user_read_cb(int connfd, void *dat, int *remove)
{
    struct vhost_user_connection *conn = dat;
    struct vhost_user_socket *vsocket = conn->vsocket;
    int ret;

    ret = vhost_user_msg_handler(conn->vid, connfd);
    if (ret < 0) {
        close(connfd);
        *remove = 1;
        vhost_destroy_device(conn->vid);

        if (vsocket->notify_ops->destroy_connection)
            vsocket->notify_ops->destroy_connection(conn->vid);

        pthread_mutex_lock(&vsocket->conn_mutex);
        TAILQ_REMOVE(&vsocket->conn_list, conn, next);
        pthread_mutex_unlock(&vsocket->conn_mutex);

        free(conn);

        if (vsocket->reconnect) {
            create_unix_socket(vsocket);
            vhost_user_start_client(vsocket);
        }
    }
}

dpdk-18.08/lib/librte_vhost/vhost_user.c

int
vhost_user_msg_handler(int vid, int fd)
{
    struct virtio_net *dev;
    struct VhostUserMsg msg;
    struct rte_vdpa_device *vdpa_dev;
    int did = -1;
    int ret;
    int unlock_required = 0;
    uint32_t skip_master = 0;

    dev = get_device(vid);
    if (dev == NULL)
        return -1;

    if (!dev->notify_ops) {
        dev->notify_ops = vhost_driver_callback_get(dev->ifname);
        if (!dev->notify_ops) {
            RTE_LOG(ERR, VHOST_CONFIG,
                "failed to get callback ops for driver %s\n",
                dev->ifname);
            return -1;
        }
    }

    ret = read_vhost_message(fd, &msg);
    if (ret <= 0 || msg.request.master >= VHOST_USER_MAX) {
        if (ret < 0)
            RTE_LOG(ERR, VHOST_CONFIG,
                "vhost read message failed\n");
        else if (ret == 0)
            RTE_LOG(INFO, VHOST_CONFIG,
                "vhost peer closed\n");
        else
            RTE_LOG(ERR, VHOST_CONFIG,
                "vhost read incorrect message\n");

        return -1;
    }

    ret = 0;
    if (msg.request.master != VHOST_USER_IOTLB_MSG)
        RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n",
            vhost_message_str[msg.request.master]);
    else
        RTE_LOG(DEBUG, VHOST_CONFIG, "read message %s\n",
            vhost_message_str[msg.request.master]);

    ret = vhost_user_check_and_alloc_queue_pair(dev, &msg);
    if (ret < 0) {
        RTE_LOG(ERR, VHOST_CONFIG,
            "failed to alloc queue\n");
        return -1;
    }

    /*
     * Note: we don't lock all queues on VHOST_USER_GET_VRING_BASE
     * and VHOST_USER_RESET_OWNER, since it is sent when virtio stops
     * and device is destroyed. destroy_device waits for queues to be
     * inactive, so it is safe. Otherwise taking the access_lock
     * would cause a dead lock.
     */
    switch (msg.request.master) {
    case VHOST_USER_SET_FEATURES:
    case VHOST_USER_SET_PROTOCOL_FEATURES:
    case VHOST_USER_SET_OWNER:
    case VHOST_USER_SET_MEM_TABLE:
    case VHOST_USER_SET_LOG_BASE:
    case VHOST_USER_SET_LOG_FD:
    case VHOST_USER_SET_VRING_NUM:
    case VHOST_USER_SET_VRING_ADDR:
    case VHOST_USER_SET_VRING_BASE:
    case VHOST_USER_SET_VRING_KICK:
    case VHOST_USER_SET_VRING_CALL:
    case VHOST_USER_SET_VRING_ERR:
    case VHOST_USER_SET_VRING_ENABLE:
    case VHOST_USER_SEND_RARP:
    case VHOST_USER_NET_SET_MTU:
    case VHOST_USER_SET_SLAVE_REQ_FD:
        vhost_user_lock_all_queue_pairs(dev);
        unlock_required = 1;
        break;
    default:
        break;

    }

    if (dev->extern_ops.pre_msg_handle) {
        uint32_t need_reply;

        ret = (*dev->extern_ops.pre_msg_handle)(dev->vid,
                (void *)&msg, &need_reply, &skip_master);
        if (ret < 0)
            goto skip_to_reply;

        if (need_reply)
            send_vhost_reply(fd, &msg);

        if (skip_master)
            goto skip_to_post_handle;
    }

    switch (msg.request.master) {
    case VHOST_USER_GET_FEATURES:
        msg.payload.u64 = vhost_user_get_features(dev);
        msg.size = sizeof(msg.payload.u64);
        send_vhost_reply(fd, &msg);
        break;
    case VHOST_USER_SET_FEATURES:
        ret = vhost_user_set_features(dev, msg.payload.u64);
        if (ret)
            return -1;
        break;

    case VHOST_USER_GET_PROTOCOL_FEATURES:
        vhost_user_get_protocol_features(dev, &msg);
        send_vhost_reply(fd, &msg);
        break;
    case VHOST_USER_SET_PROTOCOL_FEATURES:
        vhost_user_set_protocol_features(dev, msg.payload.u64);
        break;

    case VHOST_USER_SET_OWNER:
        vhost_user_set_owner();
        break;
    case VHOST_USER_RESET_OWNER:
        vhost_user_reset_owner(dev);
        break;

    case VHOST_USER_SET_MEM_TABLE:
        ret = vhost_user_set_mem_table(&dev, &msg);
        break;

    case VHOST_USER_SET_LOG_BASE:
        vhost_user_set_log_base(dev, &msg);

        /* it needs a reply */
        msg.size = sizeof(msg.payload.u64);
        send_vhost_reply(fd, &msg);
        break;
    case VHOST_USER_SET_LOG_FD:
        close(msg.fds[0]);
        RTE_LOG(INFO, VHOST_CONFIG, "not implemented.\n");
        break;

    case VHOST_USER_SET_VRING_NUM:
        vhost_user_set_vring_num(dev, &msg);
        break;
    case VHOST_USER_SET_VRING_ADDR:
        vhost_user_set_vring_addr(&dev, &msg);
        break;
    case VHOST_USER_SET_VRING_BASE:
        vhost_user_set_vring_base(dev, &msg);
        break;

    case VHOST_USER_GET_VRING_BASE:
        vhost_user_get_vring_base(dev, &msg);
        msg.size = sizeof(msg.payload.state);
        send_vhost_reply(fd, &msg);
        break;

    case VHOST_USER_SET_VRING_KICK:
        vhost_user_set_vring_kick(&dev, &msg);
        break;
    case VHOST_USER_SET_VRING_CALL:
        vhost_user_set_vring_call(dev, &msg);
        break;

    case VHOST_USER_SET_VRING_ERR:
        if (!(msg.payload.u64 & VHOST_USER_VRING_NOFD_MASK))
            close(msg.fds[0]);
        RTE_LOG(INFO, VHOST_CONFIG, "not implemented\n");
        break;

    case VHOST_USER_GET_QUEUE_NUM:
        msg.payload.u64 = (uint64_t)vhost_user_get_queue_num(dev);
        msg.size = sizeof(msg.payload.u64);
        send_vhost_reply(fd, &msg);
        break;

    case VHOST_USER_SET_VRING_ENABLE:
        vhost_user_set_vring_enable(dev, &msg);
        break;
    case VHOST_USER_SEND_RARP:
        vhost_user_send_rarp(dev, &msg);
        break;

    case VHOST_USER_NET_SET_MTU:
        ret = vhost_user_net_set_mtu(dev, &msg);
        break;

    case VHOST_USER_SET_SLAVE_REQ_FD:
        ret = vhost_user_set_req_fd(dev, &msg);
        break;

    case VHOST_USER_IOTLB_MSG:
        ret = vhost_user_iotlb_msg(&dev, &msg);
        break;

    default:
        ret = -1;
        break;
    }

skip_to_post_handle:
    if (dev->extern_ops.post_msg_handle) {
        uint32_t need_reply;

        ret = (*dev->extern_ops.post_msg_handle)(
                dev->vid, (void *)&msg, &need_reply);
        if (ret < 0)
            goto skip_to_reply;

        if (need_reply)
            send_vhost_reply(fd, &msg);
    }

skip_to_reply:
    if (unlock_required)
        vhost_user_unlock_all_queue_pairs(dev);

    if (msg.flags & VHOST_USER_NEED_REPLY) {
        msg.payload.u64 = !!ret;
        msg.size = sizeof(msg.payload.u64);
        send_vhost_reply(fd, &msg);
    }

    if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
        dev->flags |= VIRTIO_DEV_READY;

        if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
            if (dev->dequeue_zero_copy) {
                RTE_LOG(INFO, VHOST_CONFIG,
                    "dequeue zero copy is enabled\n");
            }

            if (dev->notify_ops->new_device(dev->vid) == 0)
                dev->flags |= VIRTIO_DEV_RUNNING;
        }
    }

    did = dev->vdpa_dev_id;
    vdpa_dev = rte_vdpa_get_device(did);
    if (vdpa_dev && virtio_is_ready(dev) &&
            !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
            msg.request.master == VHOST_USER_SET_VRING_ENABLE) {
        if (vdpa_dev->ops->dev_conf)
            vdpa_dev->ops->dev_conf(dev->vid);
        dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
        if (vhost_user_host_notifier_ctrl(dev->vid, true) != 0) {
            RTE_LOG(INFO, VHOST_CONFIG,
                "(%d) software relay is used for vDPA, performance may be low.\n",
                dev->vid);
        }
    }

    return 0;
}

virtio tells DPDK the memory addresses of the shared virtio queues

DPDK uses vhost_user_set_vring_addr to convert the virtio descriptor, used-ring and available-ring addresses into its own address space.

dpdk-18.08/lib/librte_vhost/vhost_user.c

/*
 * The virtio device sends us the desc, used and avail ring addresses.
 * This function then converts these to our address space.
 */
static int
vhost_user_set_vring_addr(struct virtio_net **pdev, VhostUserMsg *msg)
{
    struct vhost_virtqueue *vq;
    struct vhost_vring_addr *addr = &msg->payload.addr;
    struct virtio_net *dev = *pdev;

    if (dev->mem == NULL)
        return -1;

    /* addr->index refers to the queue index. The txq 1, rxq is 0. */
    vq = dev->virtqueue[msg->payload.addr.index];

    /*
     * Rings addresses should not be interpreted as long as the ring is not
     * started and enabled
     */
    memcpy(&vq->ring_addrs, addr, sizeof(*addr));

    vring_invalidate(dev, vq);

    if (vq->enabled && (dev->features &
                (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
        dev = translate_ring_addresses(dev, msg->payload.addr.index);
        if (!dev)
            return -1;

        *pdev = dev;
    }

    return 0;
}

The memory addresses are only set when a VHOST_USER_SET_VRING_ADDR message is received over the vhu control-channel socket.

dpdk-18.08/lib/librte_vhost/vhost_user.c

int
vhost_user_msg_handler(int vid, int fd)
{
(...)
    switch (msg.request.master) {
(...)
    case VHOST_USER_SET_VRING_ADDR:
        vhost_user_set_vring_addr(&dev, &msg);
        break;

QEMU in fact has a message handler similar to DPDK's:
qemu-3.0.0/contrib/libvhost-user/libvhost-user.c

static bool
vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
{
(...)
    switch (vmsg->request) {
(...)
    case VHOST_USER_SET_VRING_ADDR:
        return vu_set_vring_addr_exec(dev, vmsg);
(...)


Naturally, QEMU also needs a function that sends the memory address information to DPDK over the UNIX socket:
qemu-3.0.0/hw/virtio/vhost-user.c

static int vhost_user_set_vring_addr(struct vhost_dev *dev,
                                     struct vhost_vring_addr *addr)
{
    VhostUserMsg msg = {
        .hdr.request = VHOST_USER_SET_VRING_ADDR,
        .hdr.flags = VHOST_USER_VERSION,
        .payload.addr = *addr,
        .hdr.size = sizeof(msg.payload.addr),
    };

    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
        return -1;
    }

    return 0;
}

OVS DPDK sending packets to the guest, and transmit drops

The function that sends packets from OVS DPDK to the guest is __netdev_dpdk_vhost_send, in openvswitch-2.9.2/lib/netdev-dpdk.c.

When the ring runs out of space, the OVS transmit path still retries up to VHOST_ENQ_RETRY_NUM (8 by default) times. If no packet at all was sent on the first attempt (nothing was written into the shared-memory ring), or if the VHOST_ENQ_RETRY_NUM limit is exceeded, the remaining packets are dropped (a burst can consist of up to 32 packets).

    do {
        int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
        unsigned int tx_pkts;

        tx_pkts = rte_vhost_enqueue_burst(vid, vhost_qid, cur_pkts, cnt);
        if (OVS_LIKELY(tx_pkts)) {
            /* Packets have been sent.*/
            cnt -= tx_pkts;
            /* Prepare for possible retry.*/
            cur_pkts = &cur_pkts[tx_pkts];
        } else {
            /* No packets sent - do not retry.*/
            break;
        }
    } while (cnt && (retries++ <= VHOST_ENQ_RETRY_NUM));

(...)

out:
    for (i = 0; i < total_pkts - dropped; i++) {
        dp_packet_delete(pkts[i]);
    }

Guest receive interrupt handling

When OVS DPDK places new packets into the virtio ring, there are two possible situations:

  • the guest is not polling its queues and must be told that new packets have arrived;
  • the guest is polling its queues and does not need to be told.

If the guest uses the Linux kernel network stack, the NAPI mechanism that handles packet reception mixes interrupts with polling. The guest OS starts in interrupt mode until the first interrupt comes in; the CPU then acknowledges it quickly, schedules the ksoftirqd kernel thread to do the processing, and disables further interrupts.

While ksoftirqd runs, it tries to process as many packets as possible, but no more than netdev_budget allows. If more packets remain in the queue, ksoftirqd reschedules itself and keeps processing until no packets are left; throughout this phase it is polling and interrupts stay disabled. Once the packets are processed, ksoftirqd stops polling, re-enables interrupts, and waits for the interrupt announcing the next packet.
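The budget is a regular kernel tunable; inside the guest it can be inspected and, if needed, raised (300 is the usual default):

# Inside the guest: NAPI budget per softirq round
sysctl net.core.netdev_budget
sysctl -w net.core.netdev_budget=600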

While the guest is polling, the CPU caches are used very effectively and extra latency is avoided; with the right processes running on both host and guest, latency drops further. By contrast, when the host has to send an IRQ to the guest it must write to the UNIX socket (a system call), which is costly and adds extra latency and overhead.

The advantage of running DPDK inside the guest, as part of an NFV application, lies in the way its PMDs handle traffic. The PMDs work in polling mode with interrupts disabled, so OVS DPDK no longer needs to send interrupt notifications to the guest. OVS DPDK saves the write to the UNIX socket and the associated kernel system call, staying in user space the whole time, and the guest runs faster because it no longer has to handle interrupts arriving over the control channel.

If the VRING_AVAIL_F_NO_INTERRUPT flag is not set, the guest is willing to receive interrupts. Interrupts to the guest are delivered through callfd and the operating system's eventfd mechanism.

The guest OS can enable or disable interrupts. When the guest disables interrupts on a virtio interface, the virtio-net driver does so via the VRING_AVAIL_F_NO_INTERRUPT flag, which is defined in both DPDK and QEMU:

[root@overcloud-0 SOURCES]# grep VRING_AVAIL_F_NO_INTERRUPT -R | grep def
dpdk-18.08/drivers/net/virtio/virtio_ring.h:#define VRING_AVAIL_F_NO_INTERRUPT 1
dpdk-18.08/drivers/crypto/virtio/virtio_ring.h:#define VRING_AVAIL_F_NO_INTERRUPT 1

[root@overcloud-0 qemu]# grep AVAIL_F_NO_INTERRUPT -R -i | grep def
qemu-3.0.0/include/standard-headers/linux/virtio_ring.h:#define VRING_AVAIL_F_NO_INTERRUPT 1
qemu-3.0.0/roms/seabios/src/hw/virtio-ring.h:#define VRING_AVAIL_F_NO_INTERRUPT 1
qemu-3.0.0/roms/ipxe/src/include/ipxe/virtio-ring.h:#define VRING_AVAIL_F_NO_INTERRUPT 1
qemu-3.0.0/roms/seabios-hppa/src/hw/virtio-ring.h:#define VRING_AVAIL_F_NO_INTERRUPT 1
qemu-3.0.0/roms/SLOF/lib/libvirtio/virtio.h:#define VRING_AVAIL_F_NO_INTERRUPT 1
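On the guest side, a driver that wants to run purely in polling mode simply sets this bit in the available ring it shares with the backend; schematically (a sketch using the legacy vring layout shown earlier, not the actual driver code):

/* 'vr' is the guest's view of the same shared ring (struct vring from above). */
static void guest_disable_backend_interrupts(struct vring *vr)
{
    /* Tell the backend (DPDK) not to write the call eventfd for this queue. */
    vr->avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
}

static void guest_enable_backend_interrupts(struct vring *vr)
{
    vr->avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
}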

Once VRING_AVAIL_F_NO_INTERRUPT is set in vq->avail->flags, DPDK is told not to send interrupts to the guest. dpdk-18.08/lib/librte_vhost/vhost.h

static __rte_always_inline void
vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
{
    /* Flush used->idx update before we read avail->flags. */
    rte_smp_mb();

    /* Don't kick guest if we don't reach index specified by guest. */
    if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) {
        uint16_t old = vq->signalled_used;
        uint16_t new = vq->last_used_idx;

        VHOST_LOG_DEBUG(VHOST_DATA, "%s: used_event_idx=%d, old=%d, new=%d\n",
            __func__,
            vhost_used_event(vq),
            old, new);
        if (vhost_need_event(vhost_used_event(vq), new, old)
            && (vq->callfd >= 0)) {
            vq->signalled_used = vq->last_used_idx;
            eventfd_write(vq->callfd, (eventfd_t) 1);
        }
    } else {
        /* Kick the guest if necessary. */
        if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
                && (vq->callfd >= 0))
            eventfd_write(vq->callfd, (eventfd_t)1);
    }
}

As noted above, with PMDs there is no need to make the system call that writes to the UNIX socket.

OVS DPDK sending packets to the guest - code details

static void
__netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
                         struct dp_packet **pkts, int cnt)
{
    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
    struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
    unsigned int total_pkts = cnt;
    unsigned int dropped = 0;
    int i, retries = 0;
    int vid = netdev_dpdk_get_vid(dev);

    qid = dev->tx_q[qid % netdev->n_txq].map;

    if (OVS_UNLIKELY(vid < 0 || !dev->vhost_reconfigured || qid < 0
                     || !(dev->flags & NETDEV_UP))) {
        rte_spinlock_lock(&dev->stats_lock);
        dev->stats.tx_dropped += cnt;
        rte_spinlock_unlock(&dev->stats_lock);
        goto out;
    }

    rte_spinlock_lock(&dev->tx_q[qid].tx_lock);

    cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
    /* Check has QoS has been configured for the netdev */
    cnt = netdev_dpdk_qos_run(dev, cur_pkts, cnt, true);
    dropped = total_pkts - cnt;

    do {
        int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
        unsigned int tx_pkts;

        tx_pkts = rte_vhost_enqueue_burst(vid, vhost_qid, cur_pkts, cnt);
        if (OVS_LIKELY(tx_pkts)) {
            /* Packets have been sent.*/
            cnt -= tx_pkts;
            /* Prepare for possible retry.*/
            cur_pkts = &cur_pkts[tx_pkts];
        } else {
            /* No packets sent - do not retry.*/
            break;
        }
    } while (cnt && (retries++ <= VHOST_ENQ_RETRY_NUM));

    rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);

    rte_spinlock_lock(&dev->stats_lock);
    netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts,
                                         cnt + dropped);
    rte_spinlock_unlock(&dev->stats_lock);

out:
    for (i = 0; i < total_pkts - dropped; i++) {
        dp_packet_delete(pkts[i]);
    }
}

The rte_vhost_enqueue_burst function comes from DPDK's vhost library.

[root@overcloud-0 src]# grep rte_vhost_enqueue_burst dpdk-18.08/ -R
dpdk-18.08/examples/vhost/main.c: ret = rte_vhost_enqueue_burst(dst_vdev->vid, VIRTIO_RXQ, &m, 1);
dpdk-18.08/examples/vhost/main.c: enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
dpdk-18.08/examples/tep_termination/vxlan_setup.c: ret = rte_vhost_enqueue_burst(vid, VIRTIO_RXQ, pkts_valid, count);
dpdk-18.08/tags:rte_vhost_enqueue_burst lib/librte_vhost/virtio_net.c /^rte_vhost_enqueue_burst(int vid, uint16_t queue_id,$/;" f
dpdk-18.08/lib/librte_vhost/rte_vhost_version.map: rte_vhost_enqueue_burst;
dpdk-18.08/lib/librte_vhost/rte_vhost.h:uint16_t rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
dpdk-18.08/lib/librte_vhost/virtio_net.c:rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
dpdk-18.08/drivers/net/vhost/rte_eth_vhost.c: nb_pkts = rte_vhost_enqueue_burst(r->vid, r->virtqueue_id,
dpdk-18.08/doc/guides/prog_guide/vhost_lib.rst:* ``rte_vhost_enqueue_burst(vid, queue_id, pkts, count)``
dpdk-18.08/doc/guides/rel_notes/release_16_07.rst:* The function ``rte_vhost_enqueue_burst`` no longer supports concurrent enqueuing

dpdk-18.08/lib/librte_vhost/rte_vhost.h

/**
 * This function adds buffers to the virtio devices RX virtqueue. Buffers can
 * be received from the physical port or from another virtual device. A packet
 * count is returned to indicate the number of packets that were successfully
 * added to the RX queue.
 * @param vid
 *  vhost device ID
 * @param queue_id
 *  virtio queue index in mq case
 * @param pkts
 *  array to contain packets to be enqueued
 * @param count
 *  packets num to be enqueued
 * @return
 *  num of packets enqueued
 */
uint16_t rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
    struct rte_mbuf **pkts, uint16_t count);

dpdk-18.08/lib/librte_vhost/virtio_net.c

uint16_t
rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
    struct rte_mbuf **pkts, uint16_t count)
{
    struct virtio_net *dev = get_device(vid);

    if (!dev)
        return 0;

    if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
        RTE_LOG(ERR, VHOST_DATA,
            "(%d) %s: built-in vhost net backend is disabled.\n",
            dev->vid, __func__);
        return 0;
    }

    return virtio_dev_rx(dev, queue_id, pkts, count);
}

Both virtio_dev_rx_packed and virtio_dev_rx_split copy the packets to the guest and, depending on the configuration, decide whether to send an interrupt notification (a write system call).
dpdk-18.08/lib/librte_vhost/virtio_net.c

static __rte_always_inline uint32_t
virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
    struct rte_mbuf **pkts, uint32_t count)
{
    struct vhost_virtqueue *vq;
(...)
    if (vq_is_packed(dev))
        count = virtio_dev_rx_packed(dev, vq, pkts, count);
    else
        count = virtio_dev_rx_split(dev, vq, pkts, count);
(...)

Inside virtio_dev_rx:

    count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);
    if (count == 0)
        goto out;

The number of packets to send is set to the smaller of the MAX_PKT_BURST macro and the number of free entries (count).

Finally, the ring index is advanced according to the number of packets placed into the ring:

static __rte_always_inline uint32_t
virtio_dev_rx_packed(struct virtio_net *dev, struct vhost_virtqueue *vq,
    struct rte_mbuf **pkts, uint32_t count)
{
(...)
    for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
(...)
        if (copy_mbuf_to_desc(dev, vq, pkts[pkt_idx],
                    buf_vec, nr_vec,
                    num_buffers) < 0) {
            vq->shadow_used_idx -= num_buffers;
            break;
        }

        vq->last_avail_idx += nr_descs;
        if (vq->last_avail_idx >= vq->size) {
            vq->last_avail_idx -= vq->size;
            vq->avail_wrap_counter ^= 1;
        }

static __rte_always_inline uint32_t
virtio_dev_rx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
    struct rte_mbuf **pkts, uint32_t count)
{
    for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
        uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
        uint16_t nr_vec = 0;
(...)
        if (copy_mbuf_to_desc(dev, vq, pkts[pkt_idx],
                    buf_vec, nr_vec,
                    num_buffers) < 0) {
            vq->shadow_used_idx -= num_buffers;
            break;
        }

        vq->last_avail_idx += num_buffers;
    }

The packets are copied into guest memory by copy_mbuf_to_desc. Finally, depending on the configuration, an interrupt notification may be sent; see vhost_vring_call_split and vhost_vring_call_packed.

static __rte_always_inline void
vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
{
(...)
    /* Don't kick guest if we don't reach index specified by guest. */
    if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) {
        uint16_t old = vq->signalled_used;
        uint16_t new = vq->last_used_idx;
(...)
        if (vhost_need_event(vhost_used_event(vq), new, old)
            && (vq->callfd >= 0)) {
            vq->signalled_used = vq->last_used_idx;
            eventfd_write(vq->callfd, (eventfd_t) 1);
        }
    } else {
        /* Kick the guest if necessary. */
        if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
                && (vq->callfd >= 0))
            eventfd_write(vq->callfd, (eventfd_t)1);
    }
}

static __rte_always_inline void
vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
{
(...)
    if (!(dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))) {
        if (vq->driver_event->flags !=
                VRING_EVENT_F_DISABLE)
            kick = true;
        goto kick;
    }
(...)
kick:
    if (kick)
        eventfd_write(vq->callfd, (eventfd_t)1);
}

 


Original article: https://www.cnblogs.com/dream397/p/13952664.html
