Slides and recording are available for the “virtio-vsock in QEMU, Firecracker and Linux: Status, Performance and Challenges” talk that Andra Paraschiv and I presented at KVM Forum 2019. This was the 13th edition of the KVM Forum conference. It took place in Lyon, France in October 2019.
We talked about the current status and future work on the VSOCK drivers in Linux, and how Firecracker and QEMU provide the virtio-vsock device.
Initially, Andra gave an overview of VSOCK, describing the state of the art and its key features:
it is very simple to configure: the host assigns a unique CID (Context-ID) to each guest, and no configuration is needed inside the guest;
it provides the AF_VSOCK address family, allowing user space applications in the host and in the guest to communicate using the standard POSIX Socket API (e.g. bind, listen, accept, connect, send, recv, etc.); a minimal example follows this list.
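To illustrate that API, here is a minimal sketch of a guest-side AF_VSOCK server; the port number (1234) is an arbitrary value chosen for the example, not something from the talk.

```c
/* Minimal AF_VSOCK server, e.g. running inside the guest. */
#include <stdio.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>   /* struct sockaddr_vm, VMADDR_* constants */
#include <unistd.h>

int main(void)
{
    struct sockaddr_vm addr = {
        .svm_family = AF_VSOCK,
        .svm_cid    = VMADDR_CID_ANY,  /* accept connections from any CID */
        .svm_port   = 1234,            /* arbitrary port for the example */
    };

    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        perror("vsock server setup");
        return 1;
    }

    int conn = accept(fd, NULL, NULL);  /* standard POSIX accept() */
    if (conn < 0) {
        perror("accept");
        return 1;
    }

    char buf[4096];
    ssize_t n = recv(conn, buf, sizeof(buf), 0);
    if (n > 0)
        printf("received %zd bytes\n", n);

    close(conn);
    close(fd);
    return 0;
}
```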
Andra also described common use cases for VSOCK, such as guest agents (clipboard sharing, remote console, etc.), network applications using SOCK_STREAM, and services provided by the hypervisor to the guests.
Going into the implementation details, Andra explained how the device in the guest communicates with the vhost backend in the host, exchanging data and events (via ioeventfd and irqfd).
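As a rough sketch of how a VMM wires those events between KVM and a vhost backend: an ioeventfd turns the guest's virtqueue notification into an eventfd signal that the vhost worker waits on, and an irqfd lets the backend inject the completion interrupt. The vm_fd, vhost_fd, queue_notify_addr and irq_gsi parameters below are assumptions standing in for VM and device setup not shown here.

```c
/* Sketch: connect a virtqueue's kick/call eventfds between KVM and vhost.
 * Error cleanup is omitted for brevity. */
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <linux/vhost.h>

static int wire_virtqueue(int vm_fd, int vhost_fd, unsigned int queue_index,
                          __u64 queue_notify_addr, __u32 irq_gsi)
{
    /* Guest kick: a write to the virtqueue notify register signals this
     * eventfd (ioeventfd), waking the vhost worker without exiting to
     * user space. */
    int kick_fd = eventfd(0, EFD_CLOEXEC);
    struct kvm_ioeventfd ioev = {
        .addr = queue_notify_addr,
        .len  = 4,
        .fd   = kick_fd,
    };
    if (kick_fd < 0 || ioctl(vm_fd, KVM_IOEVENTFD, &ioev) < 0)
        return -1;

    struct vhost_vring_file kick = { .index = queue_index, .fd = kick_fd };
    if (ioctl(vhost_fd, VHOST_SET_VRING_KICK, &kick) < 0)
        return -1;

    /* Device call: vhost signals this eventfd (irqfd) and KVM injects the
     * corresponding interrupt into the guest. */
    int call_fd = eventfd(0, EFD_CLOEXEC);
    struct kvm_irqfd irqfd = { .fd = call_fd, .gsi = irq_gsi };
    if (call_fd < 0 || ioctl(vm_fd, KVM_IRQFD, &irqfd) < 0)
        return -1;

    struct vhost_vring_file call = { .index = queue_index, .fd = call_fd };
    return ioctl(vhost_fd, VHOST_SET_VRING_CALL, &call);
}
```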
Focusing on Firecracker, Andra gave a brief overview of this new VMM (Virtual Machine Monitor) written in Rust, and she explained why, in the v0.18.0 release, they switched from the experimental vhost-vsock implementation to a vhost-less solution.
This change required device emulation in Firecracker, which implements the virtio-vsock device model over MMIO. The device is exposed to the host using UDS (Unix Domain Sockets).
Andra described how Firecracker maps the VSOCK ports onto the uds_path specified in the VM configuration (the host side of this handshake is sketched in the example after these steps):

Host-Initiated Connections
the guest application listen()s on PORT;
the host connect()s to the AF_UNIX socket at uds_path;
the host send()s “CONNECT PORT\n”;
the guest accept()s the new connection.

Guest-Initiated Connections
the host listen()s on an AF_UNIX socket at uds_path_PORT;
the guest connect()s to HOST_CID and PORT;
the host accept()s the new connection.
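To make the host-initiated path concrete, here is a hedged sketch of a host application performing that handshake over Firecracker's UNIX domain socket. The uds_path (/tmp/fc-vsock.sock) and guest port (1234) are example values, and the guest application is assumed to already be listening on that vsock port.

```c
/* Host-initiated connection to a guest vsock port through Firecracker's UDS. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(void)
{
    /* uds_path as set in the Firecracker VM configuration (example value). */
    const char *uds_path = "/tmp/fc-vsock.sock";
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    strncpy(addr.sun_path, uds_path, sizeof(addr.sun_path) - 1);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    /* Ask Firecracker to forward this connection to guest vsock port 1234. */
    const char *req = "CONNECT 1234\n";
    if (write(fd, req, strlen(req)) < 0) {
        perror("write");
        return 1;
    }

    /* Firecracker replies with a status line (e.g. "OK <port>\n") before the
     * socket becomes a plain byte pipe to the guest application. */
    char ack[64];
    ssize_t n = read(fd, ack, sizeof(ack) - 1);
    if (n > 0) {
        ack[n] = '\0';
        printf("handshake reply: %s", ack);
    }

    /* From here on, read()/write() exchange data with the guest. */
    close(fd);
    return 0;
}
```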
Finally, she showed the performance of this solution, running the iperf-vsock benchmark while varying the size of the buffer used in Firecracker to transfer packets between the virtio-vsock device and the UNIX domain socket. The throughput on the guest-to-host path reaches 10 Gbps.
In the second part of the talk, I described the QEMU implementation. QEMU provides the virtio-vsock device using the vhost-vsock kernel module.
The vsock device in QEMU handles only the control path, such as assigning the guest CID and supporting live migration.
The vhost-vsock kernel module handles the communication with the guest, providing in-kernel virtio device emulation to achieve very high performance and to interface directly with the host socket layer. In this way, host applications can also use the POSIX Socket API directly to communicate with the guest, so a guest application and a host application can be swapped by changing only the destination CID, as in the client sketch below.
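For example, the same trivial AF_VSOCK client can run on either side: on the host it connects to the guest's CID, while in the guest it connects to VMADDR_CID_HOST (2). The CID value 3 and port 1234 below are illustrative assumptions.

```c
/* Minimal AF_VSOCK client: only the destination CID decides whether it
 * talks to a guest (when run on the host) or to the host (when run in a guest). */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_vm addr = {
        .svm_family = AF_VSOCK,
        .svm_cid    = 3,     /* example guest CID; use VMADDR_CID_HOST (2)
                                to run the same client inside the guest */
        .svm_port   = 1234,  /* example port */
    };

    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("vsock connect");
        return 1;
    }

    const char *msg = "hello";
    if (send(fd, msg, strlen(msg), 0) < 0)
        perror("send");

    close(fd);
    return 0;
}
```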
After that, I told the story of VSOCK in the Linux tree, which started in 2013 when the first implementation was merged, and went through the changes of the last year.
These changes are mainly fixes, but for the virtio/vhost transports we also improved the performance with two simple changes released with Linux v5.4.
With these changes we are able to reach ~40 Gbps on the Guest -> Host path, because the guest can now send packets of up to 64 KB directly to the host; on the Host -> Guest path we reached ~25 Gbps, because the host is still using the 4 KB buffers preallocated by the guest.
In the last few years, several applications, tools, and languages have started to support VSOCK, and I listed them to update the audience:
Tools:
Languages:
To conclude, I went through the next challenges that we are going to face:
multi-transport support, to use VSOCK in a nested VM environment: the current implementation can handle only one transport loaded at run time, so we can't load virtio_transport and vhost_transport together in the L1 guest. I already sent some patches upstream [RFC, v1], but they are still in progress.
network namespace support, to create independent addressing domains with VSOCK sockets. This could be useful for partitioning VMs into different domains or, in a nested VM environment, for isolating host applications from guest applications bound to the same port.
virtio-net as a transport for virtio-vsock, to avoid re-implementing features already available in virtio-net, such as mergeable buffers, page allocation, and small-packet handling.
Other points to be addressed came from the comments we received from the audience:
a loopback device could be very useful for developers to test applications that use VSOCK sockets. The current implementation supports loopback only in the guest, but it would be better to support it in the host as well, adding the VMADDR_CID_LOCAL special address.
VM to VM communication was requested by several people. Introducing it in the VSOCK core could complicate the protocol and the addressing, and could require some sort of firewall. For now we do not plan to do it, but I developed a simple user space application to address this use case: vsock-bridge. To improve the performance of this solution, we will consider adding sendfile(2) or MSG_ZEROCOPY support to the AF_VSOCK core; a sketch of the copy loop such a bridge currently needs is shown below.
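As a rough idea of what such a bridge does today (and why sendfile(2) or MSG_ZEROCOPY would help), here is a sketch of the user space copy loop it has to run; relay() is an illustrative helper, not code taken from vsock-bridge.

```c
/* One direction of a user space bridge between two connected sockets
 * (e.g. an AF_VSOCK socket towards each VM). Every byte is copied through
 * a user space buffer, which is the overhead that sendfile(2) or
 * MSG_ZEROCOPY could remove. */
#include <unistd.h>

static int relay(int from_fd, int to_fd)
{
    char buf[64 * 1024];

    for (;;) {
        ssize_t n = read(from_fd, buf, sizeof(buf));
        if (n <= 0)
            return (int)n;          /* 0 = peer closed, <0 = error */

        for (char *p = buf; n > 0; ) {
            ssize_t w = write(to_fd, p, (size_t)n);
            if (w < 0)
                return -1;
            p += w;
            n -= w;
        }
    }
}

/* A real bridge runs relay() in both directions, e.g. from two threads or
 * a poll() loop, with one socket per VM. */
```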
virtio-vsock Windows drivers are not planned to be addressed, but contributions are welcome. Other virtio Windows drivers are available in the kvm-guest-drivers-windows repository.