Virtio on Linux¶
Introduction¶
Virtio is an open standard that defines a protocol for communication between drivers and devices of different types, see Chapter 5 (“Device Types”) of the virtio spec ([1]). Originally developed as a standard for paravirtualized devices implemented by a hypervisor, it can be used to interface any compliant device (real or emulated) with a driver.
For illustrative purposes, this document will focus on the common case of a Linux kernel running in a virtual machine and using paravirtualized devices provided by the hypervisor, which exposes them as virtio devices via standard mechanisms such as PCI.
Device - Driver communication: virtqueues¶
Although the virtio devices are really an abstraction layer in the hypervisor, they’re exposed to the guest as if they are physical devices using a specific transport method -- PCI, MMIO or CCW -- that is orthogonal to the device itself. The virtio spec defines these transport methods in detail, including device discovery, capabilities and interrupt handling.
The communication between the driver in the guest OS and the device in the hypervisor is done through shared memory (that’s what makes virtio devices so efficient) using specialized data structures called virtqueues, which are actually ring buffers [1] of buffer descriptors similar to the ones used in a network device:
- 
struct vring_desc¶
- Virtio ring descriptors, 16 bytes long. These can chain together via next. 
Definition:
struct vring_desc {
    __virtio64 addr;
    __virtio32 len;
    __virtio16 flags;
    __virtio16 next;
};
Members
- addr
- buffer address (guest-physical) 
- len
- buffer length 
- flags
- descriptor flags 
- next
- index of the next descriptor in the chain, if the VRING_DESC_F_NEXT flag is set. We chain unused descriptors via this, too. 
All the buffers the descriptors point to are allocated by the guest and used by the host either for reading or for writing but not for both.
Refer to Chapter 2.5 (“Virtqueues”) of the virtio spec ([1]) for the reference definitions of virtqueues and “Virtqueues and virtio ring: How the data travels” blog post ([2]) for an illustrated overview of how the host device and the guest driver communicate.
The vring_virtqueue struct models a virtqueue, including the
ring buffers and management data. Embedded in this struct is the
virtqueue struct, which is the data structure that’s
ultimately used by virtio drivers:
- 
struct virtqueue¶
- a queue to register buffers for sending or receiving. 
Definition:
struct virtqueue {
    struct list_head list;
    void (*callback)(struct virtqueue *vq);
    const char *name;
    struct virtio_device *vdev;
    unsigned int index;
    unsigned int num_free;
    unsigned int num_max;
    bool reset;
    void *priv;
};
Members
- list
- the chain of virtqueues for this device 
- callback
- the function to call when buffers are consumed (can be NULL). 
- name
- the name of this virtqueue (mainly for debugging) 
- vdev
- the virtio device this queue was created for. 
- index
- the zero-based ordinal number for this queue. 
- num_free
- number of elements we expect to be able to fit. 
- num_max
- the maximum number of elements supported by the device. 
- reset
- vq is in reset state or not. 
- priv
- a pointer for the virtqueue implementation to use. 
Description
A note on num_free: with indirect buffers, each buffer needs one element in the queue, otherwise a buffer will need one element per sg element.
The callback function pointed by this struct is triggered when the
device has consumed the buffers provided by the driver. More
specifically, the trigger will be an interrupt issued by the hypervisor
(see vring_interrupt()). Interrupt request handlers are registered for
a virtqueue during the virtqueue setup process (transport-specific).
- 
irqreturn_t vring_interrupt(int irq, void *_vq)¶
- notify a virtqueue on an interrupt 
Parameters
- int irq
- the IRQ number (ignored) 
- void *_vq
- the - struct virtqueueto notify
Description
Calls the callback function of _vq to process the virtqueue notification.
Device discovery and probing¶
In the kernel, the virtio core contains the virtio bus driver and transport-specific drivers like virtio-pci and virtio-mmio. Then there are individual virtio drivers for specific device types that are registered to the virtio bus driver.
How a virtio device is found and configured by the kernel depends on how the hypervisor defines it. Taking the QEMU virtio-console device as an example. When using PCI as a transport method, the device will present itself on the PCI bus with vendor 0x1af4 (Red Hat, Inc.) and device id 0x1003 (virtio console), as defined in the spec, so the kernel will detect it as it would do with any other PCI device.
During the PCI enumeration process, if a device is found to match the virtio-pci driver (according to the virtio-pci device table, any PCI device with vendor id = 0x1af4):
/* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
static const struct pci_device_id virtio_pci_id_table[] = {
        { PCI_DEVICE(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
        { 0 }
};
then the virtio-pci driver is probed and, if the probing goes well, the device is registered to the virtio bus:
static int virtio_pci_probe(struct pci_dev *pci_dev,
                            const struct pci_device_id *id)
{
        ...
        if (force_legacy) {
                rc = virtio_pci_legacy_probe(vp_dev);
                /* Also try modern mode if we can't map BAR0 (no IO space). */
                if (rc == -ENODEV || rc == -ENOMEM)
                        rc = virtio_pci_modern_probe(vp_dev);
                if (rc)
                        goto err_probe;
        } else {
                rc = virtio_pci_modern_probe(vp_dev);
                if (rc == -ENODEV)
                        rc = virtio_pci_legacy_probe(vp_dev);
                if (rc)
                        goto err_probe;
        }
        ...
        rc = register_virtio_device(&vp_dev->vdev);
When the device is registered to the virtio bus the kernel will look
for a driver in the bus that can handle the device and call that
driver’s probe method.
At this point, the virtqueues will be allocated and configured by
calling the appropriate virtio_find helper function, such as
virtio_find_single_vq() or virtio_find_vqs(), which will end up calling
a transport-specific find_vqs method.
References¶
[1] Virtio Spec v1.2: https://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html
[2] Virtqueues and virtio ring: How the data travels https://www.redhat.com/en/blog/virtqueues-and-virtio-ring-how-data-travels
Footnotes