docs: vfio: Add vfio device cdev description

This gives notes for userspace applications on device cdev usage.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-27-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
parent c1cce6d079
commit 094671300f
@@ -239,6 +239,137 @@ group and can access them as follows::
	/* Gratuitous device reset and go... */
	ioctl(device, VFIO_DEVICE_RESET);

IOMMUFD and vfio_iommu_type1
----------------------------

IOMMUFD is the new user API to manage I/O page tables from userspace.
It is intended as the portal for delivering advanced userspace DMA
features (nested translation [5]_, PASID [6]_, etc.) while also providing
a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
vfio container and group model, is intended to be deprecated.

The IOMMUFD backwards compatibility interface can be enabled in two ways.
In the first method, the kernel can be configured with
CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
transparently provides the entire infrastructure for the VFIO
container and IOMMU backend interfaces.  The compatibility mode can
also be accessed if the VFIO container interface, i.e. /dev/vfio/vfio, is
simply symlinked to /dev/iommu.  Note that at the time of writing, the
compatibility mode is not entirely feature complete relative to
VFIO_TYPE1v2_IOMMU (e.g. DMA mapping MMIO) and does not attempt to
provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
it is not generally advisable at this time to switch from native VFIO
implementations to the IOMMUFD compatibility interfaces.

Long term, VFIO users should migrate to device access through the cdev
interface described below, and native access through the IOMMUFD
provided interfaces.

VFIO Device cdev
----------------

Traditionally, a user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
in a VFIO group.

With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
by directly opening a character device /dev/vfio/devices/vfioX where
"X" is the number allocated uniquely by VFIO for registered devices.
The cdev interface does not support noiommu devices, so the user should
use the legacy group interface if noiommu is wanted.
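
The "X" in vfioX is not known in advance, so an application typically has
to look it up.  The following is a minimal sketch, assuming the sysfs
layout used in the example later in this document, where a vfio-dev/
directory under the device's sysfs node contains a single vfioX entry.
The function name vfio_cdev_path() is hypothetical.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look in <sysfs_dev>/vfio-dev/ for an entry whose
 * name starts with "vfio" and write "/dev/vfio/devices/<entry>" to `out`.
 * Returns 0 on success, -1 if no entry is found or `out` is too small.
 */
static int vfio_cdev_path(const char *sysfs_dev, char *out, size_t outlen)
{
	char dir[4096];
	struct dirent *de;
	DIR *d;
	int ret = -1;

	snprintf(dir, sizeof(dir), "%s/vfio-dev", sysfs_dev);
	d = opendir(dir);
	if (!d)
		return -1;	/* no vfio-dev directory for this device */

	while ((de = readdir(d)) != NULL) {
		int n;

		if (strncmp(de->d_name, "vfio", 4) != 0)
			continue;	/* also skips "." and ".." */
		n = snprintf(out, outlen, "/dev/vfio/devices/%s", de->d_name);
		if (n > 0 && (size_t)n < outlen)
			ret = 0;
		break;
	}
	closedir(d);
	return ret;
}
```

For the PCI device used in the example below, the caller would pass
"/sys/bus/pci/devices/0000:6a:01.0" and then open() the returned path.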

The cdev only works with IOMMUFD.  Both VFIO drivers and applications
must adapt to the new cdev security model, which requires using
VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
actually use the device.  Once BIND succeeds, a VFIO device can
be fully accessed by the user.

VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
Hence those modules can be fully compiled out in an environment
where no legacy VFIO application exists.

SPAPR does not support IOMMUFD yet, so it cannot support device
cdev either.

VFIO device cdev access is still bound by IOMMU group semantics, i.e. there
can be only one DMA owner for the group.  Devices belonging to the same
group cannot be bound to multiple iommufd_ctx or shared between native
kernel and vfio bus driver or other driver supporting the driver_managed_dma
flag.  A violation of this ownership requirement will fail at the
VFIO_DEVICE_BIND_IOMMUFD ioctl, which gates full device access.
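
Because BIND is the point where an ownership conflict surfaces, an
application should check its result before touching the device.  The
following is a minimal sketch of such a check.  The wrapper name
vfio_cdev_bind() is hypothetical, and the #ifdef guard is only an
assumption made so that the sketch still builds against older uapi
headers that predate the cdev interface.  No particular errno value is
assumed for the ownership failure.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical wrapper: bind cdev_fd to iommufd.
 * Returns 0 on success or a negative errno value on failure.
 */
static int vfio_cdev_bind(int cdev_fd, int iommufd)
{
#ifdef VFIO_DEVICE_BIND_IOMMUFD
	struct vfio_device_bind_iommufd bind = {
		.argsz = sizeof(bind),
		.flags = 0,
		.iommufd = iommufd,
	};

	if (ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind) < 0) {
		int err = errno;

		/* The device must not be used when BIND fails. */
		fprintf(stderr, "VFIO_DEVICE_BIND_IOMMUFD failed: %s\n",
			strerror(err));
		return -err;
	}
	return 0;
#else
	/* Headers too old to know about the cdev interface. */
	(void)cdev_fd;
	(void)iommufd;
	return -ENOTSUP;
#endif
}
```

Only when the wrapper returns 0 should the caller go on to attach the
device to an IOAS or issue other device ioctls.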

Device cdev Example
-------------------

Assume the user wants to access PCI device 0000:6a:01.0::

	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
	vfio0

This device is therefore represented as vfio0.  The user can verify
its existence::

	$ ls -l /dev/vfio/devices/vfio0
	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
	511:0
	$ ls -l /dev/char/511\:0
	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
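
The same cross-check can be done from a program.  The following is a
minimal sketch that parses the "major:minor" text of the sysfs dev
attribute and compares it with the device number of the node under /dev.
Both function names are hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Parse a sysfs "dev" attribute such as "511:0\n" into a dev_t. */
static int parse_sysfs_dev(const char *text, dev_t *out)
{
	unsigned int maj, mnr;

	if (sscanf(text, "%u:%u", &maj, &mnr) != 2)
		return -1;
	*out = makedev(maj, mnr);
	return 0;
}

/* Return 1 if node_path is a character device numbered `expected`. */
static int is_char_dev(const char *node_path, dev_t expected)
{
	struct stat st;

	if (stat(node_path, &st) < 0)
		return 0;
	return S_ISCHR(st.st_mode) && st.st_rdev == expected;
}
```

With the listing above, parsing the contents of
/sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev and passing the
result to is_char_dev("/dev/vfio/devices/vfio0", ...) would confirm that
the node and the sysfs entry refer to the same device.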

Then provide the user with access to the device if unprivileged
operation is desired::

	$ chown user:user /dev/vfio/devices/vfio0

Finally the user can get the cdev fd by::

	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);

An opened cdev_fd doesn't give the user any permission to access
the device except binding the cdev_fd to an iommufd.  After that point
the device is fully accessible, including attaching it to an
IOMMUFD IOAS/HWPT to enable userspace DMA::

	struct vfio_device_bind_iommufd bind = {
		.argsz = sizeof(bind),
		.flags = 0,
	};
	struct iommu_ioas_alloc alloc_data  = {
		.size = sizeof(alloc_data),
		.flags = 0,
	};
	struct vfio_device_attach_iommufd_pt attach_data = {
		.argsz = sizeof(attach_data),
		.flags = 0,
	};
	struct iommu_ioas_map map = {
		.size = sizeof(map),
		.flags = IOMMU_IOAS_MAP_READABLE |
			 IOMMU_IOAS_MAP_WRITEABLE |
			 IOMMU_IOAS_MAP_FIXED_IOVA,
		.__reserved = 0,
	};

	iommufd = open("/dev/iommu", O_RDWR);

	bind.iommufd = iommufd;
	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);

	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
	attach_data.pt_id = alloc_data.out_ioas_id;
	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);

	/* Allocate some space and setup a DMA mapping */
	map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
				    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	map.iova = 0; /* 1MB starting at 0x0 from device view */
	map.length = 1024 * 1024;
	map.ioas_id = alloc_data.out_ioas_id;

	ioctl(iommufd, IOMMU_IOAS_MAP, &map);

	/* Other device operations as stated in "VFIO Usage Example" */

VFIO User API
-------------------------------------------------------------------------------

@@ -566,3 +697,11 @@ This implementation has some specifics:
	\-0d.1

	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)

.. [5] Nested translation is an IOMMU feature which supports two-stage
   address translations.  This improves the address translation efficiency
   in IOMMU virtualization.

.. [6] PASID stands for Process Address Space ID, introduced by PCI
   Express.  It is a prerequisite for Shared Virtual Addressing (SVA)
   and Scalable I/O Virtualization (Scalable IOV).