[Intel-gfx] [PATCH v4] vfio/pci: Add support for opregion v2.1+

Alex Williamson alex.williamson at redhat.com
Fri Mar 19 19:26:34 UTC 2021


On Tue,  2 Mar 2021 21:02:20 +0800
Fred Gao <fred.gao at intel.com> wrote:

> Before opregion version 2.0 VBT data is stored in opregion mailbox #4,
> However, When VBT data exceeds 6KB size and cannot be within mailbox #4
> starting from opregion v2.0+, Extended VBT region, next to opregion, is
> used to hold the VBT data, so the total size will be opregion size plus
> extended VBT region size.
> 
> since opregion v2.0 with physical host VBT address should not be
> practically available for end user, it is not supported.
> 
> Cc: Zhenyu Wang <zhenyuw at linux.intel.com>
> Signed-off-by: Swee Yee Fonn <swee.yee.fonn at intel.com>
> Signed-off-by: Fred Gao <fred.gao at intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_igd.c | 49 +++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
> index 53d97f459252..4edb8afcdbfc 100644
> --- a/drivers/vfio/pci/vfio_pci_igd.c
> +++ b/drivers/vfio/pci/vfio_pci_igd.c
> @@ -21,6 +21,10 @@
>  #define OPREGION_SIZE		(8 * 1024)
>  #define OPREGION_PCI_ADDR	0xfc
>  
> +#define OPREGION_RVDA		0x3ba
> +#define OPREGION_RVDS		0x3c2
> +#define OPREGION_VERSION	0x16
> +
>  static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user *buf,
>  			      size_t count, loff_t *ppos, bool iswrite)
>  {
> @@ -58,6 +62,7 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
>  	u32 addr, size;
>  	void *base;
>  	int ret;
> +	u16 version;
>  
>  	ret = pci_read_config_dword(vdev->pdev, OPREGION_PCI_ADDR, &addr);
>  	if (ret)
> @@ -83,6 +88,50 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
>  
>  	size *= 1024; /* In KB */
>  
> +	/*
> +	 * Support opregion v2.1+
> +	 * When VBT data exceeds 6KB size and cannot be within mailbox #4

s/#4/#4, then the/

> +	 * Extended VBT region, next to opregion, is used to hold the VBT data.
> +	 * RVDA (Relative Address of VBT Data from Opregion Base) and RVDS
> +	 * (VBT Data Size) from opregion structure member are used to hold the
> +	 * address from region base and size of VBT data while RVDA/RVDS
> +	 * are not defined before opregion 2.0.
> +	 *
> +	 * opregion 2.0: rvda is the physical VBT address.

Let's expand the comment to include why this is a problem to support
(virtualization of this register would be required in userspace) and why
we're choosing not to manipulate this into a 2.1+ table, which I think
is both the practical lack of v2.0 tables in use and any implicit
dependencies software may have on the OpRegion version.

> +	 *
> +	 * opregion 2.1+: rvda is unsigned, relative offset from
> +	 * opregion base, and should never point within opregion.

And for our purposes must exactly follow the base opregion to avoid
exposing unknown host memory to userspace, ie. provide a more
descriptive justification for the 2nd error condition below.

> +	 */
> +	version = le16_to_cpu(*(__le16 *)(base + OPREGION_VERSION));
> +	if (version >= 0x0200) {
> +		u64 rvda;
> +		u32 rvds;
> +
> +		rvda = le64_to_cpu(*(__le64 *)(base + OPREGION_RVDA));
> +		rvds = le32_to_cpu(*(__le32 *)(base + OPREGION_RVDS));
> +		if (rvda && rvds) {
> +			/* no support for opregion v2.0 with physical VBT address */
> +			if (version == 0x0200) {
> +				memunmap(base);
> +				pci_err(vdev->pdev,
> +					"IGD passthrough does not support opregion\n"
> +					"version 0x%x with physical rvda 0x%llx\n", version, rvda);


Why do we need a new line midway through this log message?

s/passthrough/assignment/

In testing the version you include the leading zero, do you also want
that leading zero in the printed version, ie. %04x?

If we get to this code, we already know that both rvda and rvds are
non-zero, why is it useful to print the rvda value in this error
message?  For example, we could print:

 "IGD assignment does not support opregion version 0x%04x with an extended VBT region"

> +				return -EINVAL;
> +			}
> +
> +			if ((u32)rvda != size) {

What allows us to assume rvda is a 32bit value given that it's a 64bit
register?  It seems safer not to include this cast.

> +				memunmap(base);
> +				pci_err(vdev->pdev,
> +					"Extended VBT does not follow opregion !\n"
> +					"opregion version 0x%x:rvda 0x%llx\n", version, rvda);

Again I'm not sure about the usefulness of printing the rvda value on
its own.  Without knowing the size value it seems meaningless.  Like
above, get rid of the mid-error new line and random space if you keep
the exclamation point.

> +				return -EINVAL;
> +			}
> +
> +			/* region size for opregion v2.0+: opregion and VBT size */
> +			size += rvds;

RVDS is defined as size in bytes, not in kilobytes like the base
opregion size, right?  Let's include that clarification in the comment
since the spec is private.  Thanks,

Alex


> +		}
> +	}
> +
>  	if (size != OPREGION_SIZE) {
>  		memunmap(base);
>  		base = memremap(addr, size, MEMREMAP_WB);



More information about the Intel-gfx mailing list