[Intel-gfx] [PATCH v3 02/12] vfio/pci: Only check ownership of opened devices in hot reset

Eric Auger eric.auger at redhat.com
Tue Apr 4 15:59:01 UTC 2023



On 4/4/23 17:29, Liu, Yi L wrote:
>> From: Eric Auger <eric.auger at redhat.com>
>> Sent: Tuesday, April 4, 2023 11:19 PM
>>
>> Hi Yi,
>>
>> On 4/4/23 16:37, Liu, Yi L wrote:
>>> Hi Eric,
>>>
>>>> From: Eric Auger <eric.auger at redhat.com>
>>>> Sent: Tuesday, April 4, 2023 10:00 PM
>>>>
>>>> Hi YI,
>>>>
>>>> On 4/1/23 16:44, Yi Liu wrote:
>>>>> If the affected device is not opened by any user, it's safe to reset it
>>>>> given it's not in use.
>>>>>
>>>>> Reviewed-by: Kevin Tian <kevin.tian at intel.com>
>>>>> Reviewed-by: Jason Gunthorpe <jgg at nvidia.com>
>>>>> Tested-by: Yanting Jiang <yanting.jiang at intel.com>
>>>>> Signed-off-by: Yi Liu <yi.l.liu at intel.com>
>>>>> ---
>>>>>  drivers/vfio/pci/vfio_pci_core.c | 14 +++++++++++---
>>>>>  include/uapi/linux/vfio.h        |  8 ++++++++
>>>>>  2 files changed, 19 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>>>>> index 65bbef562268..5d745c9abf05 100644
>>>>> --- a/drivers/vfio/pci/vfio_pci_core.c
>>>>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>>>>> @@ -2429,10 +2429,18 @@ static int vfio_pci_dev_set_hot_reset(struct
>>>> vfio_device_set *dev_set,
>>>>>  	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
>>>>>  		/*
>>>>> -		 * Test whether all the affected devices are contained by the
>>>>> -		 * set of groups provided by the user.
>>>>> +		 * Test whether all the affected devices can be reset by the
>>>>> +		 * user.
>>>>> +		 *
>>>>> +		 * Resetting an unused device (not opened) is safe, because
>>>>> +		 * dev_set->lock is held in hot reset path so this device
>>>>> +		 * cannot race being opened by another user simultaneously.
>>>>> +		 *
>>>>> +		 * Otherwise all opened devices in the dev_set must be
>>>>> +		 * contained by the set of groups provided by the user.
>>>>>  		 */
>>>>> -		if (!vfio_dev_in_groups(cur_vma, groups)) {
>>>>> +		if (cur_vma->vdev.open_count &&
>>>>> +		    !vfio_dev_in_groups(cur_vma, groups)) {
>>>>>  			ret = -EINVAL;
>>>>>  			goto err_undo;
>>>>>  		}
>>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>>> index 0552e8dcf0cb..f96e5689cffc 100644
>>>>> --- a/include/uapi/linux/vfio.h
>>>>> +++ b/include/uapi/linux/vfio.h
>>>>> @@ -673,6 +673,14 @@ struct vfio_pci_hot_reset_info {
>>>>>   * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
>>>>>   *				    struct vfio_pci_hot_reset)
>>>>>   *
>>>>> + * Userspace requests hot reset for the devices it uses.  Due to the
>>>>> + * underlying topology, multiple devices can be affected in the reset
>>>> by the reset
>>>>> + * while some might be opened by another user.  To avoid interference
>>>> s/interference/hot reset failure?
>>> I don’t think user can really avoid hot reset failure since there may
>>> be new devices plugged into the affected slot. Even user has opened
>> I don't know the legacy wrt that issue but this sounds a serious issue,
>> meaning the reset of an assigned device could impact another device
>> belonging to another group not not owned by the user?
> but the hot reset shall fail as the group is not owned by the user.

sure it shall but I fail to understand if the reset fails or the device
plug is somehow delayed until the reset completes.
>
>>> all the groups/devices reported by VFIO_DEVICE_GET_PCI_HOT_RESET_INFO,
>>> the hot reset can fail if new device is plugged in and has not been
>>> bound to vfio or opened by another user during the window of
>>> _INFO and HOT_RESET.
>> with respect to the latter isn't the dev_set lock held during the hot
>> reset and sufficient to prevent any new opening to occur?
> yes. new open needs to acquire the dev_set lock. So when hot reset
> acquires the dev_set lock, then no new open can occur. 
>
> Regards,
> Yi Liu
>
>>> maybe the whole statement should be as below:
>>>
>>> To avoid interference, the hot reset can only be conducted when all
>>> the affected devices are either opened by the calling user or not
>>> opened yet at the moment of the hot reset attempt.
>> OK
>>
>> Eric
>>>>> + * the calling user must ensure all affected devices, if opened, are
>>>>> + * owned by itself.
>>>>> + *
>>>>> + * The ownership is proved by an array of group fds.
>>>>> + *
>>>>>   * Return: 0 on success, -errno on failure.
>>>>>   */
>>>>>  struct vfio_pci_hot_reset {
>>> Regards,
>>> Yi Liu



More information about the Intel-gfx mailing list