[PATCH 03/14] drm/amdgpu: Add helper to turn an existing VM into a compute VM
Alex Deucher
alexdeucher at gmail.com
Mon Mar 5 19:32:36 UTC 2018
On Mon, Mar 5, 2018 at 12:14 PM, Christian König
<christian.koenig at amd.com> wrote:
> Am 05.03.2018 um 17:58 schrieb Felix Kuehling:
>>
>> On 2018-03-05 07:17 AM, Christian König wrote:
>>>
>>> Am 04.03.2018 um 03:34 schrieb Felix Kuehling:
>>>>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 82 ++++++++++++++++++++++++++++++++++
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 1 +
>>>> 2 files changed, 83 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> index 5afbc5e..58153ef 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> @@ -2350,6 +2350,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>> INIT_KFIFO(vm->faults);
>>>> vm->fault_credit = 16;
>>>> + vm->vm_context = vm_context;
>>>
>>> I think we should drop the vm_context parameter and all the related
>>> code in amdgpu_vm_init(). But that can be part of a later cleanup
>>> patch as well.
>>
>> Yep. It will be a prerequisite for sharing VMs between graphics and
>> compute. But then CPU vs. SDMA page table updates would need to be
>> handled differently, on a per-call basis, and would probably need
>> some synchronization between them.
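>>
>> Very roughly, with made-up helper names just to sketch the per-call
>> idea (update_ptes_cpu/update_ptes_sdma don't exist, this is not the
>> actual code):
>>
>>     int r = 0;
>>
>>     /* hypothetical: pick the update path per call instead of the
>>      * per-VM vm->use_cpu_for_update flag
>>      */
>>     if (use_cpu_this_call) {
>>             /* CPU path must wait for any in-flight SDMA update */
>>             if (vm->last_update)
>>                     r = dma_fence_wait(vm->last_update, true);
>>             if (!r)
>>                     r = update_ptes_cpu(vm, start, last, flags);
>>     } else {
>>             r = update_ptes_sdma(vm, start, last, flags);
>>     }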
>>
>> Also, we'd need to fix shadow page table handling with CPU updates, or
>> get rid of shadow page tables. I'm not sure why they're needed, TBH.
>
>
> The idea behind shadow page tables is that you still have the current
> state of the pipeline if the GPU crashes. Since there is no pipeline
> with CPU updates, they don't make any sense in that case.
They also protect against VRAM loss in the case of a GPU reset.
Alex
>
>
>>
>>>> +
>>>> return 0;
>>>> error_free_root:
>>>> @@ -2364,6 +2366,86 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>> }
>>>> /**
>>>> + * amdgpu_vm_make_compute - Turn a GFX VM into a compute VM
>>>> + *
>>>> + * This only works on GFX VMs that don't have any BOs added and no
>>>> + * page tables allocated yet.
>>>> + *
>>>> + * Changes the following VM parameters:
>>>> + * - vm_context
>>>> + * - use_cpu_for_update
>>>> + * - pte_supports_ats
>>>> + * - pasid (old PASID is released, because compute manages its own PASIDs)
>>>> + *
>>>> + * Reinitializes the page directory to reflect the changed ATS
>>>> + * setting. May leave behind an unused shadow BO for the page
>>>> + * directory when switching from SDMA updates to CPU updates.
>>>> + *
>>>> + * Returns 0 for success, -errno for errors.
>>>> + */
>>>> +int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>>>> +{
>>>> + bool pte_support_ats = (adev->asic_type == CHIP_RAVEN);
>>>> + int r;
>>>> +
>>>> + r = amdgpu_bo_reserve(vm->root.base.bo, true);
>>>> + if (r)
>>>> + return r;
>>>> +
>>>> + /* Sanity checks */
>>>> + if (vm->vm_context == AMDGPU_VM_CONTEXT_COMPUTE) {
>>>> + /* Can happen if ioctl is interrupted by a signal after
>>>> + * this function already completed. Just return success.
>>>> + */
>>>> + r = 0;
>>>> + goto error;
>>>> + }
>>>
>>> Ok, that is actually a show stopper. An interrupted IOCTL should never
>>> have a visible effect.
>>>
>>> Is that just a theoretical issue or did you run into this in practice?
>>
>> It's theoretical. init_kfd_vm in patch 4 reserves the page directory
>> again, validates it and waits for the validation to complete. Both the
>> reservation and the waiting can be interrupted by signals.
>
>
> No, both have parameters to control whether they are interruptible,
> for exactly this reason.
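>
> Roughly, the non-interruptible variants look like this (the fence
> here stands for whatever validation fence is waited on):
>
>     /* second parameter no_intr=true: never return -ERESTARTSYS */
>     r = amdgpu_bo_reserve(vm->root.base.bo, true);
>     ...
>     /* intr=false: wait for validation without signal interruption */
>     r = dma_fence_wait(fence, false);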
>
>> This happens
>> after amdgpu_vm_make_compute has completed.
>>
>> If we wanted to achieve "no visible effect", we'd need to roll back the
>> conversion to a compute VM, which may involve more waiting.
>>
>> We have a similar issue in map_memory_to_gpu. We map on all GPUs and
>> wait for a sync object in the end. If SDMA is used, this allows the SDMA
>> on all GPUs to work concurrently. The wait in the end can be
>> interrupted. Instead of rolling back the mapping on all GPUs (which
>> would require more waiting), we just let the system call return
>> -ERESTARTSYS and make sure the restart doesn't cause any trouble.
>
>
> Whether waits are interruptible is controlled by parameters. At least
> for importing the VM it is probably OK to disable this.
>
>> I've been looking for documentation about the expected behaviour for
>> system calls interrupted by signals. All I can find is this statement in
>> Documentation/kernel-hacking/hacking.rst:
>>>
>>> After you slept you should check if a signal occurred: the Unix/Linux
>>> way of handling signals is to temporarily exit the system call with the
>>> ``-ERESTARTSYS`` error. The system call entry code will switch back to
>>> user context, process the signal handler and then your system call will
>>> be restarted (unless the user disabled that). So you should be prepared
>>> to process the restart, e.g. if you're in the middle of manipulating
>>> some data structure.
>>
>> The same paragraph is also copied in
>> https://www.kernel.org/doc/htmldocs/kernel-hacking/ioctls.html.
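>>
>> (For reference, the generic pattern that paragraph describes is
>> simply:
>>
>>     if (wait_event_interruptible(wq, condition))
>>         return -ERESTARTSYS; /* entry code restarts the syscall */
>>
>> and the restart re-enters the ioctl from the top.)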
>>
>> The key sentence is "should be prepared to process the restart, e.g. if
>> you're in the middle of manipulating some data structure". Rolling back
>> everything would achieve that, but it's not the only way and not the
>> most efficient way.
>>
>> In this case, I'm handling the restart by checking whether the VM is
>> already a compute VM. So on the restart the conversion to a compute VM
>> becomes a no-op.
>>
>> Similarly, in the map_memory_to_gpu ioctl, mapping on a GPU where the
>> memory is already mapped becomes a no-op, which handles the restart
>> case. In this case we even added a specific test for this condition in
>> kfdtest, where we deliberately interrupt a big mapping operation with a
>> signal.
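>>
>> Schematically, and simplified from the actual KFD code (names from
>> memory, not the literal source):
>>
>>     list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
>>             if (entry->is_mapped)
>>                     continue; /* restart: already mapped on this GPU */
>>             r = map_bo_to_gpuvm(adev, entry, ctx.sync);
>>             if (r)
>>                     goto out;
>>     }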
>
>
> This approach is a clear no-go.
>
> Changes done before a system call is interrupted don't necessarily
> need to be reverted, but only as long as they have no user-visible
> effect.
>
> For example: when we map something we need to allocate multiple page
> directories. It is not necessary to release all previously allocated
> ones when an allocation reports that it was interrupted, but that is
> only acceptable because those page directories are not user accessible.
>
> Similar exceptions can be made by skipping work when the IOCTL is
> interrupted. E.g. in the VA IOCTL we usually submit the page table
> update directly, but if that is interrupted we just return success
> instead of restarting the IOCTL and defer the update until the first
> command submission.
>
> The easiest way to work around this is to add a separate IOCTL which
> waits for VM updates to complete. It should be trivial to use
> vm->last_update for this.
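>
> As a sketch (amdgpu_vm_wait_idle is a made-up name, just to
> illustrate):
>
>     int amdgpu_vm_wait_idle(struct amdgpu_vm *vm)
>     {
>             if (!vm->last_update)
>                     return 0; /* nothing pending */
>             /* interruptible is fine, the wait IOCTL itself is
>              * trivially restartable
>              */
>             return dma_fence_wait(vm->last_update, true);
>     }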
>
> Regards,
> Christian.
>
>
>>
>> Regards,
>> Felix
>>
>>> Regards,
>>> Christian.
>>>
>>>> + if (!RB_EMPTY_ROOT(&vm->va.rb_root) || vm->root.entries) {
>>>> + r = -EINVAL;
>>>> + goto error;
>>>> + }
>>>> +
>>>> + /* Check if PD needs to be reinitialized and do it before
>>>> + * changing any other state, in case it fails.
>>>> + */
>>>> + if (pte_support_ats != vm->pte_support_ats) {
>>>> + /* TODO: This depends on a patch series by Christian.
>>>> + * It's only needed for GFX9 GPUs, which aren't
>>>> + * supported by upstream KFD yet.
>>>> + r = amdgpu_vm_clear_bo(adev, vm, vm->root.base.bo,
>>>> + adev->vm_manager.root_level,
>>>> + pte_support_ats);
>>>> + if (r)
>>>> + goto error;
>>>> + */
>>>> + }
>>>> +
>>>> + /* Update VM state */
>>>> + vm->vm_context = AMDGPU_VM_CONTEXT_COMPUTE;
>>>> + vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
>>>> + AMDGPU_VM_USE_CPU_FOR_COMPUTE);
>>>> + vm->pte_support_ats = pte_support_ats;
>>>> + DRM_DEBUG_DRIVER("VM update mode is %s\n",
>>>> + vm->use_cpu_for_update ? "CPU" : "SDMA");
>>>> + WARN_ONCE((vm->use_cpu_for_update & !amdgpu_vm_is_large_bar(adev)),
>>>> + "CPU update of VM recommended only for large BAR system\n");
>>>> +
>>>> + if (vm->pasid) {
>>>> + unsigned long flags;
>>>> +
>>>> + spin_lock_irqsave(&adev->vm_manager.pasid_lock, flags);
>>>> + idr_remove(&adev->vm_manager.pasid_idr, vm->pasid);
>>>> + spin_unlock_irqrestore(&adev->vm_manager.pasid_lock, flags);
>>>> +
>>>> + vm->pasid = 0;
>>>> + }
>>>> +
>>>> +error:
>>>> + amdgpu_bo_unreserve(vm->root.base.bo);
>>>> + return r;
>>>> +}
>>>> +
>>>> +/**
>>>> * amdgpu_vm_free_levels - free PD/PT levels
>>>> *
>>>> * @adev: amdgpu device structure
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> index 40b4e09..7f50a38 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> @@ -263,6 +263,7 @@ void amdgpu_vm_manager_init(struct amdgpu_device *adev);
>>>> void amdgpu_vm_manager_fini(struct amdgpu_device *adev);
>>>> int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>> int vm_context, unsigned int pasid);
>>>> +int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm);
>>>> void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
>>>> bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev,
>>>> unsigned int pasid);