[PATCH 03/14] drm/amdgpu: Add helper to turn an existing VM into a compute VM
Alex Deucher
alexdeucher at gmail.com
Mon Mar 5 19:32:36 UTC 2018
On Mon, Mar 5, 2018 at 12:14 PM, Christian König
<christian.koenig at amd.com> wrote:
> Am 05.03.2018 um 17:58 schrieb Felix Kuehling:
>>
>> On 2018-03-05 07:17 AM, Christian König wrote:
>>>
>>> Am 04.03.2018 um 03:34 schrieb Felix Kuehling:
>>>>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 82 ++++++++++++++++++++++++++++++++++
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 1 +
>>>> 2 files changed, 83 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> index 5afbc5e..58153ef 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> @@ -2350,6 +2350,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>> INIT_KFIFO(vm->faults);
>>>> vm->fault_credit = 16;
>>>> + vm->vm_context = vm_context;
>>>
>>> I think we should drop the vm_context parameter and all the related
>>> code in amdgpu_vm_init(). But that can be part of a later cleanup
>>> patch as well.
>>
>> Yep. It will be a prerequisite for sharing VMs between graphics and
>> compute. But then CPU vs. SDMA page table updates would need to be
>> handled differently, on a per-call basis, and would probably need
>> some synchronization between them.
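>>
>> Very roughly, with made-up helper names just to sketch the per-call
>> idea (update_ptes_cpu/update_ptes_sdma don't exist, this is not the
>> actual code):
>>
>>     int r = 0;
>>
>>     /* hypothetical: pick the update path per call instead of the
>>      * per-VM vm->use_cpu_for_update flag
>>      */
>>     if (use_cpu_this_call) {
>>             /* CPU path must wait for any in-flight SDMA update */
>>             if (vm->last_update)
>>                     r = dma_fence_wait(vm->last_update, true);
>>             if (!r)
>>                     r = update_ptes_cpu(vm, start, last, flags);
>>     } else {
>>             r = update_ptes_sdma(vm, start, last, flags);
>>     }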
>>
>> Also, we'd need to fix shadow page table handling with CPU updates, or
>> get rid of shadow page tables. I'm not sure why they're needed, TBH.
>
>
> The idea behind shadow page tables is that you still have the current
> state of the pipeline if the GPU crashes. Since there is no pipeline
> with CPU updates, they don't make any sense in that case.
They also protect against VRAM loss in the case of a GPU reset.
Alex
>
>
>>
>>>> +
>>>> return 0;
>>>> error_free_root:
>>>> @@ -2364,6 +2366,86 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>> }
>>>> /**
>>>> + * amdgpu_vm_make_compute - Turn a GFX VM into a compute VM
>>>> + *
>>>> + * This only works on GFX VMs that don't have any BOs added and no
>>>> + * page tables allocated yet.
>>>> + *
>>>> + * Changes the following VM parameters:
>>>> + * - vm_context
>>>> + * - use_cpu_for_update
>>>> + * - pte_supports_ats
>>>> + * - pasid (old PASID is released, because compute manages its own PASIDs)
>>>> + *
>>>> + * Reinitializes the page directory to reflect the changed ATS
>>>> + * setting. May leave behind an unused shadow BO for the page
>>>> + * directory when switching from SDMA updates to CPU updates.
>>>> + *
>>>> + * Returns 0 for success, -errno for errors.
>>>> + */
>>>> +int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>>>> +{
>>>> + bool pte_support_ats = (adev->asic_type == CHIP_RAVEN);
>>>> + int r;
>>>> +
>>>> + r = amdgpu_bo_reserve(vm->root.base.bo, true);
>>>> + if (r)
>>>> + return r;
>>>> +
>>>> + /* Sanity checks */
>>>> + if (vm->vm_context == AMDGPU_VM_CONTEXT_COMPUTE) {
>>>> + /* Can happen if ioctl is interrupted by a signal after
>>>> + * this function already completed. Just return success.
>>>> + */
>>>> + r = 0;
>>>> + goto error;
>>>> + }
>>>
>>> Ok, that is actually a show stopper. An interrupted IOCTL should never
>>> have a visible effect.
>>>
>>> Is that just a theoretical issue or did you run into this in practice?
>>
>> It's theoretical. init_kfd_vm in patch 4 reserves the page directory
>> again, validates it and waits for the validation to complete. Both the
>> reservation and the waiting can be interrupted by signals.
>
>
> No, both have parameters to control whether they are interruptible,
> for exactly this reason.
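>
> Roughly, the non-interruptible variants look like this (the fence
> here stands for whatever validation fence is waited on):
>
>     /* second parameter no_intr=true: never return -ERESTARTSYS */
>     r = amdgpu_bo_reserve(vm->root.base.bo, true);
>     ...
>     /* intr=false: wait for validation without signal interruption */
>     r = dma_fence_wait(fence, false);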
>
>> This happens
>> after amdgpu_vm_make_compute has completed.
>>
>> If we wanted to achieve "no visible effect", we'd need to roll back the
>> conversion to a compute VM, which may involve more waiting.
>>
>> We have a similar issue in map_memory_to_gpu. We map on all GPUs and
>> wait for a sync object in the end. If SDMA is used, this allows the SDMA
>> on all GPUs to work concurrently. The wait in the end can be
>> interrupted. Instead of rolling back the mapping on all GPUs (which
>> would require more waiting), we just let the system call return
>> -ERESTARTSYS and make sure the restart doesn't cause any trouble.
>
>
> Whether waits are interruptible is controlled by parameters. At least
> for importing the VM it is probably OK to disable this.
>
>> I've been looking for documentation about the expected behaviour for
>> system calls interrupted by signals. All I can find is this statement in
>> Documentation/kernel-hacking/hacking.rst:
>>>
>>> After you slept you should check if a signal occurred: the Unix/Linux
>>> way of handling signals is to temporarily exit the system call with the
>>> ``-ERESTARTSYS`` error. The system call entry code will switch back to
>>> user context, process the signal handler and then your system call will
>>> be restarted (unless the user disabled that). So you should be prepared
>>> to process the restart, e.g. if you're in the middle of manipulating
>>> some data structure.
>>
>> The same paragraph is also copied in
>> https://www.kernel.org/doc/htmldocs/kernel-hacking/ioctls.html.
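>>
>> (For reference, the generic pattern that paragraph describes is
>> simply:
>>
>>     if (wait_event_interruptible(wq, condition))
>>         return -ERESTARTSYS; /* entry code restarts the syscall */
>>
>> and the restart re-enters the ioctl from the top.)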
>>
>> The key sentence is "should be prepared to process the restart, e.g. if
>> you're in the middle of manipulating some data structure". Rolling back
>> everything would achieve that, but it's not the only way and not the
>> most efficient way.
>>
>> In this case, I'm handling the restart by checking whether the VM is
>> already a compute VM. So on the restart the conversion to a compute VM
>> becomes a no-op.
>>
>> Similarly, in the map_memory_to_gpu ioctl, mapping on a GPU where the
>> memory is already mapped becomes a no-op, which handles the restart
>> case. In this case we even added a specific test for this condition in
>> kfdtest, where we deliberately interrupt a big mapping operation with a
>> signal.
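>>
>> Schematically, and simplified from the actual KFD code (names from
>> memory, not the literal source):
>>
>>     list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
>>             if (entry->is_mapped)
>>                     continue; /* restart: already mapped on this GPU */
>>             r = map_bo_to_gpuvm(adev, entry, ctx.sync);
>>             if (r)
>>                     goto out;
>>     }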
>
>
> This approach is a clear no-go.
>
> Changes done before a system call is interrupted don't necessarily
> need to be reverted, but only as long as they have no user-visible
> effect.
>
> For example: when we map something we need to allocate multiple page
> directories. It is not necessary to release all previously allocated
> ones when an allocation reports that it was interrupted, but that is
> only acceptable because those page directories are not user accessible.
>
> Similar exceptions can be made by skipping work when the IOCTL is
> interrupted. E.g. in the VA IOCTL we usually submit the page table
> update directly, but if that is interrupted we just return success
> instead of restarting the IOCTL and defer the update until the first
> command submission.
>
> The easiest way to work around this is to add a separate IOCTL which
> waits for VM updates to complete. It should be trivial to use
> vm->last_update for this.
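>
> As a sketch (amdgpu_vm_wait_idle is a made-up name, just to
> illustrate):
>
>     int amdgpu_vm_wait_idle(struct amdgpu_vm *vm)
>     {
>             if (!vm->last_update)
>                     return 0; /* nothing pending */
>             /* interruptible is fine, the wait IOCTL itself is
>              * trivially restartable
>              */
>             return dma_fence_wait(vm->last_update, true);
>     }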
>
> Regards,
> Christian.
>
>
>>
>> Regards,
>> Felix
>>
>>> Regards,
>>> Christian.
>>>
>>>> + if (!RB_EMPTY_ROOT(&vm->va.rb_root) || vm->root.entries) {
>>>> + r = -EINVAL;
>>>> + goto error;
>>>> + }
>>>> +
>>>> + /* Check if PD needs to be reinitialized and do it before
>>>> + * changing any other state, in case it fails.
>>>> + */
>>>> + if (pte_support_ats != vm->pte_support_ats) {
>>>> + /* TODO: This depends on a patch series by Christian.
>>>> + * It's only needed for GFX9 GPUs, which aren't
>>>> + * supported by upstream KFD yet.
>>>> + r = amdgpu_vm_clear_bo(adev, vm, vm->root.base.bo,
>>>> + adev->vm_manager.root_level,
>>>> + pte_support_ats);
>>>> + if (r)
>>>> + goto error;
>>>> + */
>>>> + }
>>>> +
>>>> + /* Update VM state */
>>>> + vm->vm_context = AMDGPU_VM_CONTEXT_COMPUTE;
>>>> + vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
>>>> + AMDGPU_VM_USE_CPU_FOR_COMPUTE);
>>>> + vm->pte_support_ats = pte_support_ats;
>>>> + DRM_DEBUG_DRIVER("VM update mode is %s\n",
>>>> + vm->use_cpu_for_update ? "CPU" : "SDMA");
>>>> + WARN_ONCE((vm->use_cpu_for_update & !amdgpu_vm_is_large_bar(adev)),
>>>> + "CPU update of VM recommended only for large BAR system\n");
>>>> +
>>>> + if (vm->pasid) {
>>>> + unsigned long flags;
>>>> +
>>>> + spin_lock_irqsave(&adev->vm_manager.pasid_lock, flags);
>>>> + idr_remove(&adev->vm_manager.pasid_idr, vm->pasid);
>>>> + spin_unlock_irqrestore(&adev->vm_manager.pasid_lock, flags);
>>>> +
>>>> + vm->pasid = 0;
>>>> + }
>>>> +
>>>> +error:
>>>> + amdgpu_bo_unreserve(vm->root.base.bo);
>>>> + return r;
>>>> +}
>>>> +
>>>> +/**
>>>> * amdgpu_vm_free_levels - free PD/PT levels
>>>> *
>>>> * @adev: amdgpu device structure
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> index 40b4e09..7f50a38 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> @@ -263,6 +263,7 @@ void amdgpu_vm_manager_init(struct amdgpu_device *adev);
>>>> void amdgpu_vm_manager_fini(struct amdgpu_device *adev);
>>>> int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>> int vm_context, unsigned int pasid);
>>>> +int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm);
>>>> void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
>>>> bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev,
>>>> unsigned int pasid);