[PATCH 0/8] shadow page table support V5 ---> shadow bo support

Wed Aug 17 14:20:43 UTC 2016

Yup, either way is fine as long as it's consistent.

Tom

________________________________
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of Christian König <deathsimple at vodafone.de>
Sent: Wednesday, August 17, 2016 10:16
To: StDenis, Tom; Zhou, David(ChunMing); amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH 0/8] shadow page table support V5 ---> shadow bo support

Errors should actually be reported by the caller, not here.

So we should probably remove that DRM_ERROR here as well.

Christian.

Am 17.08.2016 um 16:10 schrieb StDenis, Tom:

Why not be consistent and add a DRM_ERROR on all paths that include an error?  e.g. instead of

if (r)

   goto error_free;

Throw a DRM_ERROR("") in there.

Tom

________________________________
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org><mailto:amd-gfx-bounces at lists.freedesktop.org> on behalf of Christian König <deathsimple at vodafone.de><mailto:deathsimple at vodafone.de>
Sent: Wednesday, August 17, 2016 10:04
To: Zhou, David(ChunMing); amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org>
Subject: Re: [PATCH 0/8] shadow page table support V5 ---> shadow bo support

Patch #1:

Could be that we need to add another module parameter to control this,
but I think for now that should be sufficient.

Patch is Reviewed-by: Christian König <christian.koenig at amd.com><mailto:christian.koenig at amd.com>

Patch #2:

> +     if (direct_submit) {
> +             r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs,
> +                                    NULL, NULL, fence);
> +             job->fence = fence_get(*fence);
> +             if (r)
> +                     DRM_ERROR("Error scheduling IBs (%d)\n", r);
> +             amdgpu_job_free(job);

When there is an error you should return the code as well.

> +     } else {
> +             r = amdgpu_job_submit(job, ring, &adev->mman.entity,
> +                                   AMDGPU_FENCE_OWNER_UNDEFINED, fence);
> +             if (r)
> +                     goto error_free;
> +     }
>
>        return 0;

Just changing this to to "return r;" should be sufficient.

With that fixed the patch is Reviewed-by: Christian König
<christian.koenig at amd.com><mailto:christian.koenig at amd.com> as well.

Patch #3:

You mentioned that this isn't used during command submission but rather
during GPU reset, correct?

In this case I would advise not to use a member in the BO structure to
note the backup direction and instead have one function for back and one
for restoring the content (or note that as a parameter to the function).

Otherwise we could run into trouble when the CS wants to backup
something the GPU reset wants to restore at the same time.

Patch #4: Was already reviewed by me.

Patch #5:

> +     pt = params->shadow ? vm->page_tables[pt_idx].entry.robj->shadow :
> +             vm->page_tables[pt_idx].entry.robj;
You need to double check here if shadows are actually allocated or not.
Otherwise we will crash on an APU.

> +     /* double ndw, since need to update shadow pt bo as well */
> +     ndw *= 2;
> +
We don't need to double the IB size, but only the number of commands in
it (ncmds).

The difference is when we want to write scattered GART entries. In this
case I've added the PTEs to the end of the IB.

Patch #6:
> +     spinlock_t                      shadow_list_lock;
We might want to use a mutex here instead. Usually I would agree that a
spin lock is better for a linked list, but during the restore process in
a GPU reset we probably want to sleep here.

Alternatively you could splice the list to a local version on the stack,
grab references to the BOs and then drop the lock during the restore
process.

> @@ -541,6 +546,13 @@ void amdgpu_bo_unref(struct amdgpu_bo **bo)
>        if ((*bo) == NULL)
>                return;
>
> +     /* shadow must be freed before bo itself */
> +     if ((!(*bo)->shadow) && !list_empty(&(*bo)->shadow_list)) {
> +             spin_lock(&(*bo)->adev->shadow_list_lock);
> +             list_del_init(&(*bo)->shadow_list);
> +             spin_unlock(&(*bo)->adev->shadow_list_lock);
> +
> +     }
>        tbo = &((*bo)->tbo);
>        ttm_bo_unref(&tbo);
>        if (tbo == NULL)
That would trigger even when we take a temporary reference. I suggest to
add a amdgpu_bo_unref_shadow() function instead, unreferencing both the
shadow and the normal BO.

This can then be used in the VM code to cleanup the shadow created.

Going to skip patch #7 and #8 for now cause our team call begins in a
moment.

Regards,
Christian.

Am 17.08.2016 um 08:05 schrieb Chunming Zhou:
> Since we cannot ensure VRAM is consistent after a GPU reset, page
> table shadowing is necessary. Shadowed page tables are, in a sense, a
> method to recover the consistent state of the page tables before the
> reset occurred.
>
> We need to allocate GTT bo as the shadow of VRAM bo when creating page table,
> and make them the same. After gpu reset, we will need to use SDMA to copy GTT bo
> content to VRAM bo, then page table will be recoveried.
>
>
> V2:
> Shadow bo uses a shadow entity running on normal run queue, after gpu reset,
> we need to wait for all shadow jobs finished first, then recovery page table from shadow.
>
> V3:
> Addressed Christian comments for shadow bo part.
>
> V4:
> Switch back to update page table twice (one of two is for shadow)
>
> V5:
> make it be gerneic shadow bo support. Address Christian comments.
>
> Chunming Zhou (8):
>    drm/amdgpu: add need backup function V2
>    drm/amdgpu: add direct submision option for copy_buffer
>    drm/amdgpu: sync bo and shadow V2
>    drm/amdgpu: update pd shadow while updating pd
>    drm/amdgpu: update pt shadow while updating pt V2
>    drm/amdgpu: link all shadow bo
>    drm/amdgpu: implement recovery vram bo from shadow
>    drm/amdgpu: recover vram bo from shadow after gpu reset
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  9 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c |  3 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 39 ++++++++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    | 94 ++++++++++++++++++++++++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |  9 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_test.c      |  4 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 21 ++++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 69 ++++++++++++++------
>   8 files changed, 215 insertions(+), 33 deletions(-)
>

_______________________________________________
amd-gfx mailing list
amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
amd-gfx Info Page - lists.freedesktop.org<https://lists.freedesktop.org/mailman/listinfo/amd-gfx>
lists.freedesktop.org
To see the collection of prior postings to the list, visit the amd-gfx Archives. Using amd-gfx: To post a message to all the list members, send email ...

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20160817/4932144e/attachment-0001.html>